The Iris dataset is one of the oldest and most widely used datasets for learning purposes. It can be retrieved from the UCI Machine Learning Repository.
Import data and packages
Because the Iris dataset is built into scikit-learn, it can be loaded with a single call, but most datasets are not packaged that way. I also prefer working with data frames over Bunch objects, so I used both approaches here to get a sense of what will work best for me in the future.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# load data method 1: read the CSV directly from its URL (the file has no header row, so column names are supplied)
csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
col_names = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width','Class']
iris = pd.read_csv(csv_url, names = col_names)
# load data method 2: import from sklearn datasets
from sklearn import datasets
data = datasets.load_iris()
df = pd.DataFrame(data['data'], columns = data['feature_names'])
df['target'] = data['target']
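If the goal is a data frame rather than a Bunch, one possible shortcut is to ask scikit-learn for pandas objects directly. The sketch below assumes scikit-learn 0.23 or newer, where `load_iris` accepts `as_frame=True`. The names `df_frame` and `species` are my own illustrative choices, not part of the original code.

```python
from sklearn import datasets

# Assumption: scikit-learn >= 0.23, which supports as_frame=True
bunch = datasets.load_iris(as_frame=True)

# bunch.frame holds the four feature columns plus an integer 'target' column
df_frame = bunch.frame.copy()

# Map the integer codes (0, 1, 2) to species names so the labels are
# readable, similar to the 'Class' column in the CSV version
code_to_name = {code: str(name) for code, name in enumerate(bunch.target_names)}
df_frame['species'] = df_frame['target'].map(code_to_name)

print(df_frame.head())
```

This keeps the convenience of the built-in loader while giving a data frame with readable labels, which makes it easier to compare against the CSV-based frame.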