
Iris EDA


The Iris dataset is one of the oldest and most widely used datasets for learning purposes. It can be retrieved from the UCI Machine Learning Repository.

🏆
Project Goal: Use the attributes of a flower to predict its species. Specifically, use the sepal length and width and the petal length and width to predict whether an Iris flower is of type Setosa, Versicolor, or Virginica.

Import data and packages


The Iris dataset is built into scikit-learn, so it can be loaded with a single call. That is not the case for most datasets, though, and I prefer working with data frames rather than scikit-learn's Bunch objects. I opted to load the data both ways here to get a sense of what will work best for me in the future.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# load data method 1: read the CSV from the UCI URL (the file has no header row)
csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
col_names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class']
iris = pd.read_csv(csv_url, names=col_names)

# load data method 2: import from sklearn datasets
from sklearn import datasets

data = datasets.load_iris()  # returns a Bunch (dict-like object)
df = pd.DataFrame(data['data'], columns=data['feature_names'])
df['target'] = data['target']  # integer species codes 0, 1, 2
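
One difference between the two loading methods is that the scikit-learn Bunch stores the species as integer codes, while the CSV stores them as strings. As a minimal sketch (the names `bunch`, `df_named`, and `species_counts` are my own for illustration, not part of the original notebook), the codes can be mapped back to species names with the Bunch's `target_names`:

```python
import pandas as pd
from sklearn import datasets

# Load the Bunch and build a data frame from its feature matrix.
bunch = datasets.load_iris()
df_named = pd.DataFrame(bunch.data, columns=bunch.feature_names)

# Map the integer codes (0, 1, 2) to the species names stored in the Bunch.
df_named["species"] = pd.Categorical.from_codes(bunch.target, bunch.target_names)

# Count the rows per species to check that the classes are balanced.
species_counts = df_named["species"].value_counts()
print(species_counts)
```

With readable labels in place, the scikit-learn version of the data frame is easier to compare against the one read from the CSV.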
