Uploaded by

Katlo Kay

Faculty of Computer Science

University of Sunderland

Machine Learning in Python

In order to start the machine learning process, you need a set of data to train the algorithm. It is
very important to ensure that the data source is credible; otherwise you will get incorrect results
even if the algorithm itself works correctly (the "garbage in, garbage out" principle).

The second important thing is the size of the dataset. There is no straightforward answer to how large
it should be; it depends on many factors, for example:

• the type of problem you're looking to solve,
• the number of features in the data,
• the type of algorithm used.

The best small project to start with on a new tool is the classification of iris flowers (i.e. the iris
dataset). This is a good project because it is well understood and is used in many examples.

• Attributes are numeric so you have to figure out how to load and handle data.
• It is a classification problem, allowing you to practice with perhaps an easier type of
supervised learning algorithm.
• It is a multi-class (multinomial) classification problem that may require some specialized
handling.
• It only has 4 attributes and 150 rows, meaning it is small and easily fits into memory (and a
screen or A4 page).
• All of the numeric attributes are in the same units and the same scale, not requiring any
special scaling or transforms to get started.

To load the libraries and import the dataset, enter and run the following code:
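A minimal sketch of this step, assuming pandas and scikit-learn are installed (both ship with Anaconda). Instead of downloading a CSV file, it loads the copy of Iris bundled with scikit-learn through `sklearn.datasets.load_iris`; the column names (`sepal-length` and so on) and the variable name `dataset` are our own choices, not something the library defines.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Hypothetical column names: four measurements (cm) plus the class label.
names = ["sepal-length", "sepal-width", "petal-length", "petal-width", "class"]

iris = load_iris()  # bundled with scikit-learn, no download needed
dataset = pd.DataFrame(iris.data, columns=names[:4])
# Replace the integer targets 0, 1, 2 with the species names.
dataset["class"] = iris.target_names[iris.target]

print(dataset.head())
```

The class labels produced this way are `setosa`, `versicolor` and `virginica`; the CSV file from the UCI repository labels them `Iris-setosa`, `Iris-versicolor` and `Iris-virginica` instead.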

Now it is time to take a look at the data. We are going to look at it in a few different ways:

1. Dimensions of the dataset.
2. Peek at the data itself.
3. Statistical summary of all attributes.
4. Breakdown of the data by the class variable.

We can get a quick look at the dimensions of the data with the shape property:
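A minimal sketch, assuming the Iris data is loaded from scikit-learn's bundled copy into a DataFrame named `dataset` with hypothetical column names; the loading lines are included so the snippet runs on its own. For Iris, `shape` reports 150 rows and 5 columns (four measurements plus the class).

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (hypothetical column names).
iris = load_iris()
cols = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=cols)
dataset["class"] = iris.target_names[iris.target]

# shape is a (rows, columns) tuple.
print(dataset.shape)  # (150, 5)
```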

We can peek at the first rows of the dataset with the following code:
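A minimal sketch using pandas' `DataFrame.head`, assuming the same self-contained loading step from scikit-learn's bundled Iris data (the column names and `dataset` are hypothetical):

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (hypothetical column names).
iris = load_iris()
cols = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=cols)
dataset["class"] = iris.target_names[iris.target]

# head(n) returns the first n rows.
first_rows = dataset.head(20)
print(first_rows)
```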

Note: the 20 in brackets shows the first 20 rows of the dataset; you can change this number to see more
or fewer rows.

We can see a statistical summary of our dataset by using the describe function. This will give us a
summary of each attribute including the count, mean, the min and max values as well as some
percentiles.
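A minimal sketch of `DataFrame.describe`, assuming the same self-contained loading step (hypothetical column names). By default `describe()` summarises only the numeric columns, so the text `class` column is left out and the result has eight rows: count, mean, std, min, 25%, 50%, 75% and max.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (hypothetical column names).
iris = load_iris()
cols = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=cols)
dataset["class"] = iris.target_names[iris.target]

# Summary statistics for each numeric attribute.
summary = dataset.describe()
print(summary)
```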

Let’s now take a look at the number of instances (rows) that belong to each class. Viewed as an
absolute count, this is known as the class distribution.
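A minimal sketch that counts the rows per class with `groupby(...).size()`, assuming the same self-contained loading step (hypothetical column names). Iris is balanced, so each of the three classes should have 50 rows.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (hypothetical column names).
iris = load_iris()
cols = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=cols)
dataset["class"] = iris.target_names[iris.target]

# Number of rows in each class (the class distribution).
class_counts = dataset.groupby("class").size()
print(class_counts)
```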

You should now have a basic idea about the data. We need to extend that with some visualizations.

Data Visualisation

Plotting data in Python relies on the matplotlib library. Before we look at the Iris dataset, here are
some code examples that will help you understand how data is presented.

Enter and run the code below:
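A minimal matplotlib sketch of a basic line plot, using made-up values rather than any data from the handout:

```python
import matplotlib.pyplot as plt

# Hypothetical example values.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

fig, ax = plt.subplots()
ax.plot(x, y)  # draw a line through the (x, y) points
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
plt.show()
```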

You should get the following output:

Next we can add some markers:
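A minimal sketch of the same kind of plot with markers, again using made-up values. The `marker`, `linestyle` and `color` arguments are standard `Axes.plot` keywords; the particular choices here are arbitrary.

```python
import matplotlib.pyplot as plt

# Hypothetical example values.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

fig, ax = plt.subplots()
# "o" draws a circle at each point; "--" draws a dashed connecting line.
ax.plot(x, y, marker="o", linestyle="--", color="red")
ax.set_title("A line plot with markers")
plt.show()
```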

You should get the following output:

Now let’s go back to our Iris dataset. We are going to look at two different types of plots:

1. Univariate plots to better understand each attribute.
2. Multivariate plots to better understand the relationships between attributes.

We start with some univariate plots, that is, plots of each individual variable. Given that the input
variables are numeric, we can create box and whisker plots of each. Enter and run the following code
after you have imported the dataset:
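A minimal sketch using pandas' `DataFrame.plot` with `kind="box"` to draw one box and whisker plot per numeric attribute in a 2x2 grid, assuming the same self-contained loading step (hypothetical column names). The text `class` column is ignored because pandas plots only numeric columns.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (hypothetical column names).
iris = load_iris()
cols = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=cols)
dataset["class"] = iris.target_names[iris.target]

# One box plot per attribute, each with its own y-axis scale.
dataset.plot(kind="box", subplots=True, layout=(2, 2), sharex=False, sharey=False)
fig = plt.gcf()  # handle to the figure pandas just created
plt.show()
```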

This should give you a much clearer idea of the distribution of the input attributes.

We can also create a histogram of each input variable to get an idea of the distribution:
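A minimal sketch using `DataFrame.hist`, which draws one histogram per numeric column and returns the grid of axes, assuming the same self-contained loading step (hypothetical column names):

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (hypothetical column names).
iris = load_iris()
cols = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=cols)
dataset["class"] = iris.target_names[iris.target]

# One histogram per numeric attribute; returns a 2-D array of axes.
axes = dataset.hist()
plt.show()
```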

It looks like perhaps two of the input variables have a Gaussian distribution. This is useful to note,
as we can use algorithms that exploit this assumption.

Multivariate plots allow us to see the interactions between the variables in our dataset. If we look at
scatterplots of pairs of attributes, it can help us spot structured relationships between those variables.
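One common way to draw a scatter plot for every pair of attributes is `pandas.plotting.scatter_matrix`. A minimal sketch, assuming the same self-contained loading step (hypothetical column names); with four numeric attributes it produces a 4x4 grid, with a histogram of each attribute on the diagonal.

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# Load Iris into a DataFrame (hypothetical column names).
iris = load_iris()
cols = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=cols)
dataset["class"] = iris.target_names[iris.target]

# Scatter plot of every pair of numeric attributes.
axes = scatter_matrix(dataset)
plt.show()
```

Diagonal groupings in the off-diagonal panels suggest a high correlation and a predictable relationship between the two attributes involved.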

Because we are going to be using matplotlib to display 3D data and graphs, we need to switch to a
different coding environment. Close everything down and relaunch Anaconda, but this time select
Spyder instead of Jupyter.

Type in and run the following code:
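A minimal sketch of a 3D scatter plot of the Iris data, assuming matplotlib 3.2 or later (where `projection="3d"` works without importing `Axes3D` explicitly). Plotting sepal length, petal length and petal width is our own choice; any three of the four attributes can be used. Points are coloured by their integer class label.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # columns: sepal length, sepal width, petal length, petal width (cm)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# Three of the four attributes, coloured by class (0, 1, 2).
ax.scatter(X[:, 0], X[:, 2], X[:, 3], c=iris.target, cmap="viridis")
ax.set_xlabel("sepal length (cm)")
ax.set_ylabel("petal length (cm)")
ax.set_zlabel("petal width (cm)")
plt.show()
```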
