Python 06: Machine Learning
http://had.co.nz/stat645/model-vis.pdf
Dimensions and Features
In order to do machine learning you need a data set
containing instances (examples), each composed of
features; the features define the dimensions of the
feature space.
Target
1. Y ≡ Thickness of car tires after some testing period
Variables
1. X1 ≡ distance travelled in test
2. X2 ≡ time duration of test
3. X3 ≡ amount of chemical C in tires
The feature space is R3, or more precisely the positive orthant of R3, since all
of the X variables can only be positive quantities.
http://stats.stackexchange.com/questions/46425/what-is-feature-space
Mappings
Domain knowledge about tires might suggest that the speed the vehicle was
moving at is important, hence we generate another variable, X4 (this is the
feature extraction part):
ϕ(x1, x2, x3) = (x1, x2, x3, x1/x2)
This extends our old feature space into a new one, the positive part of R4.
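As a sketch of this mapping in code, assuming the instances are held in a NumPy array with one column per variable (the array name and values here are purely illustrative):

import numpy as np

# Hypothetical instances: columns are X1 (distance), X2 (duration), X3 (chemical C)
X = np.array([
    [1200.0, 10.0, 0.3],
    [ 900.0, 12.0, 0.4],
])

# Feature extraction: derive X4 = X1 / X2 (average speed) and append it,
# mapping the positive part of R^3 into the positive part of R^4
X4 = X[:, 0] / X[:, 1]
phi_X = np.column_stack([X, X4])
print(phi_X.shape)   # (2, 4)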
Your Task
Given a data set of N instances, build a model
that is fit to the data by extracting features and
dimensions. Then use that model to predict
outcomes … (a code sketch of this workflow
follows the list below)
1. Data Wrangling (normalization, standardization, imputing)
2. Feature Analysis/Extraction
3. Model Selection/Building
4. Model Evaluation
5. Operationalize Model
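A minimal sketch of steps 1–4 with scikit-learn, assuming a numeric feature matrix X and target vector y (the synthetic data and the choice of a Ridge model are illustrative, not part of the original workflow):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data; in practice X, y come from your wrangled data set
X, y = make_regression(n_samples=200, n_features=4, noise=0.5, random_state=42)

# 1. Data wrangling: impute missing values, standardize features
# 3. Model selection: a regularized linear model
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("regress", Ridge(alpha=1.0)),
])

# 4. Model evaluation on held-out data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print("RMSE:", rmse)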
A Tour of Machine Learning Algorithms
Models: Instance Methods
Compare instances in the data set with a
similarity measure to find the best matches.
- Suffers from the curse of dimensionality
- Focus on feature representation and on
similarity metrics between instances
Models: Regularization Methods
● Ridge Regression
● LASSO (Least Absolute Shrinkage & Selection Operator)
● Elastic Net
LASSO
• Penalizes the total absolute weight (L1 norm) of the parameters
• Can be interpreted as a prior distribution on the parameters (see the sketch below)
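A minimal LASSO sketch with scikit-learn (the alpha value and synthetic data are illustrative; a larger alpha drives more coefficients to exactly zero):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Illustrative data in which only a few features are informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=0.1, random_state=0)

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# The L1 penalty zeroes out uninformative coefficients (built-in feature selection)
print(np.count_nonzero(lasso.coef_), "of", lasso.coef_.size, "coefficients nonzero")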
Models: Bayesian
Explicitly apply Bayes’ Theorem for
classification and regression tasks, usually by
fitting a probability function constructed via the
chain rule and a naive simplification of Bayes’
Theorem.
● Naive Bayes
● Averaged One-Dependence Estimators (AODE)
● Bayesian Belief Network (BBN)
Naive Bayes
Probability distribution:
P(y | x1, …, xn) ∝ P(y) · P(x1 | y) · P(x2 | y) · … · P(xn | y)
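A minimal Gaussian Naive Bayes sketch with scikit-learn (the bundled iris data set is an illustrative stand-in):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fits a per-class Gaussian for each P(x_i | y) and class priors P(y)
clf = GaussianNB().fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))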
Models: Clustering
● k-Means
● Affinity Propagation
● OPTICS (Ordering Points To Identify the Clustering Structure)
● Agglomerative Clustering
K-means clustering
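A minimal k-means sketch with scikit-learn (the blob data and the choice of k=3 are illustrative):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data: three synthetic clusters
X, _ = make_blobs(n_samples=300, centers=3, random_state=1)

# Alternates between assigning points to the nearest centroid
# and recomputing centroids until assignments stabilize
km = KMeans(n_clusters=3, random_state=1).fit(X)
print(km.cluster_centers_)
print(km.labels_[:10])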
Models: Artificial Neural Networks
Inspired by biological neural networks, ANNs are
nonlinear function approximators that estimate
functions with a large number of inputs.
- A system of interconnected neurons that activate in response to input
- Deep learning extends simple networks by stacking them recursively (see the sketch below)
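A minimal feed-forward network sketch using scikit-learn's MLPClassifier (the layer sizes and the digits data set are illustrative; dedicated deep learning frameworks go well beyond this):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers of interconnected neurons with nonlinear activations
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("accuracy:", net.score(X_test, y_test))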
Models: Ensemble Methods
● Boosting
● Bootstrapped Aggregation (Bagging)
● AdaBoost
● Stacked Generalization (Blending)
● Gradient Boosting Machines (GBM)
● Random Forest
AdaBoost
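A minimal AdaBoost sketch with scikit-learn (the synthetic data and ensemble size are illustrative; by default shallow decision trees are used as the weak learners):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Sequentially fits weak learners, upweighting previously misclassified instances
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
print("CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())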
Models: Other
The preceding list is not comprehensive; other
algorithm and model classes include:
● Conditional Random Fields (CRF)
● Markovian Models (HMMs)
● Dimensionality Reduction (PCA, PLS)
● Rule Learning (Apriori, Brill)
● More ...
What is Scikit-Learn?
Extensions to SciPy (Scientific Python) are
called SciKits. SciKit-Learn provides machine
learning algorithms.
● Algorithms for supervised & unsupervised learning
● Built on SciPy and Numpy
● Standard, consistent Python API
● Sits on top of C libraries (LAPACK, LibSVM) via Cython
● Open source: BSD license
- Scikit-Learn Tutorial
class Estimator(object):

    def fit(self, X, y=None):
        """Fits estimator to data."""
        # set state of ``self``
        return self

    def predict(self, X):
        """Predict response of ``X``."""
        # compute predictions ``pred``
        return pred
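Usage follows this same fit/predict pattern for every scikit-learn estimator; for example (the estimator and data set choices here are illustrative):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()      # instantiate with hyperparameters
model.fit(X_train, y_train)     # fit: learn model state from training data
pred = model.predict(X_test)    # predict: apply the fitted model to new data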
Basic methodology
Wrapping fit and predict
We’ve already discussed a broad workflow; the
following is a development workflow:
Raw Data → Load & Transform Data → Feature Extraction → Feature Evaluation → Build Model → Evaluate Model
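One possible rendering of this loop in code, using a bundled data set as a stand-in for raw data (the scaler, feature selector, and model are illustrative choices):

from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load & transform data (bundled data set stands in for raw data)
X, y = load_wine(return_X_y=True)

# Feature extraction & evaluation, then build the model
workflow = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=8)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Evaluate the model with cross-validation
print("CV accuracy:", cross_val_score(workflow, X, y, cv=5).mean())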
Task 6
- Select a data set (wines / student performance)
- Apply various learning algorithms to the problem
- Provide the best possible prediction w.r.t. RMSE (a starting sketch follows below)
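As a starting point, one way to compare several algorithms by RMSE (the synthetic data is a placeholder for the wines or student performance features; cross-validated RMSE guards against an optimistic single split):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Placeholder data; substitute your chosen data set's features and target
X, y = make_regression(n_samples=300, n_features=8, noise=1.0, random_state=7)

for model in (Ridge(), Lasso(), RandomForestRegressor(random_state=7)):
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    rmse = np.sqrt(-scores.mean())
    print(type(model).__name__, "RMSE:", round(rmse, 3))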