Lecture Notes 1 2 Intro Python

Uploaded by

Abhishek Gullipalli


1. INTRODUCTION
I. BASICS OF MACHINE LEARNING
Machine Learning is the science and art of programming computers so that they can learn from data.
Examples:
• bank pre-approval for a loan: approved vs. not approved (supervised, classification)
• bank pre-approval for a loan amount (supervised, regression)
• spam filter (supervised, classification)
• document topic modeling (unsupervised)
• building an intelligent bot for a game (reinforcement learning)
ML is about getting data and using it not only for analysis, but also to do a job, such as making predictions.
Why is ML so important/useful/popular these days, and how is it different from traditional
approaches? In the 1990s, scientists working on image analysis and spam filters wrote code
containing hand-crafted rules for the computer to follow; nowadays, scientists write code that asks
the computer to figure out from the data what makes an image a face. Two things make this possible:
• a great amount of data available
• tremendous computational power
Skills:

Image taken from https://data-flair.training

Programming:
• Python with its main scientific libraries such as NumPy, Pandas, Matplotlib
• Scikit-Learn – contains implementations of many ML algorithms; created in 2007
• TensorFlow – a more complex library for distributed numerical computation; used
especially for training and running large neural networks; it was open sourced in 2015, and
version 2.0 was released in 2019
• Keras – a high-level Deep Learning API (Application Programming Interface) that makes
training and running neural networks very simple. It can run on top of TensorFlow,
Theano, or MS Cognitive Toolkit. TensorFlow has its own implementation of Keras called
tf.keras

II. STEPS IN MACHINE LEARNING / DATA ANALYTICS / DATA SCIENCE

1. Data ingestion (get the data)

2. Data preprocessing and cleaning
3. Exploratory data analysis and visualization
4. Pattern recognition and feature extraction
5. Modeling (select a model and train it)
6. Model evaluation
7. Inference

Data preprocessing and cleaning

o outliers (data coming from a robot),
o missing data (do you keep the data instance with a missing feature or do you delete it,
do you keep a feature with missing values or do you delete it, do you fill in missing
values and how?),
o malicious data (for example, someone trying to fabricate behavior data to promote their
item),
o erroneous data (maybe there was a software bug that wrote wrong data values),
o irrelevant data (maybe we are interested in data only from NYC),
o inconsistent data (for example, 5 or 5+4 zip codes)
o formatting issues (for example, 713-221-8631 or (713) 221-8631 or 7132218631)
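The inconsistent-data and formatting issues above can be handled by converting every value to one canonical form. The following is a minimal sketch using only Python's standard library; the function names `normalize_phone` and `normalize_zip` and the sample ZIP codes are illustrative assumptions, not part of the lecture.

```python
import re

def normalize_phone(raw):
    """Return a 10-digit US phone number as 'XXX-XXX-XXXX', or None otherwise."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    if len(digits) != 10:
        return None  # flag the value for manual inspection
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

def normalize_zip(raw):
    """Reduce a 5-digit or 5+4 ZIP code to its first 5 digits, or None if malformed."""
    match = re.fullmatch(r"\s*(\d{5})(?:-\d{4})?\s*", raw)
    return match.group(1) if match else None

# The three phone formats mentioned in the notes
phones = ["713-221-8631", "(713) 221-8631", "7132218631"]
print([normalize_phone(p) for p in phones])

# Hypothetical ZIP values: 5-digit, 5+4, and a malformed entry
zips = ["77002", "77002-1014", "7700"]
print([normalize_zip(z) for z in zips])
```

Returning `None` for malformed values, rather than guessing, leaves the decision of whether to delete or repair the record to a later cleaning step.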

Exploratory data analysis and visualization (discover and visualize the data to get insights)
o techniques depend on whether data is categorical or numerical: charts, graphs, tables,
numerical measures (average, standard deviation, min, max, range, quartiles, etc.)
o Pie chart showing the class level of students at some university

o Bar chart showing the number of male and female students at UHD enrolled each
year, from 2010 to 2021.
o Histogram showing the number of diamonds of a certain carat value
o Box-and-whiskers diagram showing the number of hours students spent last week
on HW
o Scatter plot showing diamond price vs. its carat value

o Word cloud plot summarizing text document
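The numerical measures listed above (average, standard deviation, min, max, range, quartiles) can be computed with Python's built-in `statistics` module. This is a small sketch; the `hours` values are made-up data standing in for the hours students spent on HW.

```python
import statistics

# Made-up sample: hours each student spent on HW last week
hours = [2, 3, 3, 4, 5, 5, 6, 8, 12]

# quantiles(..., n=4) returns the three cut points Q1, median, Q3
q1, median, q3 = statistics.quantiles(hours, n=4)

summary = {
    "mean": statistics.mean(hours),
    "stdev": statistics.stdev(hours),  # sample standard deviation
    "min": min(hours),
    "max": max(hours),
    "range": max(hours) - min(hours),
    "q1": q1,
    "median": median,
    "q3": q3,
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The five values min, Q1, median, Q3, and max are the ones a box-and-whiskers diagram displays.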

Pattern recognition and feature selection/extraction
Pattern recognition is a branch of ML that focuses on finding patterns and similarities in data.
Types of ML:
• Supervised or Predictive Learning – data consists of inputs and outputs; data is labeled
o Classification (outputs are categorical)
o Regression (outputs are real-valued)
• Unsupervised or Descriptive Learning – data consists of only inputs; data is not labeled
o Clustering
o Association Rule Mining
o Dimensionality reduction (Principal Component Analysis)
• Semi-supervised – partially labeled data
• Reinforcement Learning – an agent observes an environment, makes an action, and gets a
reward or a penalty; it must learn the best strategy (policy) to get the most reward over
time.
Classification
• Identifies the class (category or group) to which an object belongs
• Applications:
o image classification (handwritten digits classification)
o document/text classification (spam filter)
o object detection (face detection in an image)
• Algorithms: Logistic Regression, Support Vector Machines, Naïve Bayes Classifier,
Nearest Neighbors, Decision Trees, Random Forests, Neural Networks
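As an illustration of one of the listed algorithms, here is a minimal sketch of a Nearest Neighbors classifier in plain Python. The two features (say, number of links and number of exclamation marks in a message) and all data values are made up for illustration; in practice a library such as Scikit-Learn would be used.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up labeled data: (links, exclamation marks) per message
train_points = [(0, 1), (1, 0), (1, 1), (6, 7), (7, 8), (8, 6)]
train_labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(train_points, train_labels, (7, 7)))
print(knn_predict(train_points, train_labels, (0, 0)))
```

The outputs are categorical ("spam" or "ham"), which is what makes this a classification problem rather than a regression problem.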

Image taken from https://github.com/topics/spam-classifier

Regression
• Two goals: prediction and inference
o to predict the output associated with a given input
o to understand the relationship between the input and the output
• Applications: real estate prices, stock prices, drug response
• Algorithms: Linear Regression, Decision Trees, Random Forests, Nearest Neighbors,
Neural Networks
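The two goals of regression can be seen in a simple linear regression fitted by least squares. This is a sketch under stated assumptions: the house sizes and prices are made-up numbers, and `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: house size (sq ft) and price (thousands of dollars)
sizes = [1000, 1500, 2000, 2500]
prices = [200, 290, 410, 500]

slope, intercept = fit_line(sizes, prices)

# Inference: the slope estimates how much the price changes per extra square foot
print("price change per sq ft:", slope)

# Prediction: estimated price for a size that is not in the data
print("predicted price for 1800 sq ft:", slope * 1800 + intercept)
```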

http://abyss.uoregon.edu/~js/glossary/correlation.html

Clustering
• It takes unlabeled data and returns a grouping of data
• We are not given any a priori class labels; instead, we want to find the “natural” groups,
called clusters, within the data
• Applications:
o grouping customers based on their purchasing behavior to send customized
targeted advertisements to each group
• Algorithms: K-means, Hierarchical Clustering
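A minimal sketch of K-means on one-dimensional data shows how the "natural" groups emerge without labels. The spending values and starting centers are made-up assumptions, and `kmeans_1d` is a hypothetical helper; real applications would use many features and a library implementation.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and recomputing centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center)
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Made-up data: weekly spending (dollars) of six customers
spending = [5, 7, 9, 50, 55, 60]
centers, clusters = kmeans_1d(spending, centers=[5, 60])
print(centers)
print(clusters)
```

Each resulting cluster could then receive its own targeted advertisement, as in the customer grouping application above.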
Association Rule Mining
• Market basket analysis: data consists of transactions; given that the customer purchased
a burger and chips, predict what other items the customer is likely to buy
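Association rules are commonly scored with two standard measures that the notes do not define: support (the fraction of transactions containing an itemset) and confidence (how often the rule's conclusion appears when its premise does). The sketch below computes both for the rule {burger, chips} → {soda}; the transactions are made up for illustration.

```python
# Made-up transactions: each set is one customer's basket
transactions = [
    {"burger", "chips", "soda"},
    {"burger", "chips"},
    {"burger", "chips", "soda", "ketchup"},
    {"chips", "soda"},
    {"burger", "soda"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

premise = {"burger", "chips"}
print("support of premise:", support(premise, transactions))
print("confidence of rule:", confidence(premise, {"soda"}, transactions))
```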

https://www.analyticsvidhya.com/blog/2014/08/effective-cross-selling-market-basket-analysis/

Dimensionality Reduction
• Principal Component Analysis: topic modeling (Latent Semantic Analysis in NLP)

https://www.datacamp.com/tutorial/discovering-hidden-topics-python
Feature selection/extraction includes methods that select relevant features and discard the
irrelevant features in the data
• For example, assume that our task is to select features for predicting the mileage of a car and
we are given data that includes engine capacity, top speed, and color; color is most likely
irrelevant to mileage and can be discarded
• Types of feature selection methods:
o true selection methods – choose a subset of all the features measured
o projection or embedding methods – compute linear or nonlinear combinations of
the features measured and then select a subset of these combinations

Modeling (select a model and train it)

The five basic aspects of modeling are:

1) specification: select the family or families from which to choose a model
2) selection: choose from within the set of models
3) fitting: fit the parameters of the model to the data
4) assessment: determine whether the model is appropriate for the data
5) inference: make the appropriate decisions using the results from the above steps

Example: artificial neural networks

https://www.tibco.com/reference-center/what-is-a-neural-network
Model evaluation

• To get an unbiased assessment, we divide our dataset into three parts:

o Training set (60 to 70% of the total data)
It is used to train the model and learn the model parameters (fitting the model) such
as finding weights and biases in artificial neural networks.
o Validation set (15 to 20% of the total data)
It is used to tune the hyperparameters of the model (model type, model
architecture); for example, to choose the number of hidden layers in a neural
network. Once we choose the best model, we typically refit it on the combined training
and validation data.
o Testing set (15 to 20% of the total data)
This data set is used only to assess the performance of a fully trained model.
• If there is not enough data available, we can do k-fold cross validation. Given the value of
k, the data is split into k sets (folds) of roughly the same size. Each fold is used once as the
validation set, while all other observations form the training set. We train and evaluate the
model k times and average the k validation results. Typically, k is 5 or 10. When k equals the
number of observations in the data set, we have LOOCV (Leave One Out Cross Validation).
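Both splitting schemes described above can be sketched in a few lines of plain Python. The helper names `train_val_test_split` and `k_fold_indices`, the 60/20/20 fractions, and the fixed seed are assumptions chosen for illustration; Scikit-Learn provides its own tools for this.

```python
import random

def train_val_test_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the data and cut it into training, validation, and testing sets."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains is the testing set
    return train, val, test

def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for each of k folds of roughly equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))

# 5-fold cross validation on 10 observations
for train_idx, val_idx in k_fold_indices(10, 5):
    print("validate on", val_idx, "train on", train_idx)
```

Calling `k_fold_indices(n, n)` puts a single observation in each validation fold, which is the LOOCV case. The indices here are not shuffled, so data that is ordered in some meaningful way should be shuffled first.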

https://scikit-learn.org/stable/modules/cross_validation.html
III. MAIN CHALLENGES IN MACHINE LEARNING

• Insufficient quantity of data – it takes a lot of data for most ML models to work properly
o M. Banko, E. Brill, “Scaling to very very large corpora for natural language
disambiguation”, ACL '01: Proceedings of the 39th Annual Meeting on Association
for Computational Linguistics (July 2001), pages 26–33.
• Nonrepresentative training data
o The training data must be representative of the new data we want to generalize to.
o Example: the Literary Digest poll for the 1936 US presidential election; 2.4 million
completed surveys predicted that Landon would get 57% of the votes, but Roosevelt
won with 62% of the votes.
• Poor quality data and irrelevant data - “garbage in, garbage out”
o outliers, missing values, etc.
o feature selection/extraction
• There is no universally best model
o D. H. Wolpert, W. G. Macready, "No Free Lunch Theorems for Optimization",
IEEE Transactions on Evolutionary Computation 1, 67 (1997).
• Overfitting and underfitting

https://www.kaggle.com/getting-started/166897
References and Reading Material:

[1] An Introduction to Statistical Learning, James, Witten, Hastie, Tibshirani (Chapter 2)

[2] Machine Learning – A Probabilistic Perspective, Murphy (Sections 1.1 – 1.3, 1.4.7-1.4.9)
[3] Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, Géron (Chapter 1)

2. PYTHON TUTORIAL
Look at the Python tutorial code (courtesy of Dr. Randy Davila).
