Introduction to
Machine Learning
CS464
Bilkent University
CS464
• An undergraduate-level introductory course that aims to give a broad overview of many concepts and algorithms in machine learning.
Text Classification
• Classify each incoming e-mail as ham or spam
Reading Zip Codes
• Classifying handwritten digits from binary images
Recommendation Systems
Statistical Machine Translation
Recognizing Faces
Mitchell et al., Science, 2009.
https://youtu.be/D5VN56jQMWM
IBM Watson
• Theoretical questions:
– What can be learned? Under what conditions?
– What learning guarantees can be given?
– What can we say about the inherent ease or difficulty of
learning problems?
Machine Learning?
• Role of Probability and Statistics: Inference from a sample
[Figure: iris flowers of three species: Setosa, Versicolor, and Virginica]
Example: A Digit Recognizer
• Imagine you are asked to write a program to recognize handwritten digits
[Image: a handwritten digit 3]
Digit Recognizer
• Each handwritten digit image is an example (instance, or object) in your data
Labels (Space of outputs)
• Labels are all ten digits, 0 through 9
• Given a new image, predict its label
Nearest Neighbor Classifiers
• Basic idea:
– If it walks like a duck, quacks like a duck, then it’s
probably a duck
[Figure: compute the distance from the test sample to each training sample]
• Prediction algorithm (sketched below):
– Classify a new example x_new by finding the training example (x_i, y_i) whose x_i is nearest to x_new, and predict its label y_i
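A minimal NumPy sketch of the nearest-neighbor rule (the function and array names are illustrative, not from the slides):

```python
import numpy as np

def nn_predict(X_train, y_train, x_new):
    """Predict the label of x_new as the label of its nearest training example."""
    # Euclidean distance from x_new to every training example
    dists = np.linalg.norm(X_train - x_new, axis=1)
    return y_train[np.argmin(dists)]
```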
k-NN Classifier
[Figures: decision boundaries of the 1-nearest-neighbor (k = 1), 3-nearest-neighbor (k = 3), and 5-nearest-neighbor (k = 5) classifiers]
Example: Handwritten Digits
• 16x16 bitmaps
• 8-bit grayscale
• Euclidean distance over pixels (see the sketch below)
Source: Wikipedia
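A sketch of this digit classifier, assuming the training images are stored as rows of a matrix of flattened 16x16 bitmaps (names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_digit(X_train, y_train, image, k=3):
    """Classify a 16x16 grayscale digit image by majority vote among its k
    nearest training images, using Euclidean distance over raw pixel values."""
    x = image.reshape(-1).astype(float)           # flatten 16x16 -> 256-dim vector
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to each training image
    nearest = np.argsort(dists)[:k]               # indices of the k closest images
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]  # majority label
```

With k = 1 this reduces to the nearest-neighbor rule from the previous slides.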
Relationship to Modeling
Functional Representation
• Variable X is related to variable Y, and we would like to learn
this relationship from the data
• Example: X = [weight, blood glucose, ...] and Y is whether the person will have diabetes or not.
• We assume there is a relationship between X and Y:
– it is less likely to see certain X co-occur with “low risk” and unlikely to
see some other X co-occur with “high risk".
y = w^T x, where y is the outcome variable (real-valued labels), w is the coefficient vector, and x is the feature vector
• To assess overfitting, reserve a portion of the labeled samples as test samples
• Misclassification rate: the fraction of test samples whose predicted label differs from the true label (see the sketch below)
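A one-line NumPy sketch of this metric (the function name is illustrative):

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    """Fraction of test samples whose predicted label differs from the true label."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))
```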
Bias and Variance
Model Selection
• K-fold Cross-Validation, sketched in code below (do not confuse this K with the k in k-NN!)
– Randomly partition the training samples into K groups
– Repeat for i = 1 to K:
• Reserve the i-th group as the test set
• Train on the remaining K-1 groups
– Results in one prediction per sample
• Use a measure of classification accuracy to assess performance
– Repeat the above procedure multiple times
• Report the mean and variance of performance figures across these multiple runs
• K = 5 is usually a good choice
o K = N => Leave-One-Out Cross-Validation
• Why repeat multiple times?
• Why variance?
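A NumPy sketch of one run of the procedure; train_and_eval is a placeholder for fitting and scoring whatever classifier is under evaluation, and the whole function would itself be repeated multiple times as the slide suggests:

```python
import numpy as np

def k_fold_cv(X, y, train_and_eval, K=5, seed=0):
    """K-fold cross-validation: every sample is used exactly once as a test sample.
    train_and_eval(X_tr, y_tr, X_te, y_te) should fit a model and return its
    accuracy on the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)  # random partition into K groups
    scores = []
    for i in range(K):
        test = folds[i]                                  # reserve the i-th group
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(train_and_eval(X[train], y[train], X[test], y[test]))
    return np.mean(scores), np.var(scores)               # summary across the K folds
```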
Curse of Dimensionality
• The space grows exponentially with the number of
dimensions (features)
– If the number of training samples stays constant as the
number of features goes up, the data gets sparser
– Distance/locality-based classifiers (e.g., k-NN) are particularly vulnerable to the curse of dimensionality (see the sketch below)
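A quick illustration (not from the slides): for random points in the unit hypercube, the gap between the nearest and farthest neighbor shrinks as the dimension grows, so distance-based notions of locality degrade:

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((1000, d))                     # 1000 uniform samples in [0, 1]^d
    dists = np.linalg.norm(X[1:] - X[0], axis=1)  # distances from one point to the rest
    # In high dimensions this ratio approaches 1: "nearest" is barely closer than "farthest"
    print(f"d={d:4d}  nearest/farthest distance ratio: {dists.min() / dists.max():.3f}")
```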
Breaking The Curse
• Feature Selection
– Explicitly reduce the number of features by training and evaluating models on different small subsets of the features
• Computationally difficult, since the number of feature subsets is exponential
– A parsimonious (less complex) model is generally more desirable for interpretability and generalizability
• Dimensionality Reduction
– Pre-process the data to obtain lower-dimensional
representations by discovering latent patterns
• Principal Component Analysis (PCA) / Singular Value Decomposition (SVD); a code sketch follows below
Dimensionality Reduction via PCA
[Figure: 2-D data cloud with its mean and the first two principal component directions marked]
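A minimal PCA sketch via SVD of the mean-centered data, in line with the PCA/SVD approach named on the slides (the function name is illustrative):

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto its top principal components via SVD of the centered data."""
    mean = X.mean(axis=0)
    Xc = X - mean                                      # center the data at the mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal directions
    components = Vt[:n_components]                     # directions of maximal variance
    return Xc @ components.T, components, mean         # projections, directions, mean
```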
Interpretability vs. Accuracy
• Data mining: Find interpretable patterns
– Emphasis on understanding
• Machine learning: Make accurate predictions
– Emphasis on utility
More Data vs. Cleverer Algorithm
• A lack of sufficient training examples is usually the biggest challenge in machine learning
– Algorithmic sophistication can address this problem only
to a certain extent, since increasing complexity can
result in overfitting
• The classifier should get better as it sees more
examples
– If not, can we say it is a good algorithm?
• Extreme case: Deep Learning (Convolutional
Neural Networks)
– Works poorly with few samples
– Works great if you have millions of samples
AI vs ML vs DL
• Nice blog post by NVIDIA:
• https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
What will be Covered?
• This class should give you the basic foundation for applying machine learning
• Supervised learning
– Bayesian learning
– Linear models for regression and classification
– Instance-based learning
– Support vector machines
– Decision Trees
– Ensemble models
– Deep Learning
• Unsupervised learning
– Clustering
– Dimensionality reduction
• Model selection and evaluation
• Sequential Models
– Hidden Markov Models
• Additional topics (if time permits)