
CSCI218: Foundations of Artificial Intelligence
Week 4: Learning II
Classical stats/ML: Minimize loss function
§ Which hypothesis space H to choose?
§ E.g., linear combinations of features: h_w(x) = w^T x
§ How to measure degree of fit?
§ Loss function, e.g., squared error Σ_j (y_j − w^T x_j)^2
§ How to trade off degree of fit vs. complexity?
§ Regularization: complexity penalty, e.g., ||w||^2
§ How do we find a good h?
§ Optimization (closed-form, numerical); discrete search
§ How do we know if a good h will predict well?
§ Try it and see (cross-validation, bootstrap, etc.)
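As a concrete illustration of the recipe on this slide, here is a minimal numpy sketch (mine, not from the slides): fit h_w(x) = w^T x by minimizing squared error plus an L2 penalty (the closed-form ridge solution), then "try it and see" on held-out data. The function name fit_ridge and the synthetic dataset are illustrative.

# A minimal sketch (not from the slides) of the classical recipe above:
# linear hypothesis h_w(x) = w^T x, squared-error loss, L2 regularization,
# closed-form optimization, and a hold-out check of predictive quality.
import numpy as np

def fit_ridge(X, y, lam=0.1):
    """Minimize sum_j (y_j - w^T x_j)^2 + lam * ||w||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

# "Try it and see": hold out part of the data to estimate predictive error.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]
w = fit_ridge(X_train, y_train, lam=0.1)
print("held-out MSE:", np.mean((y_test - X_test @ w) ** 2))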

Deep Learning/Neural Network

Image Classification
Very loose inspiration: Human neurons

[Figure: biological neuron: dendrites, cell body (soma), nucleus, axon, axonal arborization; an axon from another cell connects via a synapse]


Simple model of a neuron (McCulloch & Pitts, 1943)
[Figure: a single unit: a fixed bias input a_0 = 1 with bias weight w_{0,j}; inputs a_i arrive on input links with weights w_{i,j}; a summation produces the total input in_j; the activation function g produces the output a_j on the output links]

§ Inputs a_i come from the output of node i to this node j (or from “outside”)
§ Each input link has a weight w_{i,j}
§ There is an additional fixed input a_0 = 1 with bias weight w_{0,j}
§ The total input is in_j = Σ_i w_{i,j} a_i
§ The output is a_j = g(in_j) = g(Σ_i w_{i,j} a_i) = g(w · a)
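A small illustrative sketch (not from the slides) of one such unit; the function name unit_output and the example weights are my own.

# One McCulloch & Pitts style unit: output a_j = g(sum_i w_ij * a_i), with a_0 = 1.
import numpy as np

def unit_output(inputs, weights, g):
    """inputs: activations a_1..a_n; weights: w_0j..w_nj (w_0j is the bias weight)."""
    a = np.concatenate(([1.0], inputs))      # prepend the fixed bias input a_0 = 1
    in_j = np.dot(weights, a)                # total input in_j = sum_i w_ij * a_i
    return g(in_j)                           # output a_j = g(in_j)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
print(unit_output(np.array([0.5, -1.0]), np.array([0.1, 2.0, 0.3]), sigmoid))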
Activation functions g
[Figure: (a) threshold (step) function; (b) sigmoid g(in_i) = 1/(1 + e^{-in_i})]
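For concreteness, here are the two activation functions from the figure written out as code (a sketch of mine):

import numpy as np

def threshold(x):
    """(a) Hard threshold: outputs 1 when the input is >= 0, else 0."""
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    """(b) Sigmoid 1 / (1 + e^{-x}): a smooth, differentiable version of (a)."""
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(threshold(xs))   # [0. 0. 1. 1. 1.]
print(sigmoid(xs))     # values rising smoothly from about 0.05 to about 0.95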
Reminder: Linear Classifiers

▪ Inputs are feature values
▪ Each feature has a weight
▪ Sum is the activation: activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
▪ If the activation is:
  ▪ Positive, output +1
  ▪ Negative, output −1

[Figure: features f_1, f_2, f_3 enter a weighted sum Σ with weights w_1, w_2, w_3, followed by a "> 0?" test]
How to get probabilistic decisions?

If the activation z = w · f(x) is very positive, we want the probability of +1 to go to 1

If z is very negative, we want the probability to go to 0

Sigmoid function:  φ(z) = 1 / (1 + e^{-z})
Best w?
Maximum likelihood estimation:

    max_w  ll(w) = max_w  Σ_j log P(y_j | x_j; w)

with:
    P(y_j = +1 | x_j; w) = 1 / (1 + e^{-w · f(x_j)})
    P(y_j = −1 | x_j; w) = 1 − 1 / (1 + e^{-w · f(x_j)})

= Logistic Regression
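A hedged numpy sketch of this objective, assuming labels y_j in {+1, −1} and using the identity P(y_j | x_j; w) = 1/(1 + e^{−y_j · w·f(x_j)}); the data and names are illustrative.

import numpy as np

def log_likelihood(w, F, y):
    """F: n x d matrix of feature vectors f(x_j); y: labels in {+1, -1}."""
    # P(y_j | x_j; w) = sigmoid(y_j * w.f(x_j)) covers both label values.
    margins = y * (F @ w)
    return np.sum(-np.log1p(np.exp(-margins)))

w = np.array([0.5, -1.0])
F = np.array([[1.0, 2.0], [0.0, -1.0], [2.0, 0.5]])
y = np.array([+1, -1, +1])
print(log_likelihood(w, F, y))   # maximum-likelihood training maximizes this over w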
Multiclass Logistic Regression
Multi-class linear classification
A weight vector for each class:  w_y

Score (activation) of a class y:  w_y · f(x)

Prediction: the class with the highest score wins:  y = argmax_y w_y · f(x)

How to make the scores into probabilities? Apply the softmax function:

    P(y | x; w) = e^{z_y} / Σ_{y'} e^{z_{y'}},   where z_y = w_y · f(x)

(original activations z → softmax activations P)
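A small sketch of the softmax step (the max-shift is a standard numerical-stability trick, not part of the slide):

import numpy as np

def softmax(z):
    z = z - np.max(z)                 # shift for numerical stability (same result)
    e = np.exp(z)
    return e / np.sum(e)

scores = np.array([2.0, 1.0, -1.0])   # original activations z_y for three classes
print(softmax(scores))                # softmax activations: positive, sum to 1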


Best w?
Maximum likelihood estimation:

    max_w  ll(w) = max_w  Σ_j log P(y_j | x_j; w)

with:
    P(y_j | x_j; w) = e^{w_{y_j} · f(x_j)} / Σ_y e^{w_y · f(x_j)}

= Multi-Class Logistic Regression


Optimization

i.e., how do we solve:  max_w  Σ_j log P(y_j | x_j; w)


Hill Climbing
A simple, general idea
Start wherever
Repeat: move to the best neighboring state
If no neighbors better than current, quit

What’s particularly tricky when hill-climbing for multiclass logistic regression?
• Optimization over a continuous space
• Infinitely many neighbors!
• How to do this efficiently?
1-D Optimization

Could evaluate g(w_0 + h) and g(w_0 − h)
Then step in the best direction

Or, evaluate the derivative:  ∂g(w_0)/∂w = lim_{h→0} [g(w_0 + h) − g(w_0 − h)] / (2h)

It tells us which direction to step in
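A tiny sketch of both 1-D options, using an illustrative objective g of my own choosing:

g = lambda w: -(w - 3.0) ** 2          # example objective with its maximum at w = 3

w0, h, alpha = 0.0, 1e-4, 0.1

# Option 1: evaluate g(w0 + h) and g(w0 - h), step toward the better one.
step = h if g(w0 + h) > g(w0 - h) else -h

# Option 2: estimate the derivative; its sign tells us which direction to step in.
dg = (g(w0 + h) - g(w0 - h)) / (2 * h)
w1 = w0 + alpha * dg
print(step, dg, w1)                    # dg is about 6 at w0 = 0, so we step right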
2-D Optimization

Source: offconvex.org
Gradient Ascent
Perform an update in the uphill direction for each coordinate
The steeper the slope (i.e., the larger the derivative), the bigger the step for that coordinate

E.g., consider a function g(w_1, w_2)

Updates:
    w_1 ← w_1 + α · ∂g/∂w_1 (w_1, w_2)
    w_2 ← w_2 + α · ∂g/∂w_2 (w_1, w_2)

▪ Updates in vector notation:  w ← w + α · ∇_w g(w),   with ∇_w g(w) = gradient
Steepest Descent
o Idea:
o Start somewhere
o Repeat: Take a step in the steepest descent direction

Figure source: Mathworks


Steepest Direction
o Steepest Direction = direction of the gradient

∇g = [ ∂g/∂w_1,  ∂g/∂w_2,  …,  ∂g/∂w_n ]^T
Optimization Procedure: Gradient Ascent

init w
for iter = 1, 2, …
    w ← w + α · ∇_w g(w)

▪ α: the learning rate, a hyperparameter that needs to be chosen carefully
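A minimal sketch of this procedure on a concrete 2-D objective (the objective and the learning rate are my own illustrative choices):

import numpy as np

def grad_g(w):
    """Gradient of g(w) = -(w1 - 1)^2 - (w2 + 2)^2, maximized at (1, -2)."""
    return np.array([-2 * (w[0] - 1.0), -2 * (w[1] + 2.0)])

alpha = 0.1                     # learning rate: too small -> slow, too large -> divergence
w = np.zeros(2)                 # init w
for it in range(100):           # for iter = 1, 2, ...
    w = w + alpha * grad_g(w)   # w <- w + alpha * grad g(w)
print(w)                        # approaches (1, -2)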
Batch Gradient Ascent on the Log Likelihood Objective

max_w  ll(w) = max_w  Σ_j log P(y_j | x_j; w)

init w
for iter = 1, 2, …
    w ← w + α · Σ_j ∇ log P(y_j | x_j; w)
Stochastic Gradient Ascent on the Log Likelihood Objective

Observation: once the gradient on one training example has been computed, we might as well incorporate it before computing the next one

init w
for iter = 1, 2, …
    pick random j
    w ← w + α · ∇ log P(y_j | x_j; w)
Mini-Batch Gradient Ascent on the Log Likelihood Objective

Observation: the gradient over a small set of training examples (= mini-batch) can be computed in parallel, so we might as well do that instead of using a single example

init w
for iter = 1, 2, …
    pick a random subset of training examples J
    w ← w + α · Σ_{j∈J} ∇ log P(y_j | x_j; w)
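The following hedged numpy sketch contrasts the three variants for the multi-class logistic-regression objective above; the synthetic dataset, batch size, and helper names (softmax_rows, grad_log_lik) are my own.

import numpy as np

rng = np.random.default_rng(0)
n, d, C = 300, 4, 3
F = rng.normal(size=(n, d))                       # feature vectors f(x_j)
y = rng.integers(0, C, size=n)                    # labels in {0, ..., C-1}

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def grad_log_lik(W, idx):
    """Gradient of sum_{j in idx} log P(y_j | x_j; W) for weights W (C x d)."""
    P = softmax_rows(F[idx] @ W.T)                # P[j, c] = P(c | x_j; W)
    Y = np.zeros_like(P)
    Y[np.arange(len(idx)), y[idx]] = 1.0          # one-hot true labels
    return (Y - P).T @ F[idx]                     # shape C x d

alpha, W = 0.01, np.zeros((C, d))
for it in range(200):
    # Batch: use all examples each iteration.
    # W = W + alpha * grad_log_lik(W, np.arange(n))

    # Stochastic: pick one random example j.
    # j = rng.integers(n); W = W + alpha * grad_log_lik(W, np.array([j]))

    # Mini-batch: pick a random subset J (the gradient parallelizes over J).
    J = rng.choice(n, size=32, replace=False)
    W = W + alpha * grad_log_lik(W, J)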
Neural Networks
Multi-class Logistic Regression
= special case of neural network (single layer, no hidden layer)
[Figure: features f_1(x), f_2(x), f_3(x), …, f_K(x) feed class scores z_1, z_2, z_3, … into a softmax layer]
Multi-layer Perceptron

[Figure: inputs x_1, x_2, x_3, …, x_L pass through several hidden layers, each applying a nonlinear activation g, followed by a softmax output layer]

g = nonlinear activation function
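A compact sketch (mine) of the forward pass in the figure, assuming ReLU for g and a softmax output layer; the layer sizes are illustrative.

import numpy as np

def g(z):                                   # nonlinear activation (here: ReLU)
    return np.maximum(0.0, z)

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def mlp_forward(x, weights, biases):
    """weights/biases define each layer; the last layer feeds the softmax."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = g(W @ a + b)                    # hidden layers: linear map + nonlinearity
    return softmax(weights[-1] @ a + biases[-1])

rng = np.random.default_rng(0)
x = rng.normal(size=5)                      # inputs x_1..x_L with L = 5
Ws = [rng.normal(size=(8, 5)), rng.normal(size=(8, 8)), rng.normal(size=(3, 8))]
bs = [np.zeros(8), np.zeros(8), np.zeros(3)]
print(mlp_forward(x, Ws, bs))               # class probabilities, sum to 1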


Multi-layer Perceptron
Common Activation Functions

[Figure: common activation functions, e.g., sigmoid, tanh, ReLU; source: MIT 6.S191 introtodeeplearning.com]


Multi-layer Perceptron
Training the MLP neural network is just like logistic regression:

just w tends to be a (much) larger vector

just run gradient ascent ⇒ the Back-propagation algorithm
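A hedged sketch of what "gradient ascent + back-propagation" looks like for a one-hidden-layer network with a softmax output; the toy dataset, layer sizes, and learning rate are my illustrative choices, not the lecture's.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                   # inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)         # two classes, a simple rule
C, H = 2, 16

W1, b1 = 0.1 * rng.normal(size=(H, 4)), np.zeros(H)
W2, b2 = 0.1 * rng.normal(size=(C, H)), np.zeros(C)
alpha = 0.05

for it in range(300):
    # Forward pass.
    Z1 = X @ W1.T + b1
    A1 = np.maximum(0.0, Z1)                    # ReLU hidden layer
    Z2 = A1 @ W2.T + b2
    Z2 = Z2 - Z2.max(axis=1, keepdims=True)
    P = np.exp(Z2); P /= P.sum(axis=1, keepdims=True)

    # Backward pass (chain rule) for the log-likelihood sum_j log P(y_j | x_j).
    Y = np.zeros_like(P); Y[np.arange(len(y)), y] = 1.0
    dZ2 = Y - P                                 # d ll / d Z2
    dW2, db2 = dZ2.T @ A1, dZ2.sum(axis=0)
    dA1 = dZ2 @ W2
    dZ1 = dA1 * (Z1 > 0)                        # ReLU derivative
    dW1, db1 = dZ1.T @ X, dZ1.sum(axis=0)

    # Gradient ascent step on all parameters.
    W1 += alpha * dW1 / len(y); b1 += alpha * db1 / len(y)
    W2 += alpha * dW2 / len(y); b2 += alpha * db2 / len(y)

print("train accuracy:", np.mean(P.argmax(axis=1) == y))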


Neural Networks Properties
Theorem (Universal Function Approximators). A two-layer
neural network with a sufficient number of neurons can
approximate any continuous function to any desired accuracy.

Practical considerations
Can deal with more complex, nonlinear classification & regression problems
Large numbers of neurons and weights
Danger of overfitting
Deep Learning Model

Neural network as a general computation graph

(Krizhevsky, Sutskever & Hinton, 2012)


Deep Learning Model
§ We need good features!

[Figure: pipeline: Feature Extraction (built from prior knowledge, experience) → Classification → “Panda”?; challenges: pose, occlusion, multiple objects, inter-class similarity]

Image courtesy of M. Ranzato


Deep Learning Model

§ Directly learn feature representations from data.

§ Jointly learn the feature representation and the classifier.

[Figure: Low-level Features → Mid-level Features → High-level Features → Classifier → “Panda”?; increasingly abstract representations]

Deep Learning: train the layers of features so that the classifier works well.

Image courtesy of M. Ranzato


Deep Learning Model
Have we been here before?
➢ Yes.
  • The basic ideas are common to past neural network research
  • Standard machine learning strategies are still relevant
➢ No. Today’s deep learning is enabled by:
  • Large-scale data
  • Computational power
  • New algorithms
Deep Learning Model
Convolutional Neural Networks (CNNs)
§ A special multi-stage architecture inspired by the visual system
§ Higher stages compute more global, more invariant features
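A minimal sketch (not the lecture's code) of the core convolution stage: a small filter slides over the image, so units in later stages depend on larger, more global regions; the image and edge_filter here are illustrative.

import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most CNN libraries)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.random.default_rng(0).normal(size=(8, 8))
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)               # responds to vertical edges
feature_map = np.maximum(0.0, conv2d(image, edge_filter))    # convolution + ReLU
print(feature_map.shape)                                     # (6, 6): one CNN stage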
Deep Learning Model

[Figure: LeNet-5, a classic CNN architecture; source: https://www.datasciencecentral.com/lenet-5-a-classic-cnn-architecture/]
Different Neural Network Architectures
§ Exploration of different neural network architectures
§ ResNet: residual networks
§ Networks with attention
§ Transformer networks
§ Neural network architecture search
§ Really large models
§ GPT-2, GPT-3
§ CLIP

Acknowledgement

The lecture slides are based on materials from ai.berkeley.edu.


Thank you. Questions?
