What Is Learning?: CS 391L: Machine Learning
Raymond J. Mooney
University of Texas at Austin
Why Study Machine Learning? Cognitive Science
• Computational studies of learning may help us understand learning in humans and other biological organisms.
– Hebbian neural learning
• “Neurons that fire together, wire together.”
– Humans’ relative difficulty of learning disjunctive concepts vs. conjunctive ones.
– Power law of practice
[Figure: power law of practice; log(perf. time) decreases linearly with log(# training trials).]

Why Study Machine Learning? The Time is Ripe
• Many basic effective and efficient algorithms available.
• Large amounts of on-line data available.
• Large amounts of computational resources available.
[Figure: architecture of a learning system; Environment/Experience provides input to the learner, which produces Knowledge used by the Performance Element.]
Training Experience
• Direct experience: Given sample input and output pairs for a useful target function.
– Checker boards labeled with the correct move, e.g. extracted from records of expert play.
• Indirect experience: Given feedback that is not direct I/O pairs for a useful target function.
– Potentially arbitrary sequences of game moves and their final game results.
• Credit/Blame Assignment Problem: How to assign credit or blame to individual moves given only indirect feedback?

Source of Training Data
• Provided random examples outside of the learner’s control.
– Negative examples available, or only positive ones?
• Good training examples selected by a “benevolent teacher.”
– “Near miss” examples
• Learner can query an oracle about the class of an unlabeled example in the environment.
• Learner can construct an arbitrary example and query an oracle for its label.
• Learner can design and run experiments directly in the environment without any human guidance.
Training vs. Test Distribution
• Generally assume that the training and test examples are independently drawn from the same overall distribution of data.
– IID: independently and identically distributed
• If examples are not independent, collective classification is required.
• If the test distribution is different, transfer learning is required.

Choosing a Target Function
• What function is to be learned, and how will it be used by the performance system?
• For checkers, assume we are given a function for generating the legal moves for a given board position and want to decide the best move.
– Could learn a function: ChooseMove(board, legal-moves) → best-move
– Or could learn an evaluation function, V(board) → R, that gives each board position a score for how favorable it is. V can be used to pick a move by applying each legal move, scoring the resulting board position, and choosing the move that results in the highest-scoring board position (see the sketch below).
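A minimal Python sketch of the evaluation-function approach, assuming hypothetical helpers legal_moves(board) and apply_move(board, move) and a learned scorer V(board); none of these names come from the slides:

def choose_move(board, legal_moves, apply_move, V):
    # Score the board that results from each legal move and pick the best one.
    return max(legal_moves(board),
               key=lambda move: V(apply_move(board, move)))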
Representing the Target Function
• The target function can be represented in many ways: lookup table, symbolic rules, numerical function, neural network.
• There is a trade-off between the expressiveness of a representation and the ease of learning.
• The more expressive a representation, the better it will be at approximating an arbitrary function; however, more examples will be needed to learn an accurate function.

Linear Function for Representing V(b)
• In checkers, use a linear approximation of the evaluation function (see the sketch after this list):
V̂(b) = w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b)
– bp(b): number of black pieces on board b
– rp(b): number of red pieces on board b
– bk(b): number of black kings on board b
– rk(b): number of red kings on board b
– bt(b): number of black pieces threatened (i.e., which can be immediately taken by red on its next turn)
– rt(b): number of red pieces threatened
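A short Python sketch of this linear form, assuming a hypothetical features(board) helper that returns the six counts (bp, rp, bk, rk, bt, rt):

def v_hat(board, weights, features):
    # Linear evaluation: w0 plus the weighted sum of the six board features.
    f = features(board)  # (bp, rp, bk, rk, bt, rt)
    return weights[0] + sum(w * x for w, x in zip(weights[1:], f))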
Obtaining Training Values
• Direct supervision may be available for the target function.
– e.g., <<bp=3, rp=0, bk=1, rk=0, bt=0, rt=0>, 100> (win for black)
• With indirect feedback, training values can be estimated using temporal difference learning (used in reinforcement learning, where supervision is delayed reward).

Temporal Difference Learning
• Estimate training values for intermediate (non-terminal) board positions by the estimated value of their successor in an actual game trace:
Vtrain(b) = V̂(successor(b))
where successor(b) is the next board position where it is the program’s move in actual play.
• Values towards the end of the game are initially more accurate, and continued training slowly “backs up” accurate values to earlier board positions (see the sketch below).
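A sketch of estimating training values from a single game trace under these assumptions; trace is a hypothetical list of board positions at which it is the program’s move, and outcome is the final game value (e.g., 100 for a win for black):

def td_training_values(trace, outcome, v_hat, weights, features):
    # Each non-terminal position is assigned the estimated value of its
    # successor; the terminal position gets the true outcome.
    targets = []
    for i, board in enumerate(trace):
        if i + 1 < len(trace):
            targets.append(v_hat(trace[i + 1], weights, features))  # Vtrain(b) = V̂(successor(b))
        else:
            targets.append(outcome)
    return targets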
Learning Algorithm
• Use the training values to adjust the weights so as to minimize the mean squared error over the set of training examples B:
E = Σb∈B [Vtrain(b) − V̂(b)]² / |B|

Least Mean Squares (LMS) Weight Update
• For each training example b, compute error(b) = Vtrain(b) − V̂(b) and update each weight (a sketch follows):
wi ← wi + c · fi · error(b)
for some small constant (learning rate) c, where fi is the value of the i-th board feature.
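A sketch of one LMS pass over the training data, continuing the hypothetical helpers above; examples is a list of (board, training_value) pairs, c is the learning rate, and the pass would be repeated until the weights converge:

def lms_update(weights, examples, features, c=0.1):
    # One incremental pass: nudge each weight toward reducing the current error.
    for board, v_train in examples:
        f = (1,) + tuple(features(board))  # prepend 1 so weights[0] acts as w0
        error = v_train - sum(w * x for w, x in zip(weights, f))  # Vtrain(b) − V̂(b)
        weights = [w + c * x * error for w, x in zip(weights, f)]
    return weights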
LMS Discussion
• Intuitively, LMS executes the following rules:
– If the output for an example is correct, make no change.
– If the output is too high, lower the weights in proportion to the values of their corresponding features, so the overall output decreases.
– If the output is too low, increase the weights in proportion to the values of their corresponding features, so the overall output increases.
• Under the proper weak assumptions, LMS can be proven to eventually converge to a set of weights that minimizes the mean squared error.

Lessons Learned about Learning
• Learning can be viewed as using direct or indirect experience to approximate a chosen target function.
• Function approximation can be viewed as a search through a space of hypotheses (representations of functions) for one that best fits a set of training data.
• Different learning methods assume different hypothesis spaces (representation languages) and/or employ different search techniques.
History of Machine Learning (cont.)
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s:
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning

History of Machine Learning (cont.)
• 2000s:
– Support vector machines
– Kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications
• Compilers
• Debugging
• Graphics
• Security (intrusion, virus, and worm detection)
– Email management
– Personalized assistants that learn
– Learning in robotics and vision