This document provides an introduction to machine learning. It discusses how machine learning allows computers to learn from experience to improve their performance on tasks. Supervised learning is described, where the goal is to learn a function that maps inputs to outputs from a labeled dataset. Cross-validation techniques like the test set method, leave-one-out cross-validation, and k-fold cross-validation are introduced to evaluate model performance without overfitting. Applications of machine learning like medical diagnosis, recommendation systems, and autonomous driving are briefly outlined.
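The k-fold cross-validation idea mentioned above can be sketched in a few lines of NumPy; the helper names below are illustrative, not from the document:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def k_fold_scores(X, y, k, fit, score):
    """For each fold: train on the other k-1 folds, score on the held-out fold."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[test_idx], y[test_idx]))
    return scores
```

Leave-one-out cross-validation is the special case where k equals the number of samples; the test-set method corresponds to a single train/test split.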
Here are the key calculations:
1) The probability that persons p and q are at the same hotel on a given day d is 1/100 × 1/100 × 10^-5 = 10^-9: each person visits some hotel on a given day with probability 1/100, and given that both do, the chance they pick the same one of the 10^5 hotels is 10^-5.
2) The probability that p and q are at the same hotel on two given days d1 and d2 is 10^-9 × 10^-9 = 10^-18, since the two days are independent.
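The factors 1/100 × 1/100 × 10^-5 can be checked numerically; here 1/100 is read as the probability a person visits some hotel on a given day, and 10^-5 as the chance of picking the same one of 10^5 hotels (an assumption about the intended setup):

```python
p_visit = 1 / 100   # probability a given person visits some hotel on day d
p_same = 1e-5       # chance both pick the same hotel (1 of 10**5 hotels)

p_one_day = p_visit * p_visit * p_same   # both visit, and choose the same hotel: ~1e-9
p_two_days = p_one_day ** 2              # independent days multiply: ~1e-18
```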
Machine learning helps predict behavior and recognize patterns that humans cannot, by learning from data without relying on programmed rules. It is an algorithmic approach that differs from statistical modeling, which formalizes relationships through mathematical equations. Machine learning is part of the broader field of artificial intelligence, which aims to develop systems that can act and respond intelligently like humans. The machine learning workflow involves collecting and preprocessing data, selecting algorithms, training models, and evaluating performance. Common machine learning approaches include supervised learning, unsupervised learning, reinforcement learning, and deep learning. Popular tools for machine learning include Python, R, TensorFlow, and Spark.
Machine learning is a method of data analysis that uses algorithms to iteratively learn from data without being explicitly programmed. It allows computers to find hidden insights in data and become better at tasks via experience. Machine learning has many practical applications and is important due to growing data availability, cheaper and more powerful computation, and affordable storage. It is used in fields like finance, healthcare, marketing and transportation. The main approaches are supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Each has real-world examples like loan prediction, market basket analysis, webpage classification, and marketing campaign optimization.
Active learning is a machine learning technique where the learner is able to interactively query the oracle (e.g. a human) to obtain labels for new data points in an effort to learn more accurately from fewer labeled examples. The learner selects the most informative samples to be labeled by the oracle, such as samples closest to the decision boundary or where models disagree most. This allows the learner to minimize the number of labeled samples needed, thus reducing the cost of training an accurate model. Suggested improvements include querying batches of samples instead of single samples and accounting for varying labeling costs.
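One common way to pick the samples closest to the decision boundary is pool-based "least confident" sampling, which can be sketched as follows (assuming a model that exposes class probabilities; the names are illustrative):

```python
import numpy as np

def least_confident_query(predict_proba, X_pool, batch_size=1):
    """Return indices of the pool samples whose predicted top-class
    probability is lowest, i.e. the samples the model is least sure about."""
    proba = predict_proba(X_pool)      # shape: (n_samples, n_classes)
    confidence = proba.max(axis=1)     # certainty of the predicted class
    return np.argsort(confidence)[:batch_size]
```

Setting batch_size > 1 corresponds to the batch-querying improvement mentioned above.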
Machine learning involves programming computers to optimize performance using example data or past experience. It is used when human expertise does not exist, humans cannot explain their expertise, solutions change over time, or solutions need to be adapted to particular cases. Learning builds general models from data to approximate real-world examples. There are several types of machine learning including supervised learning (classification, regression), unsupervised learning (clustering), and reinforcement learning. Machine learning has applications in many domains including retail, finance, manufacturing, medicine, web mining, and more.
A short presentation for beginners introducing machine learning: what it is, how it works, the popular machine learning techniques and learning models (supervised, unsupervised, semi-supervised, reinforcement learning), and how they work, with various industry use cases and popular examples.
Machine Learning: Applications, Process and Techniques, by Rui Pedro Paiva
Machine learning can be applied across many domains such as business, entertainment, medicine, and software engineering. The document outlines the machine learning process which includes data collection, feature extraction, model learning, and evaluation. It also provides examples of machine learning applications in various domains, such as using decision trees to make credit decisions in business, classifying emotions in music for playlist generation in entertainment, and detecting heart murmurs from audio data in medicine.
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T..., by Simplilearn
This document provides an overview of machine learning, including:
- Machine learning allows computers to learn from data without being explicitly programmed, through processes like analyzing data, training models on past data, and making predictions.
- The main types of machine learning are supervised learning, which uses labeled training data to predict outputs, and unsupervised learning, which finds patterns in unlabeled data.
- Common supervised learning tasks include classification (like spam filtering) and regression (like weather prediction). Unsupervised learning includes clustering, like customer segmentation, and association, like market basket analysis.
- Supervised and unsupervised learning are used in many areas like risk assessment, image classification, fraud detection, customer analytics, and more.
Intro/Overview on Machine Learning Presentation, by Ankit Gupta
This document provides an overview of a presentation on machine learning given at Gurukul Kangri University in 2017. It defines machine learning as a field that allows computers to learn without being explicitly programmed. It discusses different machine learning algorithms including supervised learning, unsupervised learning, and semi-supervised learning. Examples of applications of machine learning discussed include data mining, natural language processing, image recognition, and expert systems. The document also contrasts artificial intelligence, machine learning, and deep learning.
The document summarizes key concepts in machine learning, including defining learning, types of learning (induction vs discovery, guided learning vs learning from raw data, etc.), generalisation and specialisation, and some simple learning algorithms like Find-S and the candidate elimination algorithm. It discusses how learning can be viewed as searching a generalisation hierarchy to find a hypothesis that covers the examples. The candidate elimination algorithm maintains the version space - the set of hypotheses consistent with the training examples - by updating the general and specific boundaries as new examples are processed.
Machine Learning and Real-World Applications, by MachinePulse
This presentation was created by Ajay, Machine Learning Scientist at MachinePulse, to present at a Meetup on Jan. 30, 2015. These slides provide an overview of widely used machine learning algorithms. The slides conclude with examples of real world applications.
Ajay Ramaseshan is a Machine Learning Scientist at MachinePulse. He holds a Bachelor's degree in Computer Science from NITK Surathkal and a Master's in Machine Learning and Data Mining from Aalto University School of Science, Finland. He has extensive experience in the machine learning domain and has dealt with various real-world problems.
This document provides an overview of machine learning. It begins with an introduction and definitions, explaining that machine learning allows computers to learn without being explicitly programmed by exploring algorithms that can learn from data. The document then discusses the different types of machine learning problems including supervised learning, unsupervised learning, and reinforcement learning. It provides examples and applications of each type. The document also covers popular machine learning techniques like decision trees, artificial neural networks, and frameworks/tools used for machine learning.
A PPT that gives a brief introduction to Machine Learning and to products built using Machine Learning algorithms. The introduction combines text with a few images in the slides as part of the explanation. It includes examples of products like Google Cloud Platform, Cozmo (a tiny robot built using Artificial Intelligence), IBM Watson and many more.
This Machine Learning Algorithms presentation will help you learn what machine learning is and the various ways in which you can use machine learning to solve a problem. At the end, you will see a demo on linear regression, logistic regression, decision tree and random forest. This Machine Learning Algorithms presentation is designed for beginners, to help them understand how to implement the different Machine Learning algorithms.
Below topics are covered in this Machine Learning Algorithms Presentation:
1. Real world applications of Machine Learning
2. What is Machine Learning?
3. Processes involved in Machine Learning
4. Type of Machine Learning Algorithms
5. Popular Algorithms with a hands-on demo
- Linear regression
- Logistic regression
- Decision tree and Random forest
- K-nearest neighbors
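As one illustration of the demo topics listed above, ordinary least-squares linear regression can be sketched in plain NumPy (a minimal illustrative version; the function names are not from the presentation):

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares: prepend an intercept column, solve X w = y
    in the least-squares sense."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w                       # w[0] = intercept, w[1:] = slopes

def predict_ols(w, X):
    """Apply the fitted weights to new inputs."""
    return w[0] + X @ w[1:]
```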
What is Machine Learning: Machine Learning is an application of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
- - - - - - - -
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as people's digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with the knowledge and hands-on skills required for certification and job competency in Machine Learning.
- - - - - - -
Why learn Machine Learning?
Machine Learning is taking over the world, and with that comes a growing need among companies for professionals who know the ins and outs of Machine Learning.
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
- - - - - -
What skills will you learn from this Machine Learning course?
By the end of this Machine Learning course, you will be able to:
1. Master the concepts of supervised, unsupervised and reinforcement learning, and of modeling.
2. Gain practical mastery over principles, algorithms, and applications of Machine Learning through a hands-on approach which includes working on 28 projects and one capstone project.
3. Acquire thorough knowledge of the mathematical and heuristic aspects of Machine Learning.
4. Understand the concepts and operation of support vector machines, kernel SVM, naive Bayes, decision tree classifier, random forest classifier, logistic regression, K-nearest neighbors, K-means clustering and more.
5. Be able to model a wide variety of robust Machine Learning algorithms including deep learning, clustering, and recommendation systems.
- - - - - - -
This document provides an overview of machine learning concepts including supervised learning, unsupervised learning, and reinforcement learning. It explains that supervised learning involves learning from labeled examples, unsupervised learning involves categorizing without labels, and reinforcement learning involves learning behaviors to achieve goals through interaction. The document also discusses regression vs classification problems, the learning and testing process, and examples of machine learning applications like customer profiling, face recognition, and handwritten character recognition.
"Machine learning and its applications" was a gentle introduction to machine learning presented by Dr. Ganesh Neelakanta Iyer. The presentation covered an introduction to machine learning and different types of machine learning problems including classification, regression, and clustering. It also provided examples of applications of machine learning at companies like Facebook, Google, and McDonald's. The presentation concluded by discussing the general machine learning framework and the steps involved in working with machine learning problems.
Machine learning works by processing data to discover patterns that can be used to analyze new data. Popular programming languages for machine learning include Python, R, and SQL. There are several types of machine learning including supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and deep learning. Common machine learning tasks involve classification, regression, clustering, dimensionality reduction, and model selection. Machine learning is widely used for applications such as spam filtering, recommendations, speech recognition, and machine translation.
The document provides an overview of various machine learning algorithms and methods. It begins with an introduction to predictive modeling and supervised vs. unsupervised learning. It then describes several supervised learning algorithms in detail including linear regression, K-nearest neighbors (KNN), decision trees, random forest, logistic regression, support vector machines (SVM), and naive Bayes. It also briefly discusses unsupervised learning techniques like clustering and dimensionality reduction methods.
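Of the supervised methods listed, K-nearest neighbors is perhaps the simplest to sketch from scratch (a minimal illustrative version, not the document's code):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points,
    using Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
```

KNN has no training step at all: the "model" is just the stored training set, which is why it often appears first in introductory material.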
A fast-paced introduction to Deep Learning concepts, such as activation functions, cost functions, back propagation, and then a quick dive into CNNs. Basic knowledge of vectors, matrices, and derivatives is helpful in order to derive the maximum benefit from this session.
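As a taste of the activation functions mentioned, here is the logistic (sigmoid) activation together with the derivative that backpropagation uses; this is a standard textbook formula, sketched here for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative used by backpropagation: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)
```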
The presentation provides an overview of machine learning, including its history, definitions, applications and algorithms. It discusses how machine learning systems are trained and tested, and how performance is evaluated. The key points are that machine learning involves computers learning from experience to improve their abilities, it is used in applications that require prediction, classification and pattern detection, and common algorithms include supervised, unsupervised and reinforcement learning.
This document discusses machine learning concepts like supervised and unsupervised learning. It explains that supervised learning uses known inputs and outputs to learn rules while unsupervised learning deals with unknown inputs and outputs. Classification and regression are described as types of supervised learning problems. Classification involves categorizing data into classes while regression predicts continuous, real-valued outputs. Examples of classification and regression problems are provided. Classification models like heuristic, separation, regression and probabilistic models are also mentioned. The document encourages learning more about classification algorithms in upcoming videos.
Machine learning involves developing systems that can learn from data and experience. The document discusses several machine learning techniques including decision tree learning, rule induction, case-based reasoning, supervised and unsupervised learning. It also covers representations, learners, critics and applications of machine learning such as improving search engines and developing intelligent tutoring systems.
This document provides an introduction to machine learning, including:
- It discusses how the human brain learns to classify images and how machine learning systems are programmed to perform similar tasks.
- It provides an example of image classification using machine learning and discusses how machines are trained on sample data and then used to classify new queries.
- It outlines some common applications of machine learning in areas like banking, biomedicine, and computer/internet applications. It also discusses popular machine learning algorithms like Bayes networks, artificial neural networks, PCA, SVM classification, and K-means clustering.
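The K-means clustering mentioned above is usually implemented with Lloyd's algorithm, which alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points (a minimal illustrative NumPy sketch; parameter names are assumptions):

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means clustering."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```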
This document discusses algorithm-independent machine learning techniques. It introduces concepts like bias and variance, which can quantify how well a learning algorithm matches a problem without depending on a specific algorithm. Methods like cross-validation, bootstrapping, and resampling can be used with different algorithms. While no algorithm is inherently superior, such techniques provide guidance on algorithm use and help integrate multiple classifiers.
Data Science, Machine Learning and Neural NetworksBICA Labs
Lecture briefly overviewing state of the art of Data Science, Machine Learning and Neural Networks. Covers main Artificial Intelligence technologies, Data Science algorithms, Neural network architectures and cloud computing facilities enabling the whole stack.
This document provides an introduction to machine learning. It begins with an agenda that lists topics such as introduction, theory, top 10 algorithms, recommendations, classification with naive Bayes, linear regression, clustering, principal component analysis, MapReduce, and conclusion. It then discusses what big data is and how data is accumulating at tremendous rates from various sources. It explains the volume, variety, and velocity aspects of big data. The document also provides examples of machine learning applications and discusses extracting insights from data using various algorithms. It discusses issues in machine learning like overfitting and underfitting data and the importance of testing algorithms. The document concludes that machine learning has vast potential but is very difficult to realize that potential as it requires strong mathematics skills.
This document provides an introduction to machine learning. It defines machine learning as developing algorithms that allow computers to learn from experience to improve their performance on tasks. The document outlines supervised learning and other learning frameworks. It discusses applications of machine learning such as autonomous vehicles, recommendation systems, and credit risk analysis. The document also provides examples of machine learning applications at the University of Liege including medical diagnosis, gene expression analysis, and patient classification.
1. An introduction to Machine Learning
Pierre Geurts
p.geurts@ulg.ac.be
Department of EE and CS & GIGA-R, Bioinformatics and modelling
University of Liège
3. Machine Learning: definition
Machine Learning is concerned with the development, the analysis, and the application of algorithms that allow computers to learn.
Learning: a computer learns if it improves its performance at some task with experience (i.e. by collecting data).
Extracting a model of a system from the sole observation (or the simulation) of this system in some situations.
A model = some relationships between the variables used to describe the system.
Two main goals: make predictions and better understand the system.
4. Machine learning: when?
Learning is useful when:
Human expertise does not exist (navigating on Mars)
Humans are unable to explain their expertise (speech recognition)
The solution changes over time (routing on a computer network)
The solution needs to be adapted to particular cases (user biometrics)
Example: it is easier to write a program that learns to play checkers or backgammon well by self-play than to convert the expertise of a master player into a program.
5. Applications: autonomous driving
DARPA Grand Challenge 2005: build a robot capable of navigating 175 miles through desert terrain in less than 10 hours, with no human intervention.
The actual winning time of Stanley [Thrun et al., 05] was 6 hours 54 minutes.
http://www.darpa.mil/grandchallenge/
6. Applications: recommendation systems
Netflix prize: predict how much someone is going to love a movie based on their movie preferences.
Data: over 100 million ratings that over 480,000 users gave to nearly 18,000 movies.
Reward: $1,000,000 for a 10% improvement over Netflix's current system.
http://www.netflixprize.com
8. Applications
Machine learning has a wide spectrum of applications, including:
Retail: market basket analysis, customer relationship management (CRM)
Finance: credit scoring, fraud detection
Manufacturing: optimization, troubleshooting
Medicine: medical diagnosis
Telecommunications: quality of service optimization, routing
Bioinformatics: motifs, alignment
Web mining: search engines
...
9. Related fields
Artificial Intelligence: smart algorithms
Statistics: inference from a sample
Computer Science: efficient algorithms and complex models
Systems and control: analysis, modeling, and control of dynamical systems
Data Mining: searching through large volumes of data
10. Problem definition
Machine learning is one part of the data mining process:
Raw data → Preprocessing → Preprocessed data → Machine learning → Hypothesis → Validation → Knowledge/Predictive model
Each step generates many questions:
Data generation: data types, sample size, online/offline...
Preprocessing: normalization, missing values, feature selection/extraction...
Machine learning: hypothesis, choice of learning paradigm/algorithm...
Hypothesis validation: cross-validation, model deployment...
12. Outline
● Introduction
● Supervised Learning
  Introduction
  Model selection, cross-validation, overfitting
  Some supervised learning algorithms
  Beyond classification and regression
● Other learning protocols/frameworks
13. Supervised learning
Inputs (A1, A2, A3, A4) and output (Y); the learning sample:
A1     A2     A3  A4     Y
-0.69  -0.72  Y   0.47   Healthy
-2.3   -1.2   N   0.15   Disease
0.32   -0.9   N   -0.76  Healthy
0.37   -1     Y   -0.59  Disease
-0.67  -0.53  N   0.33   Healthy
0.51   -0.09  Y   -0.05  Disease
Supervised learning produces a model (hypothesis): Ŷ = h(A1,A2,A3,A4)
Goal: from the database (learning sample), find a function h of the inputs that approximates the output as well as possible.
Symbolic output ⇒ classification problem
Numerical output ⇒ regression problem
14. Two main goals
Predictive: make predictions for a new sample described by its attributes.
A1     A2     A3  A4     Y
0.83   -0.54  T   0.68   Healthy
-2.3   -1.2   F   -0.83  Disease
0.08   0.63   F   0.76   Healthy
0.06   -0.29  T   -0.57  Disease
-0.98  -0.18  F   -0.38  Healthy
-0.68  0.82   T   -0.95  Disease
0.92   -0.33  F   -0.48  ?
Informative: help to understand the relationship between the inputs and the output, e.g. Y=Disease if A3=F and A2<0.3; find the most relevant inputs.
15. Example of applications
Biomedical domain: medical diagnosis, differentiation of diseases, prediction of the response to a treatment...
Inputs: gene expression, metabolite concentrations... Each row is a patient:
A1     A2     ...  A4     Y
-0.61  0.23   ...  0.49   Healthy
-2.3   -1.2   ...  -0.11  Disease
-0.82  -0.41  ...  0.24   Healthy
-0.74  -0.1   ...  -0.15  Disease
-0.14  0.98   ...  -0.13  Healthy
-0.37  0.27   ...  -0.67  Disease
16. Example of applications
Perceptual tasks: handwritten character recognition, speech recognition...
Inputs:
● a grey intensity in [0,255] for each pixel
● each image is represented by a vector of pixel intensities
● e.g. 32x32=1024 dimensions
Output:
● 10 discrete values: Y={0,1,2,...,9}
17. Example of applications
Time series prediction: predicting electricity load, network usage, stock market prices...
18. Outline
● Introduction
● Supervised Learning
  Introduction
  Model selection, cross-validation, overfitting
  Some supervised learning algorithms
  Beyond classification and regression
● Other learning protocols/frameworks
19. Illustrative problem
Medical diagnosis from two measurements (e.g., weight and temperature):
M1    M2    Y
0.52  0.18  Healthy
0.44  0.29  Disease
0.89  0.88  Healthy
0.99  0.37  Disease
...   ...   ...
0.95  0.47  Disease
0.29  0.09  Healthy
[Plot: the sample as a scatter plot in the (M1, M2) plane]
Goal: find a model that classifies new cases, for which M1 and M2 are known, as well as possible.
20. Learning algorithm
A learning algorithm is defined by:
a family of candidate models (= hypothesis space H)
a quality measure for a model
an optimization strategy
It takes as input a learning sample and outputs a function h in H of maximum quality.
[Plot: a model obtained by supervised learning, shown as a decision boundary in the (G1, G2) plane]
21. Linear model
h(M1,M2) = Disease if w0 + w1*M1 + w2*M2 > 0, Normal otherwise
[Plot: a linear decision boundary in the (M1, M2) plane]
Learning phase: from the learning sample, find the best values for w0, w1 and w2.
Many alternatives exist even for this simple model (LDA, Perceptron, SVM...).
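The linear model above can be sketched in a few lines. This is a minimal illustration, not the algorithm used in the slides: it uses a perceptron-style learning phase (one of the alternatives mentioned) on a made-up learning sample.

```python
# A minimal sketch of the linear model h(M1, M2): a perceptron-style
# learning phase that fits w0, w1, w2 on a toy learning sample.
# The data, learning rate, and epoch count are made up for illustration.

def h(w, m1, m2):
    """Predict 'Disease' if w0 + w1*M1 + w2*M2 > 0, else 'Normal'."""
    return "Disease" if w[0] + w[1] * m1 + w[2] * m2 > 0 else "Normal"

def train_perceptron(sample, epochs=100, lr=0.1):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for m1, m2, y in sample:
            target = 1 if y == "Disease" else -1
            pred = 1 if w[0] + w[1] * m1 + w[2] * m2 > 0 else -1
            if pred != target:  # update the weights only on mistakes
                w[0] += lr * target
                w[1] += lr * target * m1
                w[2] += lr * target * m2
    return w

sample = [(0.2, 0.1, "Normal"), (0.3, 0.2, "Normal"),
          (0.8, 0.9, "Disease"), (0.7, 0.8, "Disease")]
w = train_perceptron(sample)
print([h(w, m1, m2) for m1, m2, _ in sample])
```

On this linearly separable toy sample the perceptron converges and classifies every training point correctly.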
22. Quadratic model
h(M1,M2) = Disease if w0 + w1*M1 + w2*M2 + w3*M1^2 + w4*M2^2 > 0, Normal otherwise
[Plot: a quadratic decision boundary in the (M1, M2) plane]
Learning phase: from the learning sample, find the best values for w0, w1, w2, w3 and w4.
Many alternatives exist even for this simple model (LDA, Perceptron, SVM...).
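A useful way to see the quadratic model is as a linear model on expanded features: mapping (M1, M2) to (M1, M2, M1^2, M2^2) lets any linear learner fit w0..w4. The sketch below uses illustrative weights, not values from the slides.

```python
# Sketch: the quadratic model is a linear model on expanded features.
# expand maps (M1, M2) to (M1, M2, M1^2, M2^2); any linear learner can
# then fit the five weights w0..w4.

def expand(m1, m2):
    return (m1, m2, m1 * m1, m2 * m2)

def h(w, m1, m2):
    x = expand(m1, m2)
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return "Disease" if s > 0 else "Normal"

# Illustrative weights: the boundary is the circle M1^2 + M2^2 = 0.5
w = [-0.5, 0.0, 0.0, 1.0, 1.0]
print(h(w, 0.9, 0.9), h(w, 0.1, 0.2))
```

A point far from the origin, such as (0.9, 0.9), falls on the Disease side of this circular boundary, while (0.1, 0.2) falls on the Normal side.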
23. Artificial neural network
h(M1,M2) = Disease if some very complex function of M1, M2 > 0, Normal otherwise
[Plot: a highly non-linear decision boundary in the (M1, M2) plane]
Learning phase: from the learning sample, find the numerous parameters of the very complex function.
24. Which model is the best?
[Plots: the linear, quadratic, and neural net decision boundaries side by side]
Why not choose the model that minimises the error rate on the learning sample? (also called the re-substitution error)
How well are you going to predict future data drawn from the same distribution? (generalisation error)
25. The test set method
1. Randomly choose 30% of the data to be in a test sample
2. The remainder is a learning sample
3. Learn the model from the learning sample
4. Estimate its future performance on the test sample
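The four steps above can be sketched as follows. The dataset and the simple threshold learner are made up for illustration; only the 70/30 split procedure comes from the slides.

```python
# A minimal sketch of the test set method: hold out 30% of the data,
# fit on the rest, and estimate the error rate on the held-out part.
import random

random.seed(0)
data = [(x, "Disease" if x > 0.5 else "Healthy")
        for x in [random.random() for _ in range(100)]]

random.shuffle(data)
n_test = int(0.3 * len(data))                  # 1. 30% goes to the test sample
test, learning = data[:n_test], data[n_test:]  # 2. the rest is the learning sample

def learn(sample):
    # 3. illustrative learner: threshold midway between the two classes
    dis = [x for x, y in sample if y == "Disease"]
    hea = [x for x, y in sample if y == "Healthy"]
    t = (min(dis) + max(hea)) / 2
    return lambda x: "Disease" if x > t else "Healthy"

model = learn(learning)
errors = sum(model(x) != y for x, y in test)
print(errors / n_test)                         # 4. estimated future error rate
```

The error is estimated only on points the learner never saw, which is the whole point of the method.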
26. Which model is the best?
[Plots: the three fitted models on the learning sample]
            linear   quadratic   neural net
LS error=   3.4%     1.0%        0%
TS error=   3.5%     1.5%        3.5%
We say that the neural network overfits the data.
Overfitting occurs when the learning algorithm starts fitting noise.
(By contrast, the linear model underfits the data.)
27. The test set method
Upside:
very simple
computationally efficient
Downside:
wastes data: we get an estimate of the best method to apply to 30% less data
very unstable when the database is small (the test sample choice might just be lucky or unlucky)
28. Leave-one-out Cross Validation
For k=1 to N:
  remove the kth object from the learning sample
  learn the model on the remaining objects
  apply the model to get a prediction for the kth object
Report the proportion of misclassified objects.
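The loop above can be sketched directly. The 1-nearest-neighbour "learner" and the toy data are illustrative choices, not taken from the slides; any learner can be plugged into the same loop.

```python
# A minimal sketch of leave-one-out cross-validation: each object is
# held out in turn, the model is trained on the remaining N-1 objects,
# and the held-out object is predicted.

def nn_predict(train, x):
    # 1-NN: return the label of the closest training point
    return min(train, key=lambda p: abs(p[0] - x))[1]

data = [(0.1, "Healthy"), (0.2, "Healthy"), (0.3, "Healthy"),
        (0.7, "Disease"), (0.8, "Disease"), (0.9, "Disease")]

errors = 0
for k in range(len(data)):                   # for k = 1..N
    held_out = data[k]
    remaining = data[:k] + data[k + 1:]      # remove the kth object
    if nn_predict(remaining, held_out[0]) != held_out[1]:
        errors += 1

print(errors / len(data))  # LOO estimate of the error rate
```

On this well-separated toy data the LOO error is 0; the cost to notice is that N models had to be trained, which is the downside discussed on the next slide.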
29. Leave-one-out Cross Validation
Upside:
does not waste data (you get an estimate of the best method to apply to N-1 data points)
Downside:
expensive (you need to train N models)
high variance
30. k-fold Cross Validation
Randomly partition the dataset into k subsets (for example 10).
For each subset:
  learn the model on the objects that are not in the subset
  compute the error rate on the points in the subset
Report the mean error rate over the k subsets.
When k = the number of objects ⇒ leave-one-out cross-validation.
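The procedure can be sketched with k=3 on toy data; the 1-nearest-neighbour predictor is again an illustrative stand-in for whatever learner is being evaluated.

```python
# A minimal sketch of k-fold cross-validation with k=3 on toy data.
import random

def nn_predict(train, x):
    # 1-NN: return the label of the closest training point
    return min(train, key=lambda p: abs(p[0] - x))[1]

random.seed(1)
data = [(x, "Disease" if x > 0.5 else "Healthy")
        for x in [random.random() for _ in range(30)]]
random.shuffle(data)

k = 3
folds = [data[i::k] for i in range(k)]      # partition into k subsets

rates = []
for i in range(k):
    test = folds[i]                         # the held-out subset
    train = [p for j in range(k) if j != i for p in folds[j]]
    err = sum(nn_predict(train, x) != y for x, y in test) / len(test)
    rates.append(err)

print(sum(rates) / k)  # mean error rate over the k subsets
```

Each object is used for testing exactly once, so no data is wasted, while only k models (not N) are trained.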
31. Which kind of Cross Validation?
Test set:
cheap, but wastes data and is unreliable when there is little data
Leave-one-out:
doesn't waste data, but expensive
k-fold cross validation:
a compromise between the two
Rule of thumb:
a lot of data (>1000): test set validation
small data (100-1000): 10-fold CV
very small data (<100): leave-one-out CV
33. Complexity
Controlling complexity is called regularization or smoothing.
Complexity can be controlled in several ways:
The size of the hypothesis space: number of candidate models, range of the parameters...
The performance criterion: learning set performance versus parameter range, e.g. minimize Err(LS) + λ C(model)
The optimization algorithm: number of iterations, nature of the optimization problem (one global optimum versus several local optima)...
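The regularized criterion Err(LS) + λ·C(model) can be made concrete with a toy model-selection example. The candidate degrees, their training errors, and the choice of the degree itself as the complexity measure C(model) are all illustrative assumptions.

```python
# Sketch of the regularized criterion Err(LS) + λ·C(model): among
# candidate polynomial degrees, pick the one minimizing training error
# plus a complexity penalty. The numbers are illustrative.

lam = 0.5
candidates = {          # degree -> illustrative Err(LS)
    1: 3.4,             # underfits: high training error
    2: 1.0,
    9: 0.0,             # fits noise: zero training error
}

def penalized(degree):
    err_ls = candidates[degree]
    complexity = degree          # C(model): here simply the degree
    return err_ls + lam * complexity

best = min(candidates, key=penalized)
print(best, penalized(best))
```

The penalty steers the choice away from the degree-9 model that has zero training error: the intermediate model wins, which is exactly what regularization is for.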
34. CV-based algorithm choice
Step 1: compute the 10-fold (or test set, or LOO) CV error for different algorithms.
[Plot: CV error for Algo 1, Algo 2, Algo 3, Algo 4]
Step 2: whichever algorithm gave the best CV score, learn a new model with all the data; that is the predictive model.
What is the expected error rate of this model?
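The two steps can be sketched end to end. The candidate "algorithms" below are fixed-threshold classifiers standing in for genuinely different learners, and the data is made up for illustration.

```python
# Sketch of CV-based algorithm choice: score each candidate algorithm
# by 10-fold CV (Step 1), then refit the winner on all the data (Step 2).
import random

random.seed(2)
data = [(x, x > 0.6) for x in [random.random() for _ in range(60)]]
random.shuffle(data)

def make_learner(threshold):
    # each "algorithm" applies a fixed threshold; a stand-in for
    # genuinely different learning algorithms
    def learn(sample):
        return lambda x: x > threshold
    return learn

algorithms = {"t=0.3": make_learner(0.3),
              "t=0.6": make_learner(0.6),
              "t=0.9": make_learner(0.9)}

def cv_error(learn, k=10):
    folds = [data[i::k] for i in range(k)]
    errs = []
    for i in range(k):
        train = [p for j in range(k) if j != i for p in folds[j]]
        model = learn(train)
        errs.append(sum(model(x) != y for x, y in folds[i]) / len(folds[i]))
    return sum(errs) / k

# Step 1: CV error per algorithm; Step 2: refit the winner on all data
best = min(algorithms, key=lambda name: cv_error(algorithms[name]))
final_model = algorithms[best](data)
print(best)
```

Note that the winning CV score itself is an optimistically biased estimate of the final model's error, which motivates the warning on the next slide.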
35. Warning: intensive use of CV can overfit
If you compare many (complex) models, the probability that you will find a good one by chance on your data increases.
Solution:
Hold out an additional test set before starting the analysis (or, better, generate this data afterwards).
Use it to estimate the performance of your final model.
(For small datasets: use two stages of 10-fold CV.)
36. A note on performance measures
Which of these two models is the best?

   True class  Model 1   Model 2
1  Negative    Positive  Negative
2  Negative    Negative  Negative
3  Negative    Positive  Positive
4  Negative    Positive  Negative
5  Negative    Negative  Negative
6  Negative    Negative  Negative
7  Negative    Negative  Positive
8  Negative    Negative  Negative
9  Negative    Negative  Negative
10 Positive    Positive  Positive
11 Positive    Positive  Negative
12 Positive    Positive  Positive
13 Positive    Positive  Positive
14 Positive    Negative  Negative
15 Positive    Positive  Negative

The choice of an error or quality measure is highly application dependent.
36
37. A note on performance measures
The error rate is not the only way to assess a predictive
model
In binary classification, results can be summarized in a
contingency table (aka confusion matrix)
Predicted class
Actual class p n Total
p True Positive False Negative P
n False Positive True Negative N
Various criteria:
Error rate = (FP+FN)/(N+P)
Accuracy = (TP+TN)/(N+P) = 1 - Error rate
Sensitivity = TP/P (aka recall)
Specificity = TN/(TN+FP)
Precision = TP/(TP+FP) (aka PPV)
37
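These criteria are straightforward to compute from a list of predictions; a small sketch (the toy labels below are made up for illustration):

```python
def binary_metrics(actual, predicted, positive="Positive"):
    """Compute the contingency-table criteria of the slide."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    P, N = tp + fn, fp + tn
    return {
        "error rate": (fp + fn) / (P + N),
        "accuracy": (tp + tn) / (P + N),      # = 1 - error rate
        "sensitivity": tp / P,                # aka recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),          # aka PPV
    }

actual    = ["Positive", "Positive", "Negative", "Negative", "Negative"]
predicted = ["Positive", "Negative", "Negative", "Positive", "Negative"]
m = binary_metrics(actual, predicted)
# m["error rate"] == 0.4, m["sensitivity"] == 0.5, m["precision"] == 0.5
```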
38. ROC and Precision/recall curves
Each point corresponds to a particular choice of the
decision threshold
(figures: the ROC curve plots True Positive Rate (sensitivity) against False Positive Rate (1-specificity); the precision/recall curve plots precision against recall (sensitivity))
38
39. Outline
Introduction
Model selection, cross-validation, overfitting
Some supervised learning algorithms
k-NN
Linear methods
Artificial neural networks
Support vector machines
Decision trees
Ensemble methods
Beyond classification and regression
39
40. Comparison of learning algorithms
Three main criteria:
Accuracy:
Measured by the generalization error (estimated by CV)
Efficiency:
Computing times and scalability for learning and testing
Interpretability:
Comprehension brought by the model about the input-output relationship
Unfortunately, there is usually a tradeoff between
these criteria
40
41. 1-Nearest Neighbor (1-NN)
(prototype based method, instance based learning, non-parametric
method)
One of the simplest learning algorithms:
outputs as a prediction the output associated with the training sample closest to the test object
M1 M2 Y
1 0.32 0.81 Healthy
2 0.15 0.38 Disease
3 0.39 0.34 Healthy
4 0.62 0.11 Disease
5 0.92 0.43 ?
d(5,1) = √((0.32−0.92)² + (0.81−0.43)²) = 0.71
d(5,2) = √((0.15−0.92)² + (0.38−0.43)²) = 0.77
d(5,3) = √((0.39−0.92)² + (0.34−0.43)²) = 0.54
d(5,4) = √((0.62−0.92)² + (0.11−0.43)²) = 0.44
closest = usually the sample at minimal Euclidean distance
41
42. Obvious extension: k-NN
Find the k nearest neighbors (instead of only the first one) with respect to Euclidean distance
Output the most frequent class (classification) or the
average outputs (regression) among the k neighbors.
42
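A minimal k-NN sketch using the four labeled samples of the 1-NN slide, with sample 5, (0.92, 0.43), as the test object:

```python
from math import sqrt
from collections import Counter

def knn_predict(train, query, k):
    """Predict the most frequent class among the k nearest neighbors
    (Euclidean distance)."""
    def dist(a, b):
        return sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    neighbors = sorted(train, key=lambda s: dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((0.32, 0.81), "Healthy"),   # d = 0.71
         ((0.15, 0.38), "Disease"),   # d = 0.77
         ((0.39, 0.34), "Healthy"),   # d = 0.54
         ((0.62, 0.11), "Disease")]   # d = 0.44 (closest)
query = (0.92, 0.43)
print(knn_predict(train, query, k=1))  # Disease (the single nearest sample)
print(knn_predict(train, query, k=3))  # Healthy (majority of the 3 neighbors)
```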
43. Effect of k on the error
(figure: LS error and CV error as a function of k; under-fitting on one side, over-fitting on the other, with an optimal k in between)
43
44. Small exercise
In this classification problem
with two inputs:
What is the resubstitution error (LS error) of 1-NN?
What is the LOO error of 1-NN?
What is the LOO error of 3-NN?
What is the LOO error of 22-NN?
Andrew Moore
44
45. k-NN
Advantages:
very simple
can be adapted to any data type by changing the
distance measure
Drawbacks:
choosing a good distance measure is a hard problem
very sensitive to the presence of noisy variables
slow for testing
45
46. Linear methods
Find a model which is a linear combinations of the inputs
Regression: y = w_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n
Classification: y = c_1 if w_0 + w_1 x_1 + ... + w_n x_n > 0, c_2 otherwise
Several methods exist to find coefficients w0,w1... corresponding
to different objective functions, optimization algorithms, eg.:
Regression: least-square regression, ridge regression, partial
least square, support vector regression, LASSO...
Classification: linear discriminant analysis, PLS-discriminant analysis, support vector machines...
46
47. Example: ridge regression
Find w that minimizes (λ>0):
∑_i (y_i − w^T x_i)² + λ ||w||²
From simple algebra, the solution is given by:
w = (X^T X + λ I)^{-1} X^T y
where X is the input matrix and y is the output vector
λ regulates complexity (and avoids problems related to the singularity of X^T X)
47
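For a single input variable (and no intercept), the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy collapses to a scalar, which makes the effect of λ easy to see. The toy data below are made up for illustration:

```python
def ridge_1d(xs, ys, lam):
    """Scalar instance of w = (X^T X + lam*I)^-1 X^T y."""
    xtx = sum(x * x for x in xs)              # X^T X
    xty = sum(x * y for x, y in zip(xs, ys))  # X^T y
    return xty / (xtx + lam)                  # lam > 0 keeps the denominator > 0

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                 # exactly y = 2x
w_ls = ridge_1d(xs, ys, lam=0.0)     # ordinary least squares: w = 2.0
w_r  = ridge_1d(xs, ys, lam=1.0)     # shrunk toward 0 by the regularizer
```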
48. Example: perceptron
Find w that minimizes:
∑_i (y_i − w^T x_i)²
using gradient descent: given a training example (x, y),
∀j: w_j ← w_j + η (y − w^T x) x_j
Online algorithm, ie. that treats every example in turn
(vs Batch algorithm that treats all examples at once)
Complexity is regulated by the learning rate η and the
number of iterations
Can be adapted to classification
48
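The online update of the slide can be sketched directly. The target weights below are made up for illustration, and `eta` is the learning rate η:

```python
def online_fit(examples, eta=0.1, epochs=50):
    """Online gradient descent on the squared error: after each example,
    w_j <- w_j + eta * (y - w.x) * x_j."""
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in examples:               # treat every example in turn
            residual = y - sum(wj * xj for wj, xj in zip(w, x))
            for j in range(n):
                w[j] += eta * residual * x[j]
    return w

# Consistent toy data generated by y = 1.5*x1 - 0.5*x2
examples = [((1.0, 0.0), 1.5), ((0.0, 1.0), -0.5), ((1.0, 1.0), 1.0)]
w = online_fit(examples)                    # converges near (1.5, -0.5)
```

A batch algorithm would instead accumulate the gradient over all examples before updating w.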
49. Linear methods
Advantages:
simple
there exist fast and scalable variants
provide interpretable models through variable weights
(magnitude and sign)
Drawbacks:
often not as accurate as other (non-linear) methods
49
50. Non-linear extensions
Generalization of linear methods:
y = w_0 + w_1 φ_1(x) + w_2 φ_2(x) + ... + w_n φ_n(x)
Any linear method can be applied (but regularization becomes more important)
Artificial neural networks (with a single hidden layer):
y = g(∑_j W_j g(∑_i w_{i,j} x_i))
where g is a non-linear function (eg. sigmoid)
(a non-linear function of a linear combination of non-linear functions of linear combinations of inputs)
Kernel methods:
y = ∑_i w_i φ_i(x)  ⇔  y = ∑_j α_j k(x_j, x)
where k(x,x') = ⟨φ(x), φ(x')⟩ is the dot-product in the feature space and j indexes training examples
50
51. Artificial neural networks
Supervised learning method initially inspired by the
behaviour of the human brain
Consists of the inter-connection of several small units
Essentially numerical but can handle classification and
discrete inputs with appropriate coding
Introduced in the late 50s, very popular in the 90s
51
52. Hypothesis space: a single neuron
(figure: inputs A1, A2, ..., AN are multiplied by weights w1, w2, ..., wN, summed together with a bias w0, and passed through a tanh activation that saturates at +1 and -1)
Y = tanh(w1*A1 + w2*A2 + ... + wN*AN + w0)
52
53. Hypothesis space: Multi-layer Perceptron
Inter-connection of several neurons (just like in the human brain)
(figure: network with an input layer, a hidden layer, and an output layer)
With a sufficient number of neurons and a sufficient number of layers, a neural network can model any function of the inputs.
53
54. Learning
Choose a structure
Tune the value of the parameters (connections
between neurons) so as to minimize the learning
sample error.
Non-linear optimization by the back-propagation
algorithm. In practice, quite slow.
Repeat for different structures
Select the structure that minimizes CV error
54
56. Artificial neural networks
Advantages:
Universal approximators
May be very accurate (if the method is well used)
Drawbacks:
The learning phase may be very slow
Black-box models, very difficult to interpret
Scalability
56
57. Support vector machines
Recent (mid-90's) and very successful method
Based on two smart ideas:
large margin classifier
kernelized input space
57
59. Margin of a linear classifier
The margin = the
width that the
boundary could be
increased by before
hitting a datapoint.
59
60. Maximum-margin linear classifier
The linear classifier
with the maximum
margin (= Linear
SVM)
Intuitively, this feels
safest
Works very well in
practice
Support vectors: the samples closest to the hyperplane
60
61. Mathematically
Linearly separable case: amounts to solving the following quadratic programming optimization problem:
minimize (1/2) ||w||²
subject to y_i (w^T x_i − b) ≥ 1, ∀i = 1,...,N
Decision function:
y = 1 if w^T x − b > 0, y = −1 otherwise
Non-linearly separable case:
minimize (1/2) ||w||² + C ∑_i ξ_i
subject to y_i (w^T x_i − b) ≥ 1 − ξ_i, ∀i = 1,...,N
61
62. Non-linear boundary
What about this problem?
(figure: mapping φ sends the inputs (x1, x2) to the feature space (x1², x2²))
Solution:
map the data into a new feature space where the
boundary is linear
Find the maximum margin model in this new space
62
63. The kernel trick
Intuitively:
You don't need to compute explicitly the mapping φ
All you need is a (special) similarity measure between
objects (like for the kNN)
This similarity measure is called a kernel
Mathematically:
The maximum-margin classifier in some feature space
can be written only in terms of dot-products in that
feature space:
k(x,x')=<φ(x),φ(x')>
63
64. Mathematically
Primal form of the optimization problem:
minimize (1/2) ||w||²
subject to y_i (⟨w, x_i⟩ − b) ≥ 1, ∀i = 1,...,N
Dual form:
maximize ∑_i α_i − (1/2) ∑_{i,j} α_i α_j y_i y_j ⟨x_i, x_j⟩
subject to α_i ≥ 0 and ∑_i α_i y_i = 0
w = ∑_i α_i y_i x_i
Decision function:
y = 1 if ⟨w, x⟩ = ∑_i α_i y_i ⟨x_i, x⟩ = ∑_i α_i y_i k(x_i, x) > 0
y = −1 otherwise
64
66. Examples of kernels
Linear kernel:
k(x,x')= <x,x'>
Polynomial kernel
k(x,x') = (<x,x'> + 1)^d
(main parameter: d, the maximum degree)
Radial basis function kernel:
k(x,x') = exp(-||x-x'||² / (2σ²))
(main parameter: σ, the spread of the distribution)
+ many kernels that have been defined for structured data types (eg. texts, graphs, trees, images)
66
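The three kernels above, plus a check that the degree-2 polynomial kernel really is a dot-product in an explicitly constructed feature space; the map `phi` below is the standard expansion for two inputs:

```python
from math import exp, sqrt

def linear_kernel(x, xp):
    return sum(a * b for a, b in zip(x, xp))

def poly_kernel(x, xp, d=2):
    return (linear_kernel(x, xp) + 1) ** d

def rbf_kernel(x, xp, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, xp))
    return exp(-sq / (2 * sigma ** 2))

def phi(x):
    """Explicit feature map matching the degree-2 polynomial kernel (2 inputs)."""
    x1, x2 = x
    return (1.0, sqrt(2) * x1, sqrt(2) * x2, x1 * x1, x2 * x2, sqrt(2) * x1 * x2)

x, xp = (1.0, 2.0), (3.0, 0.0)
k_direct = poly_kernel(x, xp)               # (3 + 1)^2 = 16
k_mapped = linear_kernel(phi(x), phi(xp))   # same value, computed via phi
```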
67. Feature ranking with linear kernel
With a linear kernel, the model looks like:
h(x1,x2,...,xK) = C1 if w0 + w1*x1 + w2*x2 + ... + wK*xK > 0, C2 otherwise
Most important variables are those corresponding to
large |wi|
(figure: bar plot of |w| for each variable)
67
68. SVM parameters
Mainly two sets of parameters in SVM:
Optimization algorithm's parameters:
Control the number of training errors versus the margin
(when the learning sample is not linearly separable)
Kernel's parameters:
choice of particular kernel
given this choice, usually one complexity parameter
eg, the degree of the polynomial kernel
Again, these parameters can be determined by cross-validation
68
69. Support vector machines
Advantages:
State-of-the-art accuracy on many problems
Can handle any data types by changing the kernel
(many applications on sequences, texts, graphs...)
Drawbacks:
Tuning the parameters is crucial to get good results, and somewhat tricky
Black-box models, not easy to interpret
69
70. A note on kernel methods
The kernel trick can be applied to any (learning)
algorithm whose solution can be expressed in terms of
dot-products in the original input space
It makes a non-linear algorithm from a linear one
Can work in a very high-dimensional space (even infinite-dimensional) without explicitly computing the features
Decouples the representation stage from the learning stage: the same learning machine can be applied to a large range of problems
Examples: ridge regression, perceptron, PCA, k-means...
70
71. Decision (classification) trees
A learning algorithm that can handle:
Classification problems (binary or multi-valued)
Attributes may be discrete (binary or multi-valued) or
continuous.
Classification trees were invented at least twice:
By statisticians: CART (Breiman et al.)
By the AI community: ID3, C4.5 (Quinlan et al.)
71
72. Decision trees
A decision tree is a tree where:
Each interior node tests an attribute
Each branch corresponds to an attribute value
Each leaf node is labeled with a class
(figure: example tree. The root tests A1 with branches a11, a12, a13; a11 leads to a test on A2 (a21 → c1, a22 → c2), a12 leads to leaf c1, and a13 leads to a test on A3 (a31 → c2, a32 → c1))
72
73. A simple database: playtennis
Day Outlook Temperature Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild Normal Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool High Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Hot Normal Weak Yes
D10 Rain Mild Normal Strong Yes
D11 Sunny Cool Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
73
74. A decision tree for playtennis
(figure: decision tree. Outlook at the root; Sunny → Humidity (High → no, Normal → yes); Overcast → yes; Rain → Wind (Strong → no, Weak → yes))
Should we play tennis on D15?
Day Outlook Temperature Humidity Wind Play Tennis
D15 Sunny Hot High Weak ?
74
75. Top-down induction of DTs
Choose « best » attribute
Split the learning sample
Proceed recursively until each object is correctly classified
(figure: Outlook splits the learning sample into three subsets)
Sunny:
Day Outlook Temp. Humidity Wind Play
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D8 Sunny Mild High Weak No
D9 Sunny Hot Normal Weak Yes
D11 Sunny Cool Normal Strong Yes
Rain:
Day Outlook Temp. Humidity Wind Play
D4 Rain Mild Normal Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D10 Rain Mild Normal Strong Yes
D14 Rain Mild High Strong No
Overcast:
Day Outlook Temp. Humidity Wind Play
D3 Overcast Hot High Weak Yes
D7 Overcast Cool High Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
75
76. Top-down induction of DTs
Procedure learn_dt(learning sample LS)
If all objects from LS have the same class
Create a leaf with that class
Else
Find the « best » splitting attribute A
Create a test node for this attribute
For each value a of A
Build LS_a = {o ∈ LS | A(o) = a}
Use learn_dt(LS_a) to grow a subtree from LS_a
76
77. Which attribute is best ?
A1=? [29+,35-] A2=? [29+,35-]
T F T F
[21+,5-] [8+,30-] [18+,33-] [11+,2-]
A “score” measure is defined to evaluate splits
This score should favor class separation at each step (to
shorten the tree depth)
Common score measures are based on information theory
I(LS, A) = H(LS) − (|LS_left|/|LS|) H(LS_left) − (|LS_right|/|LS|) H(LS_right)
77
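This score is easy to compute; the sketch below evaluates the two candidate splits of the [29+,35-] sample shown above (A1 and A2):

```python
from math import log2

def entropy(pos, neg):
    """Shannon entropy of a two-class distribution (in bits)."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

def info_gain(parent, children):
    """I(LS, A) = H(LS) - sum over subsets of (|LS_b|/|LS|) * H(LS_b)."""
    n = parent[0] + parent[1]
    remainder = sum((p + q) / n * entropy(p, q) for p, q in children)
    return entropy(*parent) - remainder

gain_a1 = info_gain((29, 35), [(21, 5), (8, 30)])   # split by A1
gain_a2 = info_gain((29, 35), [(18, 33), (11, 2)])  # split by A2
# gain_a1 > gain_a2, so A1 is the better attribute here
```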
78. Effect of number of nodes on error
(figure: LS error decreases with the number of nodes while CV error is U-shaped; under-fitting with few nodes, over-fitting with many, optimal complexity in between)
78
79. How can we avoid overfitting?
Pre-pruning: stop growing the tree earlier, before it
reaches the point where it perfectly classifies the
learning sample
Post-pruning: allow the tree to overfit and then post-
prune the tree
Ensemble methods (later)
79
80. Post-pruning
(figure: the same error curves; 1. tree growing increases complexity past the optimum, 2. tree pruning cuts back toward the optimal complexity)
80
81. Numerical variables
Example: temperature as a number instead of a
discrete value
Two solutions:
Pre-discretize: Cold if Temperature < 70, Mild between 70 and 75, Hot if Temperature > 75
Discretize during tree growing:
(figure: test node on Temperature, branch ≤ 65.4 → no, branch > 65.4 → yes)
optimization of the threshold to maximize the score
81
83. Regression trees
Trees for regression problems: exactly the same model but
with a number in each leaf instead of a class
(figure: regression tree. Outlook at the root; Sunny → Humidity (High → 22.3, Normal → Temperature (<71 → 1.2, >71 → 3.4)); Overcast → 45.6; Rain → Wind (Strong → 64.4, Weak → 7.4))
83
84. Interpretability and attribute selection
Interpretability
Intrinsically, a decision tree is highly interpretable
A tree may be converted into a set of “if…then”
rules.
Attribute selection
If some attributes are not useful for classification,
they will not be selected in the (pruned) tree
Of practical importance, if measuring the value of a
variable is costly (e.g. medical diagnosis)
Decision trees are often used as a pre-processing for
other learning algorithms that suffer more when
there are irrelevant variables
84
85. Attribute importance
In many applications, all variables do not contribute
equally in predicting the output.
We can evaluate variable importances with trees
(figure: importances ranked from highest to lowest: Outlook, Humidity, Wind, Temperature)
85
86. Decision and regression trees
Advantages:
very fast and scalable method (able to handle a very
large number of inputs and objects)
provide directly interpretable models and give an idea of
the relevance of attributes
Drawbacks:
high variance (more on this later)
often not as accurate as other methods
86
87. Ensemble methods
(figure: several models each predict Sick or Healthy; the combined prediction is Sick)
Combine the predictions of several models built with a learning algorithm. This often improves accuracy considerably.
Often used in combination with decision trees for efficiency
reasons
Examples of algorithms: Bagging, Random Forests, Boosting...
87
88. Bagging: motivation
Different learning samples yield different models,
especially when the learning algorithm overfits the data
(figure: models learned on two different learning samples differ markedly)
As there is only one optimal model, this variance is a source of error
Solution: aggregate several models to obtain a more
stable one
(figure: the aggregated model is more stable)
88
89. Bagging: bootstrap aggregating
Bootstrap sampling (sampling with replacement): draw several learning samples from the original one
(figure: a model is learned on each bootstrap sample; their predictions, Sick, Healthy, ..., Sick, are aggregated into the final prediction: Sick)
Note: the more models, the better.
89
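A compact sketch of bagging with a deliberately weak base learner; the single-threshold "stump" below is a made-up stand-in for a decision tree, and the data are toy values:

```python
import random

def bootstrap(data, rng):
    """Sampling with replacement: a learning sample of the same size."""
    return [rng.choice(data) for _ in data]

def train_stump(pairs):
    """Weak base learner: pick the threshold (among sample values) that
    minimizes training error for the rule 'pos if x >= t'."""
    def err(t):
        return sum(("pos" if x >= t else "neg") != y for x, y in pairs)
    best_t = min((x for x, _ in pairs), key=err)
    return lambda x: "pos" if x >= best_t else "neg"

def bag_predict(models, x):
    """Aggregate the ensemble by majority vote."""
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)

rng = random.Random(0)
data = [(x, "pos" if x >= 5 else "neg") for x in range(10)]
models = [train_stump(bootstrap(data, rng)) for _ in range(25)]
pred = bag_predict(models, 7)    # the ensemble votes "pos"
```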
91. Boosting
Idea of boosting: combine many « weak » models to
produce a more powerful one.
Weak model = a model that underfits the data (strictly,
in classification, a model slightly better than random
guessing)
Adaboost:
At each step, adaboost forces the learning algorithm to focus
on the cases from the learning sample misclassified by the
last model
The predictions of the models are combined through a
weighted vote. More accurate models have more weights in
the vote.
Eg., by duplicating the misclassified examples in the learning sample
91
93. Interpretability and efficiency
When combined with decision trees, ensemble methods lose interpretability and efficiency
However,
We still can use the ensemble to compute the importance of
variables (by averaging it over all trees)
(figure: bar plot of variable importances averaged over the ensemble)
Ensemble methods can be parallelized, and boosting-type algorithms use smaller trees, so the increase in computing time is not too detrimental.
93
94. Example on microarray data
72 patients, 7129 gene expressions, 2 classes of Leukemia
(ALL and AML) (Golub et al., Science, 1999)
Leave-one-out error with several variants
Method Error
1 decision tree 22.2% (16/72)
Random forests (k=85,T=500) 9.7% (7/72)
Extra-trees (sth=0.5, T=500) 5.5% (4/72)
Adaboost (1 test node, T=500) 1.4% (1/72)
Variable importance with boosting:
(figure: bar plot of importance for each variable)
94
95. Method comparison
Method Accuracy Efficiency Interpretability Ease of use
kNN ++ + + ++
DT + +++ +++ +++
Linear ++ +++ ++ +++
Ensemble +++ +++ ++ +++
ANN +++ + + ++
SVM ++++ + + +
Note:
The relative importance of the criteria depends on
the specific application
These are only general trends. Eg., in terms of accuracy, no algorithm is always better than all others.
95
96. Outline
● Introduction
● Supervised Learning
Introduction
Model selection, cross-validation, overfitting
Some supervised learning algorithms
Beyond classification and regression
● Other learning protocols/frameworks
96
97. Beyond classification and regression
Not all supervised learning problems can be turned into standard classification or regression problems
Examples:
Graph predictions
Sequence labeling
Image segmentation
97
98. Structured output approaches
Decomposition:
Reduce the problem to several simpler classification or
regression problems by decomposing the output
Not always possible and does not take into account
interactions between suboutputs
Kernel output methods
Extend regression methods to handle an output space
endowed with a kernel
This can be done with regression trees or ridge regression
for example
Large margin methods
Use SVM-based approaches to learn a model that scores input-output pairs directly:
y = arg max_{y'} ∑_i w_i φ_i(x, y')
98
100. Labeled versus unlabeled data
Unlabeled data = inputs without an associated output value
In many settings, unlabeled data is cheap but labeled
data can be hard to get
labels may require human experts
human annotation is expensive, slow, unreliable
labels may require special devices
Examples:
Biomedical domain
Speech analysis
Natural language parsing
Image categorization/segmentation
Network measurement 100
101. Semi-supervised learning
Goal: exploit both labeled and unlabeled data to build
better models than using each one alone
A1 A2 A3 A4 Y
labeled data:
0.01 0.37 T 0.54 Healthy
-2.3 -1.2 F 0.37 Disease
0.69 -0.78 F 0.63 Healthy
unlabeled data:
-0.56 -0.89 T -0.42
-0.85 0.62 F -0.05
-0.17 0.09 T 0.29
test data:
-0.09 0.3 F 0.17 ?
Why would it improve?
101
102. Some approaches
Self-training
Iteratively label some unlabeled examples with a model
learned from the previously labeled examples
Semi-supervised SVM (S3VM)
Enumerate all possible labelings of the unlabeled examples
Learn an SVM for each labeling
Pick the one with the largest margin
102
103. Some approaches
Graph-based algorithms
Build a graph over the (labeled and unlabeled)
examples (from the inputs)
Learn a model that predicts well labeled examples and
is smooth over the graph
103
104. Transductive learning
Like supervised learning but we have access to the
test data from the beginning and we want to exploit it
We don't want a model, only compute predictions for
the unlabeled data
Simple solution:
Apply semi-supervised learning techniques using the
test data as unlabeled data to get a model
Use the resulting model to make predictions on the test
data
There exist also specific algorithms that avoid building
a model
104
105. Active learning
Goal:
given unlabeled data, find (adaptively) the examples to
label in order to learn an accurate model
The hope is to reduce the number of labeled instances
with respect to the standard batch SL
Usually, in an online setting:
choose the k “best” unlabeled examples
determine their labels
update the model and iterate
Algorithms differ in the way the unlabeled examples
are selected
Example: choose the k examples for which the model
predictions are the most uncertain
105
106. Reinforcement learning
Learning from interactions
(figure: the agent takes actions a0, a1, a2, ..., observes states s0, s1, s2, ... and receives rewards r0, r1, r2, ...)
Goal: learn to choose the sequence of actions (= policy) that maximizes
r_0 + γ r_1 + γ² r_2 + ..., where 0 ≤ γ < 1
106
107. RL approaches
System is usually modeled by
state transition probabilities P(s_{t+1} | s_t, a_t)
reward probabilities P(r_{t+1} | s_t, a_t)
(= Markov Decision Process)
Model of the dynamics and reward is known ⇒ try to compute the optimal policy by dynamic programming
Model is unknown
Model-based approaches first learn a model of the
dynamics and then derive an optimal policy from it (DP)
Model-free approaches learn directly a policy from
the observed system trajectories
107
108. Reinforcement versus supervised learning
Batch-mode SL: learn a mapping from input to output from
observed input-output pairs
Batch-mode RL: learn a mapping from state to action from
observed (state,action,reward) triplets
Online active learning: combine SL and (online) selection
of instances to label
Online RL: combine policy learning with control of the
system and generation of the training trajectories
Note:
RL would reduce to SL if the optimal action was known
in each state
SL is used inside RL to model system dynamics and/or
value functions 108
109. Examples of applications
Robocup Soccer Teams (Stone & Veloso, Riedmiller et al.)
Inventory Management (Van Roy, Bertsekas, Lee & Tsitsiklis)
Dynamic Channel Assignment, Routing (Singh &
Bertsekas, Nie & Haykin, Boyan & Littman)
Elevator Control (Crites & Barto)
Many Robots: navigation, bi-pedal walking, grasping,
switching between skills...
Games: TD-Gammon and Jellyfish (Tesauro, Dahl)
109
111. Unsupervised learning methods
Many families of problems exist, among which:
Clustering: try to find natural groups of
samples/variables
eg: k-means, hierarchical clustering
Dimensionality reduction: project the data from a high-
dimensional space down to a small number of
dimensions
eg: principal/independent component analysis, MDS
Density estimation: determine the distribution of data
within the input space
eg: Bayesian networks, mixture models.
111
112. Clustering
Goal: grouping a collection of objects into subsets or
“clusters”, such that those within each cluster are more
closely related to one another than objects assigned to
different clusters
112
113. Clustering
(figure: data matrix with objects as rows and variables as columns; a cluster of objects, a cluster of variables, and a bi-cluster are highlighted)
Clustering rows: grouping similar objects
Clustering columns: grouping similar variables across samples
Bi-clustering/two-way clustering: grouping objects that are similar across a subset of variables
113
114. Clustering
Two essential components of cluster analysis:
Distance measure: A notion of distance or similarity of
two objects: When are two objects close to each other?
Cluster algorithm: A procedure to minimize distances
of objects within groups and/or maximize distances
between groups
114
115. Examples of distance measures
Euclidean distance: measures average difference across coordinates
Manhattan distance: measures average difference across coordinates, in a robust way
Correlation distance: measures difference with respect to trends
115
116. Clustering algorithms
Popular algorithms for clustering
hierarchical clustering
K-means
SOMs (Self-Organizing Maps)
autoclass, mixture models...
Hierarchical clustering allows the choice of the
dissimilarity matrix.
k-Means and SOMs take original data directly as input.
Attributes are assumed to live in Euclidean space.
116
117. Hierarchical clustering
Agglomerative clustering:
1. Each object is assigned to its own cluster
2. Iteratively:
the two most similar clusters are joined and
replaced by a new one
the distance matrix is updated with this new cluster
replacing the two joined clusters
(divisive clustering would start from a big cluster)
117
118. Distance between two clusters
Single linkage uses the smallest distance
Complete linkage uses the largest distance
Average linkage uses the average distance
118
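Agglomerative clustering with a pluggable linkage is only a few lines; this sketch works on 1-D points (made-up values) and uses single linkage by default:

```python
def agglomerative(points, k, linkage=min):
    """Agglomerative clustering on 1-D points: start with singleton clusters,
    repeatedly join the two closest. linkage=min is single linkage,
    linkage=max is complete linkage."""
    clusters = [[p] for p in points]
    def cluster_dist(a, b):
        return linkage(abs(x - y) for x in a for y in b)
    while len(clusters) > k:
        # find and merge the pair of clusters with minimal linkage distance
        _, i, j = min((cluster_dist(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

clusters = agglomerative([0.0, 0.1, 0.2, 5.0, 5.1], k=2)
# recovers the two natural groups: [0.0, 0.1, 0.2] and [5.0, 5.1]
```

Keeping all the intermediate merges (instead of stopping at k clusters) would give the dendrogram of the next slide.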
120. Dendrogram
Hierarchical clustering is visualized through dendrograms
Clusters that are joined are combined by a line
Height of line is distance between clusters
Can be used to determine visually the number of
clusters
120
121. Hierarchical clustering
Strengths
No need to assume any particular number of clusters
Can use any distance matrix
Sometimes finds a meaningful taxonomy
Limitations
Finds a taxonomy even if none exists
Once a decision is made to combine two clusters it
cannot be undone
Not well theoretically motivated
121
122. k-Means clustering
Partitioning algorithm with a prefixed number k of
clusters
Uses the Euclidean distance between objects
Tries to minimize the sum of intra-cluster variances:
∑_{j=1..k} ∑_{o ∈ Cluster_j} d²(o, c_j)
where c_j is the center of cluster j and d is the Euclidean distance
122
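A minimal k-means sketch implementing the objective above (random initial centers; the 2-D points are made up for illustration):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest center, then move
    each center to the mean of its cluster; repeat."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                    # assignment step
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):   # update step
            if cl:                          # empty clusters keep their center
                centers[j] = tuple(sum(vals) / len(cl) for vals in zip(*cl))
    return centers, clusters

pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, clusters = kmeans(pts, k=2)
# the two well-separated groups are recovered
```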
126. k-Means clustering
Strengths
Simple, understandable
Can cluster any new point (unlike hierarchical
clustering)
Well motivated theoretically
Limitations
Must fix the number of clusters beforehand
Sensitive to the initial choice of cluster centers
Sensitive to outliers
126
127. Suboptimal clustering
You could obtain any of these from a random start of
k-means
Solution: restart the algorithm several times
127
128. Principal Component Analysis
An exploratory technique used to reduce the
dimensionality of the data set to a smaller space (2D,
3D)
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 | PC1 PC2
0.25 0.93 0.04 -0.78 -0.53 0.57 0.19 0.29 0.37 -0.22 | 0.36 0.1
-2.3 -1.2 -4.5 -0.51 -0.76 0.07 0.81 0.95 0.99 0.26 | -2.3 -1.2
-0.29 -1 0.73 -0.33 0.52 0.13 0.13 0.53 -0.5 -0.48 | 0.27 -0.89
-0.16 -0.17 -0.26 0.32 -0.08 -0.38 -0.48 0.99 -0.95 0.34 | -0.19 0.7
0.07 -0.87 0.39 0.5 -0.63 -0.53 0.79 0.88 0.74 -0.14 | -0.77 -0.7
0.61 0.15 0.68 -0.94 0.5 0.06 -0.56 0.49 0 -0.77 | -0.65 -0.99
Transforms a large number of variables into a smaller number of uncorrelated variables called principal components (PCs)
128
129. Objectives of PCA
Reduce dimensionality (pre-processing for other
methods)
Choose the most useful (informative) variables
Compress the data
Visualize multidimensional data
to identify groups of objects
to identify outliers
129
130. Basic idea
Goal: map data points into a few dimensions while
trying to preserve the variance of the data as much as
possible
(figure: data cloud with the first and second principal component directions drawn)
130
131. Each component is a linear combination of the original variables
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 | PC1 PC2
-0.39 -0.38 0.29 0.65 0.15 0.73 -0.57 0.91 -0.89 -0.17 | 0.62 -0.33
-2.3 -1.2 -4.5 -0.15 0.86 -0.85 0.43 -0.19 -0.83 -0.4 | -2.3 -1.2
0.9 0.4 -0.11 0.62 0.94 0.97 0.1 -0.41 0.01 0.1 | 0.88 0.31
-0.82 -0.31 0.14 0.22 -0.49 -0.76 0.27 0 -0.43 -0.81 | -0.18 -0.05
0.71 0.39 -0.09 0.26 -0.46 -0.05 0.46 0.39 -0.01 0.64 | -0.39 -0.01
-0.25 0.27 -0.81 -0.42 0.62 0.54 -0.67 -0.15 -0.46 0.69 | -0.61 0.53
(the PC1 and PC2 columns are the scores for each sample and each PC)
PC1 = 0.2*A1 + 3.4*A2 - 4.5*A3   VAR(PC1) = 4.5 (45%)
PC2 = 0.4*A4 + 5.6*A5 + 2.3*A7   VAR(PC2) = 3.3 (33%)
...
The loading of a variable gives an idea of its importance in the component and can be used for selecting biomarkers
For each component, we have a measure of the percentage of the variance of the initial data that it contains
131
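The first principal component can be found by power iteration on the covariance matrix; a 2-D sketch with made-up, roughly diagonal data (assumed already centered):

```python
def first_pc(data, iters=200):
    """First principal component of centered 2-D data via power iteration."""
    n = len(data)
    cxx = sum(x * x for x, _ in data) / n    # covariance matrix entries
    cyy = sum(y * y for _, y in data) / n
    cxy = sum(x * y for x, y in data) / n
    v = (1.0, 0.0)                           # arbitrary starting direction
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)       # repeatedly apply C and normalize
    return v

pts = [(-2.0, -1.9), (-1.0, -1.1), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
pc1 = first_pc(pts)   # close to the diagonal direction (0.71, 0.71)
```

The score of a sample on PC1 is then its dot-product with `pc1`, and the loadings are the components of `pc1` itself.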
132. Books
Reference textbooks for the course
The elements of statistical learning: data mining, inference, and
prediction. T. Hastie et al, Springer, 2001
Pattern Recognition and Machine Learning (Information Science and
Statistics). C.M.Bishop, Springer, 2004
Other textbooks
Pattern classification (2nd edition). R.Duda, P.Hart, D.Stork, Wiley
Interscience, 2000
Introduction to Machine Learning. Ethem Alpaydin, MIT Press, 2004.
Machine Learning. Tom Mitchell, McGraw Hill, 1997.
132
133. Books
More advanced topics
kernel methods for pattern analysis. J. Shawe-Taylor and N. Cristianini.
Cambridge University Press, 2004
Reinforcement Learning: An Introduction. R.S. Sutton and A.G. Barto.
MIT Press, 1998
Neuro-Dynamic Programming. D.P Bertsekas and J.N. Tsitsiklis. Athena
Scientific, 1996
Semi-supervised learning. Chapelle et al., MIT Press, 2006
Predicting structured data. G. Bakir et al., MIT Press, 2007
133
134. Software
Pepito
www.pepite.be
Free for academic research and education
WEKA
http://www.cs.waikato.ac.nz/ml/weka/
Many R and Matlab packages
http://www.kyb.mpg.de/bs/people/spider/
http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
134
135. Journals
Journal of Machine Learning Research
Machine Learning
IEEE Transactions on Pattern Analysis and Machine
Intelligence
Journal of Artificial Intelligence Research
Neural computation
Annals of Statistics
IEEE Transactions on Neural Networks
Data Mining and Knowledge Discovery
...
135
136. Conferences
International Conference on Machine Learning (ICML)
European Conference on Machine Learning (ECML)
Neural Information Processing Systems (NIPS)
Uncertainty in Artificial Intelligence (UAI)
International Joint Conference on Artificial Intelligence
(IJCAI)
International Conference on Artificial Neural Networks
(ICANN)
Computational Learning Theory (COLT)
Knowledge Discovery and Data mining (KDD)
...
136