Syllabus for the course «Machine Learning and Data Mining», Master of Science
Authors:
Maxim A. Borisyak, Master of Science (MSc), lecturer, mborisyak@hse.ru
Approved by: Head of Data Analysis and Artificial Intelligence Department, Sergey O. Kuznetsov
Recommended by:
Moscow, 2015
National Research University Higher School of Economics
1. Teachers
Author, lecturer: Borisyak Maxim, National Research University Higher School of
Economics, School of Data Analysis and Artificial Intelligence, lecturer.
2. Scope of Use
The present program establishes the minimum requirements for students’ knowledge and skills and
determines the content of the course.
The present syllabus is aimed at the department teaching the course, its teaching assistants,
and students of the Master of Science program 010402 «Data Sciences».
This syllabus meets the standards required by:
Educational standards of National Research University Higher School of Economics;
Educational program «Data Sciences» of Federal Master’s Degree Program 010402
«Applied Mathematics and Informatics», 2015;
University curriculum of the Master’s program in «Data Sciences» for 2015.
3. Summary
Machine Learning and the mining of massive datasets are rapidly growing fields of data analysis. For
many years the data analysis and statistics communities have been developing algorithms and methods
for discovering patterns in datasets. Besides theoretical knowledge, successful research in these areas
depends on the confident use of common methods, algorithms and tools, along with the skills to
develop new ones. The focus of the second course “Machine Learning and Data Mining” in the
“Data Science” Master Program is to introduce students to methods and modern programming tools
and frameworks for data analysis. Special attention is given to methods for handling massive
datasets. The course is constantly being adapted to match the current state of the art in the area.
4. Learning Objectives
The objective of the course “Machine Learning and Data Mining” is to introduce
students to state-of-the-art methods and modern programming tools for data analysis.
5. Learning outcomes
After completing the study of the discipline “Machine Learning and Data Mining”,
students are expected to:
understand complexity of Machine Learning algorithms and their limitations;
understand modern notions in data analysis oriented computing;
be capable of confidently applying common Machine Learning algorithms in
practice and implementing their own;
be capable of performing distributed computations;
be capable of performing experiments in Machine Learning using real-world data.
After completing the study of the discipline “Machine Learning and Data Mining” the
student should have the following competences:
Each competence is listed with its code, its code (UC), the descriptors (indicators of achievement of the result), and the educative forms and methods aimed at generation and development of the competence.

Competence: The ability to reflect developed methods of activity.
Code: SC-1. Code (UC): SC-М1.
Descriptors: The student is able to reflect developed and implement methods for machine learning and data mining (data sciences).
Educative forms and methods: Lectures and tutorials, group discussions, presentations, paper reviews.

Competence: The ability to propose a model to invent and test methods and tools of professional activity.
Code: SC-2. Code (UC): SC-М2.
Descriptors: The student is able to improve and develop methods and algorithms as applicable to machine learning and data mining (data sciences).
Educative forms and methods: Classes, group projects.

Competence: Capability of development of new research methods, change of scientific and industrial profile of self-activities.
Code: SC-3. Code (UC): SC-М3.
Descriptors: The student obtains necessary knowledge in methods for machine learning and data mining, which is sufficient to develop new methods.
Educative forms and methods: Homework scripts for DA and ML.

Competence: The ability to describe problems and situations of professional activity in terms of humanitarian, economic and social sciences to solve problems which occur across sciences, in allied professional fields.
Code: PC-5. Code (UC): IC-M5.3_5.4_5.6_2.4.1.
Descriptors: The student is able to describe computational data analysis problems in terms of computational mathematics.
Educative forms and methods: Lectures and tutorials, group discussions, presentations, paper reviews.

Competence: The ability to detect, transmit common goals in the professional and social activities.
Code: PC-8. Code (UC): SPC-M3.
Descriptors: The student is able to identify algorithmic aspects in machine learning and data mining tasks, evaluate correctness and efficiency of the used methods, and their applicability in each current situation.
Educative forms and methods: Discussion of paper reviews; cross discipline lectures. Special guests from Laboratory of Machine Learning and Data Analysis invited as key-speakers.
The course “Machine Learning and Data Mining” is taught in the second year of
the Master’s program 010402 “Data Sciences” and follows the course “Introduction to Machine
Learning and Data Mining”. It is also the base course for the specialization “Intelligent Systems and
Structural Analysis”.
Prerequisites
The course is based on knowledge and understanding of:
Algorithms and data structures
Theory of probability and statistical analysis
Machine Learning
The course also requires some programming experience in both of the following languages:
Python
C or C++
Knowledge of the Java or Scala programming languages is also beneficial.
7. Schedule
One pair consists of 1 academic hour for a lecture and 1 academic hour for classes after the lecture.
Type of work: Homework
Number: 10
Characteristics: Solving homework tasks and examples.

Type of work: Special homework – research projects and reports
Number: 2
Characteristics: Research project on a real-world Machine Learning problem; presentation of the results, tools and techniques used in the project.

Type of grading: Final
Type of work: Exam
Number: 1
Characteristics: Written exam
9. Assessment
The assessment consists of classwork and homework, assigned after each lecture. Students
have to demonstrate confident use of the presented methods, tools, frameworks and techniques, and be
able to solve example real-world tasks.
The final assessment is the final exam. Students have to combine their theoretical knowledge
with practical skills in order to solve real-world problems.
The grade formula:
The exam consists of 1 problem, worth 10 points in total.
The final course mark is obtained from the following formula:
Ofinal = 0.4 * Ocumulative + 0.4 * OcumulativeSpecial + 0.2 * Oexam,
where:
Ocumulative – cumulative mark for classwork and homework;
OcumulativeSpecial – cumulative mark for special homework;
Oexam – mark on the exam.
Grades are rounded at the discretion of the examiner/lecturer, taking into account the regularity of
classwork and homework. All grades with a fractional part greater than 0.5 are rounded up.
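The formula and the rounding rule above can be checked with a short Python sketch. The function names final_mark and round_mark are illustrative and do not come from the syllabus. Rounding down when the fractional part is 0.5 or less is an assumption made here: the syllabus only states that fractional parts greater than 0.5 round up and otherwise leaves rounding to the lecturer.

```python
import math


def final_mark(cumulative, cumulative_special, exam):
    """Weighted final mark: 0.4 * Ocumulative + 0.4 * OcumulativeSpecial + 0.2 * Oexam."""
    return 0.4 * cumulative + 0.4 * cumulative_special + 0.2 * exam


def round_mark(mark):
    """Round up when the fractional part is greater than 0.5.

    Assumption: anything else is rounded down. In the syllabus those
    cases are left to the discretion of the lecturer.
    """
    whole = math.floor(mark)
    return whole + 1 if mark - whole > 0.5 else whole


# Example: 8 for classwork and homework, 7 for special homework, 9 on the exam.
mark = final_mark(8, 7, 9)  # 3.2 + 2.8 + 1.8, i.e. about 7.8
print(mark, round_mark(mark))
```

Because the three weights sum to 1, the final mark stays on the same ten-point scale as its components.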
Ten-point Grading Scale: Five-point Grading Scale
1 – very bad, 2 – bad, 3 – no pass: Unsatisfactory – 2 (FAIL)
4 – pass, 5 – highly pass: Satisfactory – 3 (PASS)
6 – good, 7 – very good: Good – 4 (PASS)
8 – almost excellent, 9 – excellent, 10 – perfect: Excellent – 5 (PASS)
Topic 1. Introduction to methods for Machine Learning, IPython notebook, data visualisation
Content: Introduction to methods for Machine Learning. Overview of modern
technologies, problem examples and basic tasks. Introduction to IPython notebook and basic data
visualisation: line and bar plots, histograms, image visualisation, heat maps.
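As an illustration of the plot types listed in Topic 1, the following is a minimal sketch that could be run in a notebook cell. It assumes numpy and matplotlib are installed. The data is synthetic and the variable names are chosen for this example only; none of it is taken from the course materials.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed so the synthetic data is reproducible
x = np.linspace(0.0, 2.0 * np.pi, 100)

# One figure with a 2 x 2 grid of axes, one plot type per cell.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title("Line plot")

axes[0, 1].bar(["a", "b", "c"], [3, 5, 2])
axes[0, 1].set_title("Bar plot")

axes[1, 0].hist(rng.normal(size=1000), bins=30)
axes[1, 0].set_title("Histogram")

# imshow renders a 2-D array as an image; with a colour bar it reads as a heat map.
heat = axes[1, 1].imshow(rng.random((10, 10)), cmap="viridis")
fig.colorbar(heat, ax=axes[1, 1])
axes[1, 1].set_title("Heat map")

fig.tight_layout()
```

In a notebook the figure is typically displayed inline after the cell runs. In a plain script, fig.savefig or plt.show would be needed to see it.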
Topic 2. Numpy and scipy basics: common linear algebra and statistical routines, numerical
optimization.
Content: Introduction to numpy library. Matrices and linear algebra routines: basic matrix
operations, decompositions, algorithms, their computational complexity and implementations.
Introduction to scipy library. Statistical routines: basic statistics, sampling, maximum likelihood
fitting, hypothesis testing. Numerical optimization: scalar optimization, local optimization, global
optimization. Classwork and homework: classification of handwritten digits by fitting custom
models and hypothesis tests.
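The routines named in Topic 2 can be illustrated with a small sketch, assuming numpy and scipy are installed. It touches one example from each group: a linear solve, a maximum likelihood fit, a hypothesis test and a scalar optimization. The data is synthetic and the specific numbers are arbitrary choices for this example, not course material.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Linear algebra: solve the system A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Sampling and maximum likelihood fitting: draw from N(5, 2^2), then
# estimate the location and scale of a normal distribution from the sample.
sample = rng.normal(loc=5.0, scale=2.0, size=2000)
mu_hat, sigma_hat = stats.norm.fit(sample)

# Hypothesis testing: one-sample t-test of the null hypothesis "mean == 5".
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

# Scalar optimization: minimise f(t) = (t - 2)^2 + 1 over the real line.
result = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2 + 1.0)

print("solution of A x = b:", x)
print("MLE of (mu, sigma):", mu_hat, sigma_hat)
print("t statistic, p-value:", t_stat, p_value)
print("argmin of f:", result.x)
```

For the normal distribution the maximum likelihood estimates coincide with the sample mean and the uncorrected sample standard deviation, which gives an easy sanity check on the fit.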
Topic 6. Symbolic computations for Deep Learning and stochastic optimisation. GPU
computing.
Content: Introduction to theanets, keras, lasagne, downhill. Convolutional and recurrent
Neural Networks. Introduction to stochastic optimization: Stochastic Gradient Descent, Nesterov's
momentum, AdaDelta, ADAM. Classwork and homework: handwritten digit recognition using
Convolutional Neural Networks, comparison of optimisation algorithms for Deep Neural Networks.
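To make the stochastic optimization part of Topic 6 concrete, here is a minimal numpy sketch of mini-batch Stochastic Gradient Descent with Nesterov's momentum, applied to a linear least-squares problem rather than a neural network. It deliberately uses none of the deep learning libraries listed above. The function name sgd_nesterov, the hyperparameter values and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def sgd_nesterov(X, y, lr=0.05, momentum=0.9, batch_size=32, epochs=50, seed=0):
    """Fit least-squares weights with mini-batch SGD and Nesterov's momentum."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    velocity = np.zeros(n_features)
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle the data every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            # Nesterov's momentum: take the gradient at the look-ahead point
            # w + momentum * velocity instead of at the current weights.
            lookahead = w + momentum * velocity
            residual = X[idx] @ lookahead - y[idx]
            grad = X[idx].T @ residual / len(idx)
            velocity = momentum * velocity - lr * grad
            w = w + velocity
    return w


# Synthetic regression problem with known weights and a little noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=500)

w_hat = sgd_nesterov(X, y)
print("estimated weights:", w_hat)
```

Replacing lookahead with w in the gradient computation gives classical momentum, which is one simple way to set up the kind of optimiser comparison mentioned in the homework.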
1 Required Reading
1. Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive
datasets. Cambridge University Press, 2014.
2. Peter Flach. Machine Learning: The Art and Science of Algorithms that Make Sense
of Data. Cambridge University Press, 2012.
2 Recommended Reading
4 Course telemaintenance
All materials of the discipline are posted on the informational educational site at the NRU HSE portal
www.cs.hse.ru/ai. Students are provided with links to research papers, electronic books, data and
software.
16. Equipment
The course requires a laptop, projector, and acoustic systems.
It also requires the ability to install programming software, such as:
Jupyter notebook server and data analysis libraries
Apache Spark cluster.
Lecture materials, course structure and syllabus are prepared by Maxim Borisyak.