
M.Sc. Informatik/HIS – WS 2021/22
Learning From Data – Talk Subjects
Professor Dr. Jörg Schäfer
19.10.2021

Contents
1 Machine Learning Subjects
1.1 Literature
1.2 The Talks
1 Talk 1: The Learning Problem – Math Foundation
2 Talk 2: Bayesian Methods
3 Talk 3: SVM
4 Talk 4: RVMs
5 Talk 5: Decision Trees and Random Forests
6 Talk 6: Ensemble Theory – Boosting / Bagging
7 Talk 7: Dimensionality Reduction Techniques
8 Talk 8: Optimization
9 Talk 9: Boltzmann Machines
10 Talk 10: Deep Convolutional Nets
11 Talk 11: Linearizing the Non-Linear Transformations
12 Talk 12: Deep Learning and Geometry
13 Talk 13: Adversarial Examples for Neural Networks
14 Talk 14: Visualizing and Understanding Convolutional Networks
15 Talk 15: Transfer Learning
16 Talk 16: Unsupervised Learning
17 Talk 17: Applications of ML in Image Analysis or Recognition
18 Talk 18: Applications of ML in Facial Recognition
19 Talk 19: Applications of ML in Text Analysis
20 Talk 20: Applications of Neural Networks with Long Short Term Memory (LSTM)
21 Talk 21: Deep Reinforcement Learning
22 Talk 22: Activity, Motion and Gesture Recognition with Channel State Information (CSI) using ML
23 Talk 23: Transformers
24 Talk 24: Transformer Applications in NLP

1 Machine Learning Subjects
1.1 Literature
Most of the literature cited below is available in our library (use IEEE and ACM publications!),
but some books have to be borrowed from me personally, so please ask! You can (and often should)
select and read additional papers, books, or online references. If you find something useful, please
quote and cite it properly. As I am always interested in good articles, books, or tutorials, please
point out anything you find particularly useful.

1.2 The Talks


In general, many talks can use input from [AMMIL12] and [HTF08]. For mathematical background
and further ideas, please consider http://nowak.ece.wisc.edu/SLT07.html as well.

1 Talk 1: The Learning Problem – Math Foundation


The talk covers the following subjects:
1. Generalization Bound
(a) Introduce, explain, and interpret Theorems 2.4 and 2.5 (the VC bound) in [AMMIL12].
(b) Prove the VC bound, e.g. following the appendix of [AMMIL12].
A very useful explanation can be found here: https://mostafa-samir.github.io/ml-theory-pt2/
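For orientation, the bound to be proved has (up to constants, which should be checked against Theorem 2.5 in [AMMIL12]) the following form: for the final hypothesis g learned from N examples, with probability at least 1 − δ,

    \[
      E_{\text{out}}(g) \;\le\; E_{\text{in}}(g) \;+\; \sqrt{\frac{8}{N}\,\ln\frac{4\,m_{\mathcal{H}}(2N)}{\delta}},
    \]

where m_H denotes the growth function of the hypothesis set.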

2 Talk 2: Bayesian Methods


The Bayes formula is at the heart of a probabilistic approach to many ML algorithms. From the
Bayes formula a whole framework of Bayesian methods can be deduced. The talk has to cover
1. Bayes formula
2. The prior and the posterior
3. Naive Bayes
4. Recursive equations and Bayes Filters, e.g. 2.4.3 from [TBF05]
5. The Kalman filter (no proof) as described in [Mur12] or [TBF05]
6. Selected examples of Bayesian approaches from [TBF05].
For the basic formulas, chapter 5 of [Mur12] is useful. A connection of Bayesian methods to neural
networks is found in [Hus], [Mac92] and [MHB17].
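To make the predict/update recursion of item 4 concrete, here is a minimal Python sketch of one step of a discrete Bayes filter; all numbers and names are made up for illustration and do not reproduce a specific example from [TBF05].

    import numpy as np

    # Minimal discrete Bayes filter step (illustrative values only; the state is
    # a door that is either "open" (index 0) or "closed" (index 1)).
    def bayes_filter_step(belief, transition, likelihood):
        predicted = transition @ belief        # prediction via total probability
        posterior = likelihood * predicted     # measurement update (Bayes formula)
        return posterior / posterior.sum()     # normalisation

    belief = np.array([0.5, 0.5])              # uniform prior over the two states
    transition = np.array([[0.9, 0.2],         # p(x_t | x_{t-1}): states mostly persist
                           [0.1, 0.8]])
    likelihood = np.array([0.6, 0.3])          # p(z_t | x_t) for a sensor reading "open"
    print(bayes_filter_step(belief, transition, likelihood))   # approx. [0.71, 0.29]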

3 Talk 3: SVM
Support Vector Machines (SVM) are powerful tools for classification (and regression) in supervised
learning. The talk has to cover the dual formulation in detail (including proofs). The talk can be
based on the book by Cristianini and Shawe-Taylor [CST00] or by Vapnik [Vap95]. A good tutorial
including many examples and code can be found at https://www.svm-tutorial.com/.
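For orientation, the (hard-margin) dual problem that the talk has to derive reads as follows; please verify the exact form and notation against [CST00]:

    \[
      \max_{\alpha}\;\; \sum_{i=1}^{N}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\,\alpha_j\,y_i\,y_j\,\langle x_i, x_j\rangle
      \qquad \text{s.t.}\;\; \alpha_i \ge 0,\;\; \sum_{i=1}^{N}\alpha_i y_i = 0.
    \]

Replacing the inner product by a kernel k(x_i, x_j) then yields the kernelised machine.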

4 Talk 4: RVMs
A Relevance Vector Machine (RVM) is a machine learning technique based on Bayesian inference.
It is used for regression and probabilistic classification. The RVM is similar to the support vector
machine, but provides probabilistic interpretations. The talk should be based on the original
paper by Tipping, see [Tip01]. References on Tipping's web site (http://www.miketipping.com/
sparsebayes.htm) are helpful, too.

5 Talk 5: Decision Trees and Random Forests
Decision tree learning uses a decision tree to predict the value of a target variable for classification
(and regression). Random forests are a version of ensemble learning that pools the results of many
decision trees to reduce variance. For both topics, the exposition should follow selected papers from
http://washstat.org/presentations/20150604/loh_slides.pdf as well as the exposition in [HTF08].
The talk has to present at least one working example in R, Python, or Matlab.
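A minimal Python sketch of the kind of working example expected (it assumes scikit-learn; the bundled Iris data merely stands in for whatever data set the presenter chooses):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    forest = RandomForestClassifier(n_estimators=100, random_state=0)   # pool 100 trees
    forest.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, forest.predict(X_test)))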

6 Talk 6: Ensemble Theory – Boosting / Bagging


In machine learning, ensemble methods use multiple learning algorithms to obtain better predictive
performance than could be obtained from any of the constituent learning algorithms alone. The
talk has to cover
• Ensemble Theory
• Bayes optimal classifier
• Bootstrap
• Bagging
• Boosting
• Stacking
It can follow the papers outlined in [OM99].
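To make the bootstrap/bagging idea concrete, a hand-rolled Python sketch; scikit-learn is assumed only for the base trees and the data set, and all names are our own choice:

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Bagging by hand: bootstrap resampling plus a majority vote over unpruned trees.
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(0)
    trees = []
    for _ in range(25):
        idx = rng.integers(0, len(X_tr), size=len(X_tr))      # bootstrap sample (with replacement)
        trees.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

    votes = np.stack([t.predict(X_te) for t in trees])        # one row of votes per tree
    y_hat = (votes.mean(axis=0) > 0.5).astype(int)            # majority vote (binary labels)
    print("bagged accuracy:", (y_hat == y_te).mean())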

7 Talk 7: Dimensionality Reduction Techniques


The curse of dimensionality is a well-known obstacle for any learning theory that tries to obtain
information from high-dimensional data. Thus, dimensionality reduction techniques are important. The talk
should cover
• PCA
• Kernel PCA
• Linear discriminant analysis (LDA) – a generalization of Fisher’s linear discriminant
• Autoencoder
• Manifolds
The talk should cover the above subjects broadly and dive deeper into at least one of them. For
example, auto-encoding is a hot subject, and we refer to [LHY08] and [LCLL14] as a suggestion
for the talk. A useful introduction can be found in [Ray], whereas a scientific survey to be used as
a starting point is [SVM14].
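As a minimal illustration of the first item on the list, plain PCA can be written in a few lines of Python via the SVD of the centred data matrix (random data, illustration only):

    import numpy as np

    # Plain PCA via the SVD of the centred data matrix.
    def pca(X, k):
        Xc = X - X.mean(axis=0)                       # centre the data
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        components = Vt[:k]                           # top-k principal directions
        variances = (S ** 2) / (len(X) - 1)           # variance captured by each direction
        return Xc @ components.T, components, variances[:k]

    X = np.random.default_rng(0).normal(size=(200, 5))
    Z, W, var = pca(X, k=2)
    print(Z.shape, var)                               # (200, 2) and the two largest variances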

8 Talk 8: Optimization
Optimization techniques play a major role in all areas of machine learning. The talk should cover
• Gradient Descent,
• Stochastic Gradient Descent,
• Convex Optimization, and
• Genetic Algorithm(s).
The talk must include proofs at least for Gradient Descent and Convex Optimization.
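To accompany the proofs, a minimal Python sketch of plain gradient descent on a convex quadratic (the matrix, the step size and the number of iterations are arbitrary illustrative choices):

    import numpy as np

    # Plain gradient descent on a convex quadratic.
    def gradient_descent(grad, x0, lr=0.1, steps=200):
        x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            x = x - lr * grad(x)                      # x_{t+1} = x_t - eta * grad f(x_t)
        return x

    A = np.array([[3.0, 0.5], [0.5, 1.0]])            # positive definite, hence convex objective
    b = np.array([1.0, -2.0])
    grad = lambda x: A @ x - b                        # gradient of f(x) = 0.5 x^T A x - b^T x
    print(gradient_descent(grad, x0=[0.0, 0.0]))      # approaches the minimiser A^{-1} b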

9 Talk 9: Boltzmann Machines
A Boltzmann machine is a type of stochastic recurrent neural network. It can be used for
optimization or for the training process directly. Variants include restricted and deep Boltzmann
machines. The talk should follow [AHS85] and the references therein.
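For reference, the energy function of the restricted variant is given below; the general machine of [AHS85] additionally allows visible–visible and hidden–hidden couplings.

    \[
      E(v, h) \;=\; -\sum_i a_i v_i \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} v_i\, w_{ij}\, h_j,
      \qquad
      P(v, h) \;=\; \frac{e^{-E(v, h)}}{Z}.
    \]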

10 Talk 10: Deep Convolutional Nets


Deep Convolutional Nets have recently become the superstars of machine learning. The talk should
cover the definition of Deep Convolutional Nets following the seminal paper of LeCun ([LBD+ 89]).
Afterwards, the ideas expressed in [HS06] should be explained, too. The latter (incl. autoencoding)
is linked to the (Restricted) Boltzmann machine ideas.
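A deliberately tiny LeNet-style sketch in Python/PyTorch; it is not the exact architecture of [LBD+ 89], it merely illustrates the convolution/pooling/classifier pattern the talk should explain:

    import torch
    import torch.nn as nn

    # A deliberately tiny convolutional net for 28x28 grey-scale images.
    class TinyConvNet(nn.Module):
        def __init__(self, n_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(8, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(16 * 7 * 7, n_classes)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    net = TinyConvNet()
    print(net(torch.randn(4, 1, 28, 28)).shape)       # torch.Size([4, 10])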

11 Talk 11: Linearizing the Non-Linear Transformations


In [RMC15] it has been shown that deep convolutional networks can linearise highly non-linear
operations such as subtracting a picture of a woman without sunglasses from a picture of a woman
with sunglasses and adding a picture of a man without sunglasses to yield the man with sunglasses.
This has also been mentioned in the work of Stéphane Mallat, see [Mal16] (Section 4, “Contractions
and scale separation with wavelets”). The talk should explain the paper [RMC15] and ideally
provide simulation examples.
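The latent-space arithmetic itself is a one-liner; the following Python fragment is purely illustrative, with generator standing in for a trained DCGAN generator and the three latent codes assumed to decode to the named images:

    # `generator` is a placeholder for a trained DCGAN generator.
    def glasses_arithmetic(generator, z_woman_glasses, z_woman, z_man):
        z = z_woman_glasses - z_woman + z_man         # vector arithmetic in latent space
        return generator(z)                           # ideally decodes to a man with sunglasses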

12 Talk 12: Deep Learning and Geometry


Why does deep learning work so well? Mathematically, this is not understood at all; however, a few
papers have recently been published that are beginning to shed light on this question, e.g. [Mal16],
[LTR17] and [FG19], and there is a discussion at https://www.quora.com/What-is-the-significance-
of-Lin-and-Tegmark’s-paper-Why-does-deep-and-cheap-learning-work-so-well. The
talk should cover these topics (and papers), ideally with some examples.

13 Talk 13: Adversarial Examples for Neural Networks


In a seminal paper, Szegedy et al. [SZS+ 13] presented examples of how easily trained neural net-
works can be fooled by minor changes to the input data, see https://www.kdnuggets.com/2015/
01/deep-learning-flaws-universal-machine-learning.html. This has great theoretical im-
portance and a huge impact on practical solutions. It has triggered a new sub-field concerned with
adversarial attacks against neural networks and how to protect against them, see [YHZ+ 17] or [NYC14]. The talk
will present and explain the paper [SZS+ 13] and selected topics from [YHZ+ 17] and [NYC14]. If
possible, a computational (live) example of adversarial examples should be presented.
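If a live demonstration is attempted, the Fast Gradient Sign Method is probably the simplest attack to implement; note that it stems from the follow-up literature rather than from [SZS+ 13] itself. A minimal PyTorch sketch with model and loss_fn as placeholders for a trained classifier and its loss:

    import torch

    # Fast Gradient Sign Method sketch.
    def fgsm(model, loss_fn, x, y, eps=0.03):
        x = x.clone().detach().requires_grad_(True)
        loss_fn(model(x), y).backward()               # gradient of the loss w.r.t. the input
        x_adv = x + eps * x.grad.sign()               # small, worst-case-direction perturbation
        return x_adv.clamp(0.0, 1.0).detach()         # keep pixel values in a valid range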

14 Talk 14: Visualizing and Understanding Convolutional Networks


Visualizing and understanding convolutional networks is challenging, as neural networks in general
and (deep) convolutional networks in particular are black boxes. However, for understanding the domain
and accuracy of results, as well as for training the networks appropriately, it is paramount
to understand what is going on “under the hood”. The talk should cover these topics following the
paper by Zeiler and Fergus ([ZF13]).
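A very small Python/PyTorch fragment for the simplest form of such inspection, grabbing intermediate feature maps with a forward hook; [ZF13] goes further and projects the activations back to pixel space with a deconvnet. Here model and layer are placeholders (e.g. a torchvision model and one of its convolutional layers):

    # Grab intermediate feature maps with a forward hook.
    def collect_feature_maps(model, layer, x):
        acts = {}
        handle = layer.register_forward_hook(lambda mod, inp, out: acts.update(out=out.detach()))
        model(x)                                      # one forward pass fills `acts`
        handle.remove()
        return acts["out"]                            # tensor of shape (batch, channels, H, W)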

15 Talk 15: Transfer Learning


If we have trained our networks and encounter situations with data different from the training
data, can we still make use of the trained models and somehow transfer our findings into a different
context? This is indeed sometimes possible, saves retraining from scratch (or at least quite a bit of it),
and is called transfer learning. For a good online overview, see http://ruder.io/transfer-learning/
index.html#adefinitionoftransferlearning. A paper presenting a survey of work in this area has also
been published [TSK+ 18]. The talk shall cover the definition and selected applications from
these references (or others). If this could be illustrated by a working example, it would be a plus.
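A typical fine-tuning recipe looks roughly as follows (a Python sketch assuming torchvision and an arbitrary 5-class target task): freeze the transferred backbone and retrain only a new classification head.

    import torch.nn as nn
    from torchvision import models

    # torchvision >= 0.13 (older versions use pretrained=True instead of weights=...).
    backbone = models.resnet18(weights="IMAGENET1K_V1")   # features trained on ImageNet
    for p in backbone.parameters():
        p.requires_grad = False                           # freeze the transferred backbone
    backbone.fc = nn.Linear(backbone.fc.in_features, 5)   # new head for a 5-class target task
    # ...then train only backbone.fc on the (typically small) target data set.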

16 Talk 16: Unsupervised Learning


Clustering algorithms are the classical examples of unsupervised learning. The talk must cover the
popular k-means algorithm [Mac67] (including its mathematical definition and proofs), mean-shift
clustering ([FH75]), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [EKSX96].
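A minimal Python sketch of k-means (Lloyd's algorithm) against which the mathematical part of the talk can be checked; mean-shift and DBSCAN are also available in scikit-learn if a library demo is preferred:

    import numpy as np

    # Lloyd's algorithm for k-means; random data, illustration only.
    def kmeans(X, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]          # random initial centers
        for _ in range(iters):
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)                                   # assignment step
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])     # update step
        return labels, centers

    X = np.random.default_rng(1).normal(size=(300, 2))
    labels, centers = kmeans(X, k=3)
    print(centers)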

17 Talk 17: Applications of ML in Image Analysis or Recognition


The talk should introduce one or several practical examples illustrating the application of machine
learning models to image analysis or recognition. The talk must demonstrate at least one working
example in R, Python, or Matlab (which can be taken from a tutorial and/or an existing ML
package). The talk has to cover

1. Theory
2. Implementation
3. Challenges

At least one working example in R, Python, or Matlab is preferred (again, the example can be taken
from a tutorial and/or an existing ML package). The literature and example are chosen by the
presenter. As a starting point, please consult [AMMIL12], [HTF08] and [TBF05].

18 Talk 18: Applications of ML in Facial Recognition


The talk should introduce one or several practical examples illustrating the application of machine
learning models to facial recognition. It can use the talk “Applications of ML in Image Analysis
or Recognition” as a basis. The talk must demonstrate at least one working example in R, Python,
or Matlab (which can be taken from a tutorial and/or an existing ML package). The talk has to
cover

1. Theory
2. Implementation
3. Challenges

At least one working example in R, Python, or Matlab is preferred (again, the example can be taken
from a tutorial and/or an existing ML package). The literature and example are chosen by the
presenter. As a starting point, please consult [AMMIL12], [HTF08] and [TBF05].

19 Talk 19: Applications of ML in Text Analysis


The talk should introduce one or several practical examples illustrating the application of machine
learning models to text analysis (e.g. sentiment analysis in social networks). The talk must
demonstrate at least one working example in R, Python, or Matlab (which can be taken from a
tutorial and/or an existing ML package). The talk has to cover

1. Theory
2. Implementation
3. Challenges

The talk shall cover in detail one of these applications of ML in text analysis:

1. Language Translation: Translation of a sentence from one language to another.
2. Sentiment Analysis: To determine, from a text corpus, whether the sentiment towards any
topic or product etc. is positive, negative, or neutral.
3. Spam Filtering: Detect unsolicited and unwanted email/messages.
At least one working example in R, Python, or Matlab is preferred (again, the example can be taken
from a tutorial and/or an existing ML package). The literature and example are chosen by the
presenter.
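For the spam-filtering option, a minimal Python sketch of the expected kind of working example (scikit-learn assumed; the four-document corpus is made up purely for illustration):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["win a free prize now", "meeting moved to 3pm",
             "cheap pills online", "lunch tomorrow?"]
    labels = [1, 0, 1, 0]                              # 1 = spam, 0 = ham

    clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
    clf.fit(texts, labels)
    print(clf.predict(["free pills, win now"]))        # should print [1], i.e. spam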

20 Talk 20: Applications of Neural Networks with Long Short Term Memory (LSTM)
The talk should introduce one or several practical examples illustrating the application of neural
networks with Long Short Term Memory (LSTM) in any domain. The talk must demonstrate at
least one working example in R, Python, or Matlab (which can be taken from a tutorial and/or an
existing ML package). The talk has to cover
1. Theory
2. Implementation
3. Challenges
At least one working example in R, Python, or Matlab is preferred (again, the example can be taken
from a tutorial and/or an existing ML package). The domain, literature and example are chosen by
the presenter – a good domain is finance, but you can suggest others.
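A minimal Python/PyTorch sketch of an LSTM-based sequence model; all dimensions are arbitrary, and the actual data (e.g. a financial time series) is left to the presenter:

    import torch
    import torch.nn as nn

    # Tiny LSTM regressor: one prediction per input sequence.
    class LSTMRegressor(nn.Module):
        def __init__(self, n_features=1, hidden=32):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):                          # x: (batch, seq_len, n_features)
            out, _ = self.lstm(x)
            return self.head(out[:, -1, :])            # predict from the last hidden state

    model = LSTMRegressor()
    print(model(torch.randn(8, 20, 1)).shape)          # torch.Size([8, 1])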

21 Talk 21: Deep Reinforcement Learning


Deep Reinforcement Learning combines Reinforcement Learning with deep convolutional networks.
The talk shall cover the theory, the topology of the nets and at least one example. Literature to be
discussed.
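Most of the relevant literature builds on the Q-learning update, which deep Q-networks turn into a regression target for a neural network; as a reference point (notation varies across papers):

    \[
      Q(s_t, a_t) \;\leftarrow\; Q(s_t, a_t) \;+\; \alpha\Big(r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\Big).
    \]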

22 Talk 22: Activity, Motion and Gesture Recognition with Channel State Information (CSI) using ML
Activity, Motion and Gesture Recognition of a person can be achieved by interpreting the Channel
State Information (CSI) obtained from a WiFi device. This can be used to monitor disabled or
elderly people in the context of ambient assisted living. The talk will cover the topic following
the paper [YND+ 17], a master's thesis by an HIS student [Nee18], the corresponding conference
paper [Nee19], and a preprint [DHKS19]. If desired, the data can be further analyzed
using LSTM networks.

23 Talk 23: Transformers


Recently, Transformer architectures have been proposed and applied with great success to problems
in natural language processing. The talk should cover the seminal paper [VSP+ 17] and
related papers.
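The central formula of [VSP+ 17], scaled dot-product attention, should be derived and interpreted in the talk:

    \[
      \mathrm{Attention}(Q, K, V) \;=\; \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V.
    \]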

24 Talk 24: Transformer Applications in NLP


Transformers have become a de-facto gold standard for many application areas, in particular for
Natural Language Processing (NLP). The talk should cover a full working example including code
examples using PyTorch or TensorFlow. Examples can be found at http://peterbloem.nl/blog/
transformers, https://blog.floydhub.com/the-transformer-in-pytorch/, and https://
towardsdatascience.com/how-to-code-the-transformer-in-pytorch-24db27c8f9ec.
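As a warm-up before working through the linked posts, a minimal Python/PyTorch sketch using the built-in encoder layers (all dimensions are arbitrary; tokenisation and embeddings are omitted):

    import torch
    import torch.nn as nn

    # Two stacked Transformer encoder layers over a batch of random "token" vectors.
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
        num_layers=2,
    )
    tokens = torch.randn(8, 20, 64)                    # (batch, sequence length, embedding dim)
    print(encoder(tokens).shape)                       # torch.Size([8, 20, 64])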

References
[AHS85] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, “A learning algorithm for Boltzmann
machines,” Cognitive Science, vol. 9, pp. 147–169, 1985.
[AMMIL12] Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin, Learning From Data. AMLBook, 2012.
[CST00] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and
Other Kernel-based Learning Methods, 1st ed. Cambridge University Press, 2000. [Online]. Available:
http://www.amazon.com/Introduction-Support-Machines-Kernel-based-Learning/
dp/0521780195/ref=sr_1_1?ie=UTF8&s=books&qid=1280243230&sr=8-1
[DHKS19] N. Damodaran, E. Haruni, M. Kokhkharova, and J. Schäfer, “Device Free Human
Activity and Fall Recognition using WiFi Channel State Information (CSI),” Frankfurt
University of Applied Sciences, Tech. Rep., October 2019.
[EKSX96] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for
discovering clusters in large spatial databases with noise,” in Proceedings of the Second
International Conference on Knowledge Discovery and Data Mining, ser. KDD’96. AAAI Press,
1996, pp. 226–231. [Online]. Available: http://dl.acm.org/citation.cfm?id=3001460.3001507
[FG19] S. Fort and S. Ganguli, “Emergent properties of the local geometry of
neural loss landscapes,” CoRR, vol. abs/1910.05929, 2019. [Online]. Available:
http://arxiv.org/abs/1910.05929
[FH75] K. Fukunaga and L. D. Hostetler, “The estimation of the gradient of a density
function, with applications in pattern recognition,” IEEE Transactions on Information Theory,
vol. 21, no. 1, pp. 32–40, Jan. 1975. [Online]. Available:
http://dx.doi.org/10.1109/tit.1975.1055330
[HS06] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with
neural networks,” Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006. [Online].
Available: http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&uid=16873662&
cmd=showdetailview&indexed=google
[HTF08] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical
learning: data mining, inference and prediction, 2nd ed. Springer, 2008.
[Online]. Available: http://scholar.google.com/scholar.bib?q=info:roqIsr0iT4UJ:
scholar.google.com/&output=citation&hl=en&ct=citation&cd=0
[Hus] F. Huszár. Everything that works works because it’s Bayesian: Why deep nets
generalize?
[LBD+ 89] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard,
and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,”
Neural Comput., vol. 1, no. 4, pp. 541–551, Dec. 1989. [Online]. Available:
http://dx.doi.org/10.1162/neco.1989.1.4.541
[LCLL14] C. Liou, W. Cheng, J. Liou, and D. Liou, “Autoencoder for words,” Neurocomputing,
vol. 139, pp. 84–96, 2014.
[LHY08] C. Liou, J. Huang, and W. Yang, “Modeling word perception using the Elman network,”
Neurocomputing, vol. 71, no. 16-18, pp. 3150–3157, 2008.
[LTR17] H. W. Lin, M. Tegmark, and D. Rolnick, “Why Does Deep and Cheap Learning Work
So Well?” Journal of Statistical Physics, vol. 168, no. 6, pp. 1223–1247, Sep 2017.

[Mac67] J. MacQueen, “Some methods for classification and analysis of multivariate
observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical
Statistics and Probability, Volume 1: Statistics. Berkeley, Calif.: University of
California Press, 1967, pp. 281–297. [Online]. Available: https://projecteuclid.org/
euclid.bsmsp/1200512992

[Mac92] D. J. C. MacKay, “A practical bayesian framework for backpropagation networks,”
Neural Comput., vol. 4, no. 3, pp. 448–472, May 1992. [Online]. Available:
http://dx.doi.org/10.1162/neco.1992.4.3.448
[Mal16] S. Mallat, “Understanding deep convolutional networks,” Philosophical Transactions
of the Royal Society of London Series A, vol. 374, no. 2065, p. 20150203, Apr 2016.

[MHB17] S. Mandt, M. D. Hoffman, and D. M. Blei, “Stochastic gradient descent as
approximate bayesian inference,” J. Mach. Learn. Res., vol. 18, no. 1, pp. 4873–4907,
Jan. 2017. [Online]. Available: http://dl.acm.org/citation.cfm?id=3122009.3208015
[Mur12] K. P. Murphy, Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press, 2012.

[Nee18] D. Neena, “Device free indoor human activity recognition,” Master’s thesis, Frankfurt
University of Applied Sciences, October 2018.
[Nee19] N. Damodaran and J. Schäfer, “Device Free Human Activity Recognition
using WiFi Channel State Information,” in 16th IEEE International Conference on
Ubiquitous Intelligence and Computing (UIC 2019), 5th IEEE Smart World Congress,
Leicester, vol. 16. IEEE, August 2019.
[NYC14] A. M. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled:
High confidence predictions for unrecognizable images,” CoRR, vol. abs/1412.1897,
2014. [Online]. Available: http://arxiv.org/abs/1412.1897
[OM99] D. Opitz and R. Maclin, “Popular ensemble methods: An empirical study,” J.
Artif. Int. Res., vol. 11, no. 1, pp. 169–198, Jul. 1999. [Online]. Available:
http://dl.acm.org/citation.cfm?id=3013545.3013549
[Ray] S. Ray. Beginners guide to learn dimension reduction techniques. [Online]. Available:
https://www.analyticsvidhya.com/blog/2015/07/dimension-reduction-methods/
[RMC15] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with
deep convolutional generative adversarial networks,” CoRR, vol. abs/1511.06434, 2015.
[SVM14] C. O. S. Sorzano, J. Vargas, and A. P. Montano, “A survey of dimensionality reduction
techniques,” ArXiv e-prints, p. arXiv:1403.2877, Mar. 2014.
[SZS+ 13] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and
R. Fergus, “Intriguing properties of neural networks,” 2013.
[TBF05] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, ser. Intelligent
robotics and autonomous agents. MIT Press, 2005. [Online]. Available:
http://books.google.de/books?id=2Zn6AQAAQBAJ
[Tip01] M. E. Tipping, “Sparse bayesian learning and the relevance vector machine,”
J. Mach. Learn. Res., vol. 1, pp. 211–244, Sep. 2001. [Online]. Available:
https://doi.org/10.1162/15324430152748236
[TSK+ 18] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on
deep transfer learning,” CoRR, vol. abs/1808.01974, 2018. [Online]. Available:
http://arxiv.org/abs/1808.01974

[Vap95] V. N. Vapnik, The nature of statistical learning theory. New York, NY, USA:
Springer-Verlag New York, Inc., 1995.
[VSP+ 17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser,
and I. Polosukhin, “Attention is all you need,” 2017. [Online]. Available:
http://arxiv.org/abs/1706.03762

[YHZ+ 17] X. Yuan, P. He, Q. Zhu, R. R. Bhat, and X. Li, “Adversarial examples: Attacks and
defenses for deep learning,” CoRR, vol. abs/1712.07107, 2017. [Online]. Available:
http://arxiv.org/abs/1712.07107
[YND+ 17] S. Yousefi, H. Narui, S. Dayal, S. Ermon, and S. Valaee, “A survey of human activity
recognition using wifi CSI,” CoRR, vol. abs/1708.07129, 2017.

[ZF13] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional
networks,” CoRR, vol. abs/1311.2901, 2013. [Online]. Available:
http://dblp.uni-trier.de/db/journals/corr/corr1311.html#ZeilerF13
