Talk Overview
Informatik/HIS – WS 2021/22
Learning From Data – Talk Subjects
Professor Dr. Jörg Schäfer
19.10.2021
Contents
1 Machine Learning Subjects
1.1 Literature
1.2 The Talks
1 Talk 1: The Learning Problem – Math Foundation
2 Talk 2: Bayesian Methods
3 Talk 3: SVM
4 Talk 4: RVMs
5 Talk 5: Decision Trees and Random Forests
6 Talk 6: Ensemble Theory – Boosting / Bagging
7 Talk 7: Dimensionality Reduction Techniques
8 Talk 8: Optimization
9 Talk 9: Boltzmann Machines
10 Talk 10: Deep Convolutional Nets
11 Talk 11: Linearizing the Non-Linear Transformations
12 Talk 12: Deep Learning and Geometry
13 Talk 13: Adversarial Examples for Neural Networks
14 Talk 14: Visualizing and Understanding Convolutional Networks
15 Talk 15: Transfer Learning
16 Talk 16: Unsupervised Learning
17 Talk 17: Applications of ML in Image Analysis or Recognition
18 Talk 18: Applications of ML in Facial Recognition
19 Talk 19: Applications of ML in Text Analysis
20 Talk 20: Applications of Neural Networks with Long Short Term Memory (LSTM)
21 Talk 21: Deep Reinforcement Learning
22 Talk 22: Activity, Motion and Gesture Recognition with Channel State Information (CSI) using ML
23 Talk 23: Transformers
24 Talk 24: Transformer Applications in NLP
1 Machine Learning Subjects
1.1 Literature
Most of the literature cited below is available in our library (use IEEE and ACM publications!),
but some books have to be borrowed from me personally, so please ask! You can (and often should)
select and read additional papers, books, or online references. If you find something useful, please
quote and cite it properly. As I am always interested in good articles, books, and tutorials, please
highlight them accordingly.
3 Talk 3: SVM
Support Vector Machines (SVM) are powerful tools for classification (and regression) for supervised
learning. The talk has to cover the dual formulation in detail (including proofs). The talk can be
based on the book by Cristianini and Shawe-Taylor [CST00] or Vapnik [Vap95]. A good tutorial,
including many examples and code, can be found at https://www.svm-tutorial.com/.
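A minimal sketch of the kind of working example such a talk could show, using scikit-learn's SVC on a toy data set (the data set, kernel and parameters below are illustrative assumptions, not prescribed by the talk):

# Minimal SVM classification sketch with scikit-learn (illustrative only).
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy two-class data set.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Soft-margin SVM with an RBF kernel; C controls the margin/slack trade-off.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print("support vectors per class:", clf.n_support_)

The dual formulation discussed in the talk is visible here only indirectly: the fitted model is determined by the support vectors (clf.support_vectors_) and their dual coefficients (clf.dual_coef_).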
4 Talk 4: RVMs
A Relevance Vector Machine (RVM) is a machine learning technique based on Bayesian inference.
It is used for regression and probabilistic classification. The RVM is similar to the support vector
machine, but provides probabilistic interpretations. The talk should be based on the original
paper by Tipping, see [Tip01]. References on Tipping's web site (http://www.miketipping.com/sparsebayes.htm) are helpful, too.
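As an illustration only, the following sketch uses scikit-learn's ARDRegression, a sparse Bayesian linear model closely related to the RVM (the full kernelized RVM of [Tip01] requires a dedicated implementation, e.g. Tipping's SparseBayes code); the data below are made up:

# Sparse Bayesian regression sketch (automatic relevance determination); illustrative only.
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.0, 0.5]  # only three features are relevant
y = X @ true_w + 0.1 * rng.normal(size=100)

model = ARDRegression()
model.fit(X, y)

# The ARD prior drives the weights of irrelevant features towards zero,
# giving the sparse, probabilistic solution characteristic of RVM-style models.
print(np.round(model.coef_, 2))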
5 Talk 5: Decision Trees and Random Forests
Decision tree learning uses a decision tree to predict the target function for classification (and
regression). Random forests are a version of ensemble learning that pools the results of many decision
trees to reduce variance. The exposition of both topics should follow selected papers from
http://washstat.org/presentations/20150604/loh_slides.pdf as well as the exposition in [HTF08].
The talk has to present at least one working example in R, Python or Matlab.
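A minimal sketch of such a working example in Python, comparing a single decision tree with a random forest on the Iris data (the data set and parameters are illustrative assumptions):

# Decision tree vs. random forest sketch with scikit-learn (illustrative only).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Averaging many randomized trees (bagging plus random feature selection)
# typically reduces the variance of a single, fully grown tree.
print("single tree, 5-fold CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
print("random forest, 5-fold CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())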
8 Talk 8: Optimization
Optimization techniques play a major role in all areas of machine learning. The talk should cover
• Gradient Descent,
• Stochastic Gradient Descent,
• Convex Optimization, and
• Genetic Algorithm(s).
The talk must include proofs at least for gradient descent and convex optimization.
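To illustrate the first two items, a small NumPy sketch comparing full-batch gradient descent with stochastic gradient descent on a convex least-squares problem (step sizes and data are arbitrary choices for illustration):

# Gradient descent vs. stochastic gradient descent on least squares (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
x_true = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=200)
n = len(b)

# Objective: f(x) = (1/(2n)) * ||A x - b||^2, a convex function of x.
def full_gradient(x):
    return A.T @ (A @ x - b) / n

x_gd = np.zeros(5)
x_sgd = np.zeros(5)
for t in range(2000):
    # Full-batch gradient descent with a constant step size.
    x_gd -= 0.1 * full_gradient(x_gd)
    # Stochastic gradient descent: gradient of one randomly chosen data point,
    # with a decaying step size.
    i = rng.integers(n)
    x_sgd -= (0.1 / (1.0 + 0.01 * t)) * A[i] * (A[i] @ x_sgd - b[i])

print("GD  error:", np.linalg.norm(x_gd - x_true))
print("SGD error:", np.linalg.norm(x_sgd - x_true))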
9 Talk 9: Boltzmann Machines
A Boltzmann machine is a type of stochastic recurrent neural network. It can be used for
optimization or for the training process directly. Variants include restricted and deep Boltzmann
machines. The talk should follow [AHS85] and the references therein.
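As a small, purely illustrative sketch (not the general Boltzmann machine of [AHS85], but its restricted variant, which is what scikit-learn provides), one could train a BernoulliRBM on the digits data:

# Restricted Boltzmann machine sketch with scikit-learn (illustrative only).
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import minmax_scale

# Scale pixel values to [0, 1] so they can be treated as Bernoulli probabilities.
X, _ = load_digits(return_X_y=True)
X = minmax_scale(X)

# scikit-learn trains the RBM with persistent contrastive divergence.
rbm = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)

# Hidden-unit activations can be used as learned features for downstream tasks.
H = rbm.transform(X)
print(H.shape)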
in this area has been published. The talk shall cover the definition and selected application from
these references (or others). If it could be illustrated by a working example, this would be a plus.
17 Talk 17: Applications of ML in Image Analysis or Recognition
1. Theory
2. Implementation
3. Challenges
At least one working example in R, Python or Matlab is preferred (the example can be taken from a tutorial and/or an existing ML package). The literature and example are chosen by the presenter. As a starting point, please consult [AMMIL12], [HTF08] and [TBF05].
18 Talk 18: Applications of ML in Facial Recognition
1. Theory
2. Implementation
3. Challenges
At least one working example in R, Python or Matlab is preferred (the example can be taken from a tutorial and/or an existing ML package). The literature and example are chosen by the presenter. As a starting point, please consult [AMMIL12], [HTF08] and [TBF05].
19 Talk 19: Applications of ML in Text Analysis
1. Theory
2. Implementation
3. Challenges
The talk shall cover in detail one of the following applications of ML in text analysis:
1. Language Translation: Translation of a sentence from one language to another.
2. Sentiment Analysis: Determine, from a text corpus, whether the sentiment towards a topic,
product, etc. is positive, negative, or neutral.
3. Spam Filtering: Detect unsolicited and unwanted email/messages.
At least one working example in R, Python or Matlab is preferred (the example can be taken from a tutorial and/or an existing ML package). The literature and example are chosen by the presenter.
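A minimal sketch of what such an example could look like for spam filtering, using bag-of-words features and a naive Bayes classifier (the tiny corpus and labels below are made up for illustration):

# Text classification sketch: TF-IDF features + naive Bayes (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus; label 1 = spam, 0 = not spam.
texts = [
    "win a free prize now", "cheap pills, click here",
    "meeting moved to 3 pm", "please review the attached report",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["free prize, click now", "see you at the meeting"]))

The same pipeline, trained on a suitably labelled corpus, would serve for sentiment analysis as well.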
20 Talk 20: Applications of Neural Networks with Long Short Term Memory (LSTM)
The talk should introduce one or several practical examples illustrating the application of neural
networks with Long Short-Term Memory (LSTM) in any domain. The talk must demonstrate at
least one working example in R, Python or Matlab (which can be taken from a tutorial and/or an
existing ML package).
The talk has to cover
1. Theory
2. Implementation
3. Challenges
At least one working example in R, Python or Matlab is preferred (the example can be taken from
a tutorial and/or an existing ML package). The domain, literature and example are chosen by the
presenter; a good domain is applications in finance, but you can suggest others.
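A minimal sketch of such an example in Python with Keras, predicting the next value of a noisy sine wave (the data, window length and architecture are illustrative assumptions; a financial time series could be substituted):

# LSTM next-step prediction sketch with Keras (illustrative only).
import numpy as np
from tensorflow import keras

# Noisy sine wave, turned into (window of past values -> next value) pairs.
series = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.1 * np.random.randn(2000)
window = 50
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]  # shape: (samples, time steps, features)

model = keras.Sequential([
    keras.layers.Input(shape=(window, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, verbose=0)

print("MSE on the last 100 windows:", model.evaluate(X[-100:], y[-100:], verbose=0))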
22 Talk 22: Activity, Motion and Gesture Recognition with Channel State Information (CSI) using ML
Activity, Motion and Gesture Recognition of a person can be achieved by interpreting the Channel
State Information (CSI) obtained from a WiFi device. This can be used to monitor disabled or
elderly people in the context of ambient assisted living. The talk will cover the topic following
the paper [YND+ 17] and a master's thesis of a HIS student [Nee18] as well as the corresponding
conference paper [Nee19] and a preprint [DHKS19]. If desired, the data can be further analyzed
using LSTM networks.
References
[AHS85] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, “A learning algorithm for Boltzmann
machines,” Cognitive Science, vol. 9, pp. 147–169, 1985.
[AMMIL12] Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin, Learning From Data. AMLBook, 2012.
[CST00] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and
Other Kernel-based Learning Methods, 1st ed. Cambridge University Press, 2000. [Online]. Available:
http://www.amazon.com/Introduction-Support-Machines-Kernel-based-Learning/dp/0521780195/ref=sr_1_1?ie=UTF8&s=books&qid=1280243230&sr=8-1
[DHKS19] N. Damodaran, E. Haruni, M. Kokhkharova, and J. Schäfer, “Device Free Human
Activity and Fall Recognition using WiFi Channel State Information (CSI),” Frankfurt
University of Applied Sciences, Tech. Rep., October 2019.
[EKSX96] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering
clusters in large spatial databases with noise," in Proceedings of the Second International Conference on
Knowledge Discovery and Data Mining, ser. KDD'96. AAAI Press, 1996, pp. 226–231.
[Online]. Available: http://dl.acm.org/citation.cfm?id=3001460.3001507
[FG19] S. Fort and S. Ganguli, “Emergent properties of the local geometry of
neural loss landscapes,” CoRR, vol. abs/1910.05929, 2019. [Online]. Available:
http://arxiv.org/abs/1910.05929
[FH75] K. Fukunaga and L. D. Hostetler, “The estimation of the gradient of a density
function, with applications in pattern recognition,” Information Theory, IEEE
Transactions on, vol. 21, no. 1, pp. 32–40, Jan. 1975. [Online]. Available:
http://dx.doi.org/10.1109/tit.1975.1055330
[HS06] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with
neural networks,” Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006. [Online].
Available: http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&uid=16873662&
cmd=showdetailview&indexed=google
[HTF08] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical
learning: data mining, inference and prediction, 2nd ed. Springer, 2008.
[Online]. Available: http://scholar.google.com/scholar.bib?q=info:roqIsr0iT4UJ:
scholar.google.com/&output=citation&hl=en&ct=citation&cd=0
[Hus] F. Huszár. Everything that works works because it’s bayesian: Why deep nets
generalize?
[LBD+ 89] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard,
and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,”
Neural Comput., vol. 1, no. 4, pp. 541–551, Dec. 1989. [Online]. Available:
http://dx.doi.org/10.1162/neco.1989.1.4.541
[LCLL14] C. Liou, W. Cheng, J. Liou, and D. Liou, “Autoencoder for words,” Neurocomputing,
vol. 139, pp. 84–96, 2014.
[LHY08] C. Liou, J. Huang, and W. Yang, “Modeling word perception using the elman network,”
Neurocomputing, vol. 71, no. 16-18, pp. 3150–3157, 2008.
[LTR17] H. W. Lin, M. Tegmark, and D. Rolnick, “Why Does Deep and Cheap Learning Work
So Well?” Journal of Statistical Physics, vol. 168, no. 6, pp. 1223–1247, Sep 2017.
[Mac67] J. MacQueen, “Some methods for classification and analysis of multivariate
observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical
Statistics and Probability, Volume 1: Statistics. Berkeley, Calif.: University of
California Press, 1967, pp. 281–297. [Online]. Available: https://projecteuclid.org/
euclid.bsmsp/1200512992
[Nee18] N. Damodaran, "Device free indoor human activity recognition," Master's thesis, Frankfurt
University of Applied Sciences, October 2018.
[Nee19] N. Damodaran and J. Schäfer, "Device Free Human Activity Recognition
using WiFi Channel State Information,” in 16th IEEE International Conference on
Ubiquitous Intelligence and Computing (UIC 2019), 5th IEEE Smart World Congress,
Leicester, vol. 16. IEEE, August 2019.
[NYC14] A. M. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled:
High confidence predictions for unrecognizable images,” CoRR, vol. abs/1412.1897,
2014. [Online]. Available: http://arxiv.org/abs/1412.1897
[OM99] D. Opitz and R. Maclin, “Popular ensemble methods: An empirical study,” J.
Artif. Int. Res., vol. 11, no. 1, pp. 169–198, Jul. 1999. [Online]. Available:
http://dl.acm.org/citation.cfm?id=3013545.3013549
[Ray] S. Ray. Beginners guide to learn dimension reduction techniques. [Online]. Available:
https://www.analyticsvidhya.com/blog/2015/07/dimension-reduction-methods/
[RMC15] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with
deep convolutional generative adversarial networks,” CoRR, vol. abs/1511.06434, 2015.
[SVM14] C. O. S. Sorzano, J. Vargas, and A. P. Montano, “A survey of dimensionality reduction
techniques,” ArXiv e-prints, p. arXiv:1403.2877, Mar. 2014.
[SZS+ 13] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and
R. Fergus, “Intriguing properties of neural networks,” 2013.
[TBF05] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, ser. Intelligent
robotics and autonomous agents. MIT Press, 2005. [Online]. Available:
http://books.google.de/books?id=2Zn6AQAAQBAJ
[Tip01] M. E. Tipping, “Sparse bayesian learning and the relevance vector machine,”
J. Mach. Learn. Res., vol. 1, pp. 211–244, Sep. 2001. [Online]. Available:
https://doi.org/10.1162/15324430152748236
[TSK+ 18] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on
deep transfer learning,” CoRR, vol. abs/1808.01974, 2018. [Online]. Available:
http://arxiv.org/abs/1808.01974
[Vap95] V. N. Vapnik, The nature of statistical learning theory. New York, NY, USA:
Springer-Verlag New York, Inc., 1995.
[VSP+ 17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser,
and I. Polosukhin, "Attention is all you need," 2017. [Online]. Available:
http://arxiv.org/abs/1706.03762
[YHZ+ 17] X. Yuan, P. He, Q. Zhu, R. R. Bhat, and X. Li, “Adversarial examples: Attacks and
defenses for deep learning,” CoRR, vol. abs/1712.07107, 2017. [Online]. Available:
http://arxiv.org/abs/1712.07107
[YND+ 17] S. Yousefi, H. Narui, S. Dayal, S. Ermon, and S. Valaee, “A survey of human activity
recognition using wifi CSI,” CoRR, vol. abs/1708.07129, 2017.