Research article
Open access

Sampling Sparse Representations with Randomized Measurement Langevin Dynamics

Published: 10 February 2021

Abstract

Stochastic Gradient Langevin Dynamics (SGLD) has been widely used for Bayesian sampling from probability distributions by incorporating derivatives of the log-posterior. Given the gradient of the log-posterior, SGLD generates samples by simulating a thermostat-like dynamics that traverses the gradient flow of the log-posterior under controlled stochastic perturbation. Even when the density is unknown, existing solutions can first learn a kernel density model from the given dataset and then produce new samples by running SGLD over the derivatives of the kernel density. In this work, instead of drawing new samples from kernel spaces, a novel SGLD sampler, namely Randomized Measurement Langevin Dynamics (RMLD), is proposed to sample high-dimensional sparse representations from the spectral domain of a given dataset.
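For reference, a standard SGLD update with step size \epsilon_t (a common textbook form; the exact variant used in the paper may differ) can be written as

\[
\theta_{t+1} \;=\; \theta_t \;+\; \tfrac{\epsilon_t}{2}\,\nabla_\theta \log p(\theta_t \mid x) \;+\; \eta_t,
\qquad \eta_t \sim \mathcal{N}(0,\, \epsilon_t I),
\]

so the chain drifts along the gradient flow of the log-posterior while the injected Gaussian noise supplies the controlled perturbation that keeps it exploring the full distribution.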
Specifically, given a random measurement matrix for sparse coding, RMLD first derives a likelihood evaluator of the probability distribution from the LASSO loss function, and then samples from the high-dimensional distribution using stochastic Langevin dynamics driven by derivatives of the log-likelihood together with Metropolis–Hastings sampling. In addition, new samples in the low-dimensional measurement space can be regenerated from the sampled high-dimensional vectors and the measurement matrix. The algorithmic analysis shows that RMLD in effect projects a given dataset onto a high-dimensional Gaussian distribution with a Laplacian prior and then draws new sparse representations from the dataset by performing SGLD over that distribution. Extensive experiments on real-world datasets evaluate the proposed algorithm, and performance comparisons on three real-world applications demonstrate that RMLD outperforms the baseline methods.
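To make the sampling loop described above concrete, the following is a minimal sketch in Python, assuming a Gaussian likelihood with an L1 (Laplacian) prior derived from the LASSO loss; the function names, step-size schedule, and the simplified Metropolis–Hastings test are illustrative assumptions, not the authors' implementation.

import numpy as np

def log_post(s, y, Phi, sigma=1.0, lam=0.1):
    """Unnormalized log-posterior taken from the LASSO loss:
    -||y - Phi s||^2 / (2 sigma^2) - lam * ||s||_1."""
    resid = y - Phi @ s
    return -resid @ resid / (2 * sigma ** 2) - lam * np.abs(s).sum()

def grad_log_post(s, y, Phi, sigma=1.0, lam=0.1):
    """(Sub)gradient of the log-posterior; sign(s) handles the L1 prior."""
    return Phi.T @ (y - Phi @ s) / sigma ** 2 - lam * np.sign(s)

def rmld_sample(y, Phi, n_steps=5000, eps=1e-3, seed=0):
    """Langevin proposals corrected by a simple Metropolis-Hastings test
    (the proposal-asymmetry term of full MALA is omitted for brevity)."""
    rng = np.random.default_rng(seed)
    s = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        noise = rng.normal(scale=np.sqrt(eps), size=s.shape)
        prop = s + 0.5 * eps * grad_log_post(s, y, Phi) + noise
        if np.log(rng.uniform()) < log_post(prop, y, Phi) - log_post(s, y, Phi):
            s = prop
    return s

# Draw a sparse code for one measured point and regenerate a sample
# in the low-dimensional measurement space.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(32, 256)) / np.sqrt(32)   # random measurement matrix
y = rng.normal(size=32)                          # a measured data point
s_new = rmld_sample(y, Phi)                      # high-dimensional sparse sample
y_new = Phi @ s_new                              # regenerated low-dimensional sample

Note that the L1 term makes the log-posterior non-smooth at zero, which is why a subgradient stands in for the exact derivative in this sketch.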



Information & Contributors

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 2
Survey Paper and Regular Papers
April 2021
524 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3446665
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 February 2021
Accepted: 01 September 2020
Received: 01 October 2019
Published in TKDD Volume 15, Issue 2


Author Tags

  1. Hamiltonian Monte Carlo
  2. LASSO
  3. compressive sensing

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Science and Technology Development Fund of Macao S.A.R (FDCT)
  • National Natural Science Foundation of China
  • Shenzhen Discipline Construction Project for Urban Computing and Data Intelligence
  • National Key R&D Program of China


Bibliometrics & Citations


Article Metrics

  • Downloads (last 12 months): 122
  • Downloads (last 6 weeks): 28
Reflects downloads up to 18 Aug 2024

Citations

Cited By

  • (2023) An efficient joint framework for interacting knowledge graph and item recommendation. Knowledge and Information Systems 65, 4 (2023), 1685–1712. DOI: 10.1007/s10115-022-01808-z. Online publication date: 1-Apr-2023.
  • (2022) Characterizing and Forecasting Urban Vibrancy Evolution: A Multi-View Graph Mining Perspective. ACM Transactions on Knowledge Discovery from Data 17, 5 (2022), 1–24. DOI: 10.1145/3568683. Online publication date: 30-Nov-2022.
  • (2022) HW-Forest: Deep Forest with Hashing Screening and Window Screening. ACM Transactions on Knowledge Discovery from Data. DOI: 10.1145/3532193. Online publication date: 4-May-2022.
  • (2021) Context-Aware Semantic Annotation of Mobility Records. ACM Transactions on Knowledge Discovery from Data 16, 3 (2021), 1–20. DOI: 10.1145/3477048. Online publication date: 22-Oct-2021.
