
Accelerating t-SNE using tree-based algorithms

Published: 01 January 2014

Abstract

The paper investigates the acceleration of t-SNE, an embedding technique commonly used to visualize high-dimensional data in scatter plots, by means of two tree-based algorithms. In particular, it develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N) time. Our experiments show that the resulting algorithms substantially accelerate t-SNE and make it possible to learn embeddings of data sets containing millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.


    Published In

    The Journal of Machine Learning Research  Volume 15, Issue 1
    January 2014
    4085 pages
    ISSN:1532-4435
    EISSN:1533-7928

    Publisher

    JMLR.org


    Author Tags

    1. Barnes-Hut algorithm
    2. dual-tree algorithm
    3. embedding
    4. multidimensional scaling
    5. space-partitioning trees
    6. t-SNE


