DOI: 10.5555/3045118.3045234
Article

On deep multi-view representation learning

Published: 06 July 2015

Abstract

We consider learning representations (features) in the setting in which we have access to multiple unlabeled views of the data for representation learning while only one view is available at test time. Previous work on this problem has proposed several techniques based on deep neural networks, typically involving either autoencoder-like networks with a reconstruction objective or paired feedforward networks with a correlation-based objective. We analyze several techniques based on prior work, as well as new variants, and compare them experimentally on visual, speech, and language domains. To our knowledge this is the first head-to-head comparison of a variety of such techniques on multiple tasks. We find an advantage for correlation-based representation learning, while the best results on most tasks are obtained with our new variant, deep canonically correlated autoencoders (DCCAE).
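The DCCAE objective sketched in the abstract combines the two objective families it compares: a CCA-style correlation term across the two learned views plus per-view autoencoder reconstruction penalties. Below is a minimal numerical sketch of that trade-off, with plain linear maps standing in for the deep encoders and decoders; the function names, the `lam` trade-off weight, and the `eps` regularizer are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def total_correlation(H1, H2, eps=1e-8):
    """Sum of canonical correlations between two feature matrices (rows = samples)."""
    H1 = H1 - H1.mean(axis=0)
    H2 = H2 - H2.mean(axis=0)
    n = H1.shape[0]
    # Regularized covariance and cross-covariance estimates.
    S11 = H1.T @ H1 / (n - 1) + eps * np.eye(H1.shape[1])
    S22 = H2.T @ H2 / (n - 1) + eps * np.eye(H2.shape[1])
    S12 = H1.T @ H2 / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is symmetric PD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Canonical correlations are the singular values of the whitened cross-covariance.
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(T, compute_uv=False).sum()

def dccae_objective(X1, X2, enc1, enc2, dec1, dec2, lam=0.01):
    """DCCAE-style loss: maximize cross-view correlation, penalize reconstruction error."""
    H1, H2 = enc1(X1), enc2(X2)
    corr = total_correlation(H1, H2)
    rec = np.mean((dec1(H1) - X1) ** 2) + np.mean((dec2(H2) - X2) ** 2)
    return -corr + lam * rec  # minimized during training
```

In the paper's setting the encoders and decoders are deep networks trained jointly; here a fixed linear map makes the quantities easy to inspect, e.g. `total_correlation(X, X)` recovers (approximately) the feature dimensionality, since each canonical correlation of a view with itself is 1.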




    Published In

    ICML'15: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37
    July 2015
    2558 pages

    Publisher

    JMLR.org



    Cited By

• (2024) Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. ACM Computing Surveys, 56(10):1-42. DOI: 10.1145/3656580. Online publication date: 22-Jun-2024.
• (2023) Max-sliced mutual information. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 80338-80351. DOI: 10.5555/3666122.3669644. Online publication date: 10-Dec-2023.
• (2023) Generalized information-theoretic multi-view clustering. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 58752-58764. DOI: 10.5555/3666122.3668684. Online publication date: 10-Dec-2023.
• (2023) Incomplete multimodality-diffused emotion recognition. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 17117-17128. DOI: 10.5555/3666122.3666870. Online publication date: 10-Dec-2023.
• (2023) Learning representations without compositional assumptions. Proceedings of the 40th International Conference on Machine Learning, pp. 21388-21403. DOI: 10.5555/3618408.3619291. Online publication date: 23-Jul-2023.
• (2023) Incomplete multi-view clustering via prototype-based imputation. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 3911-3919. DOI: 10.24963/ijcai.2023/435. Online publication date: 19-Aug-2023.
• (2023) Unbalanced Multi-view Deep Learning. Proceedings of the 31st ACM International Conference on Multimedia, pp. 3051-3059. DOI: 10.1145/3581783.3612527. Online publication date: 26-Oct-2023.
• (2023) Triple-Granularity Contrastive Learning for Deep Multi-View Subspace Clustering. Proceedings of the 31st ACM International Conference on Multimedia, pp. 2994-3002. DOI: 10.1145/3581783.3611844. Online publication date: 26-Oct-2023.
• (2023) Variational Autoencoder with CCA for Audio–Visual Cross-modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications, 19(3s):1-21. DOI: 10.1145/3575658. Online publication date: 24-Feb-2023.
• (2021) Inferring the Importance of Product Appearance with Semi-supervised Multi-modal Enhancement. Proceedings of the 29th ACM International Conference on Multimedia, pp. 1120-1128. DOI: 10.1145/3474085.3481538. Online publication date: 17-Oct-2021.
