DOI: 10.5555/3045118.3045234
Article

On deep multi-view representation learning

Published: 06 July 2015

Abstract

We consider learning representations (features) in the setting in which we have access to multiple unlabeled views of the data for representation learning while only one view is available at test time. Previous work on this problem has proposed several techniques based on deep neural networks, typically involving either autoencoder-like networks with a reconstruction objective or paired feedforward networks with a correlation-based objective. We analyze several techniques based on prior work, as well as new variants, and compare them experimentally on visual, speech, and language domains. To our knowledge this is the first head-to-head comparison of a variety of such techniques on multiple tasks. We find an advantage for correlation-based representation learning, while the best results on most tasks are obtained with our new variant, deep canonically correlated autoencoders (DCCAE).
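The DCCAE objective sketched in the abstract combines the two objective families it compares: a CCA-style correlation term across the two learned views plus per-view autoencoder reconstruction penalties. Below is a minimal numerical sketch of that trade-off, with plain linear maps standing in for the deep encoders and decoders; the function names, the `lam` trade-off weight, and the `eps` regularizer are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def total_correlation(H1, H2, eps=1e-8):
    """Sum of canonical correlations between two feature matrices (rows = samples)."""
    H1 = H1 - H1.mean(axis=0)
    H2 = H2 - H2.mean(axis=0)
    n = H1.shape[0]
    # Regularized covariance and cross-covariance estimates.
    S11 = H1.T @ H1 / (n - 1) + eps * np.eye(H1.shape[1])
    S22 = H2.T @ H2 / (n - 1) + eps * np.eye(H2.shape[1])
    S12 = H1.T @ H2 / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is symmetric PD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Canonical correlations are the singular values of the whitened cross-covariance.
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(T, compute_uv=False).sum()

def dccae_objective(X1, X2, enc1, enc2, dec1, dec2, lam=0.01):
    """DCCAE-style loss: maximize cross-view correlation, penalize reconstruction error."""
    H1, H2 = enc1(X1), enc2(X2)
    corr = total_correlation(H1, H2)
    rec = np.mean((dec1(H1) - X1) ** 2) + np.mean((dec2(H2) - X2) ** 2)
    return -corr + lam * rec  # minimized during training
```

In the paper's setting the encoders and decoders are deep networks trained jointly; here a fixed linear map makes the quantities easy to inspect, e.g. `total_correlation(X, X)` recovers (approximately) the feature dimensionality, since each canonical correlation of a view with itself is 1.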




    Published In

    ICML'15: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37
    July 2015
    2558 pages

    Publisher

    JMLR.org



    Cited By

• (2024) Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. ACM Computing Surveys, 56(10):1-42. DOI: 10.1145/3656580. Online publication date: 22-Jun-2024.
• (2023) Max-sliced mutual information. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 80338-80351. DOI: 10.5555/3666122.3669644. Online publication date: 10-Dec-2023.
• (2023) Generalized information-theoretic multi-view clustering. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 58752-58764. DOI: 10.5555/3666122.3668684. Online publication date: 10-Dec-2023.
• (2023) Incomplete multimodality-diffused emotion recognition. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 17117-17128. DOI: 10.5555/3666122.3666870. Online publication date: 10-Dec-2023.
• (2023) Learning representations without compositional assumptions. Proceedings of the 40th International Conference on Machine Learning, pp. 21388-21403. DOI: 10.5555/3618408.3619291. Online publication date: 23-Jul-2023.
• (2023) Incomplete multi-view clustering via prototype-based imputation. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 3911-3919. DOI: 10.24963/ijcai.2023/435. Online publication date: 19-Aug-2023.
• (2023) Unbalanced Multi-view Deep Learning. Proceedings of the 31st ACM International Conference on Multimedia, pp. 3051-3059. DOI: 10.1145/3581783.3612527. Online publication date: 26-Oct-2023.
• (2023) Triple-Granularity Contrastive Learning for Deep Multi-View Subspace Clustering. Proceedings of the 31st ACM International Conference on Multimedia, pp. 2994-3002. DOI: 10.1145/3581783.3611844. Online publication date: 26-Oct-2023.
• (2023) Variational Autoencoder with CCA for Audio–Visual Cross-modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications, 19(3s):1-21. DOI: 10.1145/3575658. Online publication date: 24-Feb-2023.
• (2021) Inferring the Importance of Product Appearance with Semi-supervised Multi-modal Enhancement. Proceedings of the 29th ACM International Conference on Multimedia, pp. 1120-1128. DOI: 10.1145/3474085.3481538. Online publication date: 17-Oct-2021.
