DOI: 10.1145/3038912.3052611

Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification

Published: 03 April 2017


This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require establishing a correspondence to English, for which powerful classifiers are already available. In contrast, our method does not require such supervision. We leverage large amounts of weakly-supervised data in various languages to train a multi-layer convolutional network and demonstrate the importance of pre-training such networks. We thoroughly evaluate our approach on various multi-lingual datasets, including the recent SemEval-2016 sentiment prediction benchmark (Task 4), where we achieved state-of-the-art performance. We also compare the performance of our model trained individually for each language to a variant trained for all languages at once. We show that the latter model reaches slightly worse, but still acceptable, performance compared to the single-language model, while benefiting from better generalization properties across languages.
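
To make the idea of weakly supervised training data concrete, the following is a minimal sketch of one common way such labels are derived for short texts: using emoticons as a noisy sentiment signal. The abstract does not state which weak signal the authors use, so the emoticon lists, the function names `weak_label` and `build_weak_corpus`, and the cleaning step are all assumptions made for illustration, not the paper's pipeline.

```python
import re

# Hypothetical emoticon inventories (assumption: not taken from the paper).
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(", ":'(", "=(")


def _pattern(emoticons):
    # Build one regex that matches any of the given emoticons literally.
    return re.compile("|".join(re.escape(e) for e in emoticons))


POS_RE = _pattern(POSITIVE)
NEG_RE = _pattern(NEGATIVE)


def weak_label(text):
    """Return (cleaned_text, label), or None if the signal is absent or mixed."""
    has_pos = bool(POS_RE.search(text))
    has_neg = bool(NEG_RE.search(text))
    if has_pos == has_neg:
        # Either no emoticon at all or conflicting ones: skip the text.
        return None
    label = "positive" if has_pos else "negative"
    # Remove the emoticons so a model cannot simply memorize the label source.
    cleaned = NEG_RE.sub("", POS_RE.sub("", text))
    cleaned = " ".join(cleaned.split())
    return cleaned, label


def build_weak_corpus(texts):
    # Keep only texts that received an unambiguous weak label.
    return [pair for pair in map(weak_label, texts) if pair is not None]


if __name__ == "__main__":
    samples = [
        "Che bella giornata :)",
        "Schlechter Service :-(",
        "Just landed in Zurich",
        "Loved the food :) hated the wait :(",
    ]
    for text, label in build_weak_corpus(samples):
        print(label, "|", text)
```

Because the rule only looks at emoticons, it applies unchanged to texts in any language, which is what makes this kind of signal attractive when labeled data outside English is scarce. The resulting labels are noisy and would typically serve for pre-training before fine-tuning on a smaller hand-labeled set.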



Published In

WWW '17: Proceedings of the 26th International Conference on World Wide Web
April 2017
1678 pages


Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland

Author Tags

  1. multi-language
  2. neural networks
  3. sentiment classification
  4. weak supervision


  • Research-article

Funding Sources

  • SpinningBytes
  • Swiss Commission for Technology and Innovation (CTI)



Acceptance Rates

WWW '17 Paper Acceptance Rate: 164 of 966 submissions, 17%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

