DOI: 10.3115/1118693.1118704

Thumbs up?: sentiment classification using machine learning techniques

Published: 06 July 2002

Abstract

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
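As a rough illustration of the task described in the abstract, the sketch below trains a word-count Naive Bayes classifier (one of the three methods named) on a handful of toy reviews. The reviews, the whitespace tokenizer, the add-one smoothing, and the function names `train_naive_bayes` and `classify` are all assumptions made for this example; they are not the authors' data, features, or implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, chosen only for brevity.
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Collect per-label document counts and word counts from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data invented for this sketch; not drawn from the paper's corpus.
TOY_REVIEWS = [
    ("a brilliant and moving film", "pos"),
    ("great acting and a wonderful story", "pos"),
    ("a dull and boring plot", "neg"),
    ("terrible acting and a boring script", "neg"),
]

model = train_naive_bayes(TOY_REVIEWS)
print(classify(model, "a wonderful and brilliant story"))  # pos
print(classify(model, "boring and terrible"))              # neg
```

A classifier this small says nothing about the accuracy figures in the paper; it only shows the shape of the problem, where the label is an overall sentiment rather than a topic.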

Published In

EMNLP '02: Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
July 2002
328 pages

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 73 of 234 submissions, 31%
