DOI: https://doi.org/10.1145/3379597.3387446
Research article

Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting?

Published: 18 September 2020

Abstract

In this paper, we address the problem of using sentiment analysis tools 'off-the-shelf', that is, when a gold standard is not available for retraining. We evaluate the performance of four SE-specific tools in a cross-platform setting, i.e., on a test set collected from data sources different from the one used for training. We find that (i) the lexicon-based tools outperform the supervised approaches retrained in a cross-platform setting, and (ii) retraining can be beneficial in within-platform settings in the presence of robust gold standard datasets, even when using a minimal training set. Based on our empirical findings, we derive guidelines for the reliable use of sentiment analysis tools in software engineering.
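The abstract contrasts two ways of applying a tool without a target-platform gold standard: a lexicon-based classifier that needs no training, and a supervised classifier retrained on one platform's gold standard and tested on another's. The Python sketch below illustrates only the shape of that cross-platform protocol. Everything in it is an assumption for illustration and is not taken from the paper: the toy lexicon, the multinomial Naive Bayes learner (a simple stand-in, not one of the four tools evaluated), the choice of macro-averaged F1 as the metric, and the two tiny hypothetical datasets `platform_a` and `platform_b`.

```python
import math
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")

# Hypothetical toy lexicon for illustration only. Real lexicon-based SE tools
# rely on far larger, domain-tuned word lists plus rules (negation, boosters).
TOY_LEXICON = {"great": 1, "thanks": 1, "love": 1, "works": 1,
               "broken": -1, "hate": -1, "crash": -1, "useless": -1}

# Hypothetical gold standards standing in for two different platforms.
TOY_DATASETS = {
    "platform_a": [("Thanks, great answer!", "positive"),
                   ("This API is broken.", "negative"),
                   ("See the docs.", "neutral")],
    "platform_b": [("Love this fix, works now", "positive"),
                   ("Useless patch, it will crash", "negative"),
                   ("Rebased on main.", "neutral")],
}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [tok.strip(".,!?;:()\"'").lower() for tok in text.split()]


def lexicon_polarity(text):
    """Off-the-shelf style: no retraining, just sum prior word polarities."""
    score = sum(TOY_LEXICON.get(tok, 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def train_naive_bayes(examples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocab


def predict_naive_bayes(model, text):
    """Return the label with the highest Laplace-smoothed log posterior."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(label_counts[label] / total)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training carry no evidence
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    pairs = list(zip(gold, predicted))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


def cross_platform_eval(datasets):
    """Retrain on each source platform; return {(source, target): macro-F1}."""
    results = {}
    for source, train in datasets.items():
        model = train_naive_bayes(train)
        for target, test in datasets.items():
            if target == source:
                continue  # within-platform runs need a split of one platform
            gold = [label for _, label in test]
            pred = [predict_naive_bayes(model, text) for text, _ in test]
            results[(source, target)] = macro_f1(gold, pred)
    return results


if __name__ == "__main__":
    for (src, tgt), f1 in cross_platform_eval(TOY_DATASETS).items():
        print(f"Naive Bayes trained on {src}, tested on {tgt}: macro-F1 {f1:.2f}")
    for name, data in TOY_DATASETS.items():
        gold = [label for _, label in data]
        pred = [lexicon_polarity(text) for text, _ in data]
        print(f"Lexicon baseline on {name}: macro-F1 {macro_f1(gold, pred):.2f}")
```

Running the module prints one score per (source, target) pair for the retrained learner and one score per platform for the lexicon baseline. With three examples per platform the numbers carry no evidential weight. The point is the structure: the retrained model only sees the source platform's vocabulary, while the lexicon-based classifier is applied unchanged to every platform.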


Cited By

  • (2024) "Looks Good To Me ;-)": Assessing Sentiment Analysis Tools for Pull Request Discussions. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 211-221. https://doi.org/10.1145/3661167.3661189. Online publication date: 18-Jun-2024.
  • (2024) Sentiment of Technical Debt Security Questions on Stack Overflow: A Replication Study. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 821-829. https://doi.org/10.1109/SANER60148.2024.00089. Online publication date: 12-Mar-2024.
  • (2024) An Empirical Evaluation of the Zero-Shot, Few-Shot, and Traditional Fine-Tuning Based Pretrained Language Models for Sentiment Analysis in Software Engineering. IEEE Access 12, 109714-109734. https://doi.org/10.1109/ACCESS.2024.3439450. Online publication date: 2024.

Information

Published In

MSR '20: Proceedings of the 17th International Conference on Mining Software Repositories
June 2020
675 pages
ISBN:9781450375177
DOI:10.1145/3379597
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NLP
  2. Sentiment analysis
  3. empirical software engineering
  4. human factors
  5. machine learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MSR '20