research-article

Analyzing and Detecting Information Types of Developer Live Chat Threads

Authors:

He Jiang

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 5

Article No.: 131, Pages 1 - 32

https://doi.org/10.1145/3643677

Published: 04 June 2024

Abstract

Online chatrooms serve as vital platforms for information exchange among software developers. Because many developers communicate rapidly across diverse topics, the resulting chat messages are often complex and unstructured. To make extracting information from chat threads more efficient, automatic mining techniques have been introduced for thread classification. However, existing approaches still achieve unsatisfactory classification accuracy because of two primary challenges: they struggle to capture long-distance dependencies within chat threads, and they do not adequately address category imbalance in labeled datasets. To overcome these challenges, we present EAEChat, a topic classification approach for chat information types. EAEChat comprises three core components: (1) the text feature encoding component captures contextual text features with a text feature encoder based on multi-head self-attention, and employs a siamese network to mitigate overfitting caused by limited data; (2) the data augmentation component expands the under-represented categories in the training dataset using a technique tailored to developer chat messages, addressing the imbalanced category distribution; and (3) the non-text feature encoding component employs a feature fusion model to integrate deep text features with manually extracted non-text features. An evaluation on three real-world projects shows that EAEChat achieves an average precision, recall, and F1-score of 0.653, 0.651, and 0.644, respectively, a 7.60% improvement over state-of-the-art approaches. These findings confirm that our method effectively classifies developer chat messages in online chatrooms.
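The abstract names three mechanisms without showing how they work. The Python sketch below illustrates one simple, generic version of each under stated assumptions; it is not EAEChat's implementation. `contrastive_loss` is the standard margin-based loss commonly used to train siamese networks (the abstract does not specify the paper's exact loss). `augment` and `balance` use EDA-style random token swaps as a stand-in for the paper's chat-specific augmentation, which the abstract does not describe. `fuse` performs plain concatenation, whereas the paper uses a learned feature fusion model. All function names and the non-text features in the usage example (message count, participant count, has-code flag) are hypothetical.

```python
import math
import random
from collections import Counter


def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Margin-based contrastive loss for one siamese pair.

    Same-class pairs are penalized by their squared distance; different-class
    pairs are penalized only when they are closer than `margin`.
    """
    d = math.dist(emb_a, emb_b)  # Euclidean distance between the two embeddings
    if same_label:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def augment(tokens, rng, p_swap=0.1):
    """EDA-style random token swap (a generic stand-in, NOT EAEChat's technique)."""
    out = list(tokens)
    n_swaps = max(1, int(len(out) * p_swap))  # always attempt at least one swap
    for _ in range(n_swaps):
        if len(out) < 2:
            break  # a one-token message has nothing to swap
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def balance(dataset, seed=0):
    """Oversample each minority class with augmented copies until it matches
    the size of the largest class. `dataset` is a list of (tokens, label)."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in dataset)
    target = max(counts.values())
    balanced = list(dataset)
    for label, count in counts.items():
        pool = [tokens for tokens, lab in dataset if lab == label]
        for _ in range(target - count):
            balanced.append((augment(rng.choice(pool), rng), label))
    return balanced


def fuse(text_vec, non_text_feats):
    """Early fusion by concatenation (the paper uses a learned fusion model)."""
    return list(text_vec) + [float(f) for f in non_text_feats]


if __name__ == "__main__":
    threads = [
        ("how do i configure the router".split(), "question"),
        ("which version fixes this bug".split(), "question"),
        ("is there a workaround for the crash".split(), "question"),
        ("try upgrading to the latest release".split(), "solution"),
    ]
    balanced = balance(threads)
    print(Counter(label for _, label in balanced))  # 3 of each label

    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_label=True))   # 12.5
    print(contrastive_loss([0.0, 0.0], [0.0, 0.5], same_label=False))  # 0.125

    # Hypothetical non-text features: message count, participant count, has-code flag.
    print(fuse([0.12, -0.40], [7, 3, True]))
```

Fixing `seed` keeps the augmented copies reproducible across runs. In a real pipeline, augmentation would typically be applied only to the training split so that the evaluation data stays untouched.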

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Software and its engineering
  1. Software creation and management
    1. Collaboration in software development

Information

Published In

ACM Transactions on Software Engineering and Methodology Volume 33, Issue 5

June 2024

952 pages

EISSN:1557-7392

DOI:10.1145/3618079

Editor:
Mauro Pezzè
USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 June 2024

Online AM: 29 January 2024

Accepted: 15 January 2024

Revised: 11 January 2024

Received: 13 August 2023

Published in TOSEM Volume 33, Issue 5

Qualifiers

Research-article

Funding Sources

Dalian Excellent Young Project
National Natural Science Foundation of China
Natural Science Foundation of Liaoning Province

Bibliometrics

Total Citations: 0
Total Downloads: 498
Downloads (Last 12 months): 415
Downloads (Last 6 weeks): 27

Reflects downloads up to 03 Mar 2025
