Analyzing and Detecting Information Types of Developer Live Chat Threads

Published: 04 June 2024

Abstract

Online chatrooms serve as vital platforms for information exchange among software developers. Because multiple developers communicate rapidly and across diverse topics, the resulting chat messages are often complex and unstructured. Automatic mining techniques for thread classification have been introduced to make extracting information from chat threads more efficient. However, previous approaches still suffer from unsatisfactory classification accuracy because of two primary challenges: they struggle to capture long-distance dependencies within chat threads, and they do not adequately address category imbalance in labeled datasets. To overcome these challenges, we present EAEChat, a topic classification approach for chat information types. EAEChat comprises three core components. The text feature encoding component captures contextual text features with a text feature encoder based on multi-head self-attention, and employs a siamese network to mitigate the overfitting caused by limited data. The data augmentation component expands the under-represented categories in the training dataset using a technique tailored to developer chat messages, which tackles the imbalanced category distribution. The non-text feature encoding component employs a feature fusion model to integrate deep text features with manually extracted non-text features. Evaluation across three real-world projects shows that EAEChat achieves an average precision, recall, and F1-score of 0.653, 0.651, and 0.644, respectively, a significant 7.60% improvement over state-of-the-art approaches. These findings confirm the effectiveness of our method in classifying developer chat messages in online chatrooms.
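
The reported average F1-score (0.644) is lower than both the reported average precision (0.653) and recall (0.651). The F1-score of a single precision/recall pair is their harmonic mean and always lies between the two values, so the three figures are consistent with each metric being averaged separately rather than F1 being derived from the averaged precision and recall. The following Python sketch illustrates that effect. The per-project values and the names `project_a`, `project_b`, and `project_c` are invented for illustration and are not results from the paper.

```python
from statistics import mean


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs, one per project.
# These numbers are made up; they are NOT taken from the paper.
scores = {
    "project_a": (0.90, 0.50),
    "project_b": (0.50, 0.90),
    "project_c": (0.70, 0.70),
}

# Average each metric separately across projects.
avg_precision = mean(p for p, _ in scores.values())
avg_recall = mean(r for _, r in scores.values())
avg_f1 = mean(f1_score(p, r) for p, r in scores.values())

# F1 computed from the averaged precision and recall, for comparison.
f1_of_averages = f1_score(avg_precision, avg_recall)

print(f"average precision: {avg_precision:.3f}")
print(f"average recall:    {avg_recall:.3f}")
print(f"average of F1:     {avg_f1:.3f}")
print(f"F1 of averages:    {f1_of_averages:.3f}")
```

In this toy case the average of the per-project F1-scores falls below both the average precision and the average recall, while the F1 computed from the averages does not. The same pattern appears whenever precision and recall diverge within individual projects or classes.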


Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 5
June 2024
952 pages
EISSN: 1557-7392
DOI: 10.1145/3618079
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 June 2024
Online AM: 29 January 2024
Accepted: 15 January 2024
Revised: 11 January 2024
Received: 13 August 2023
Published in TOSEM Volume 33, Issue 5

Author Tags

  1. Developer chatroom
  2. information type classification
  3. data augmentation
  4. deep learning

Qualifiers

  • Research-article

Funding Sources

  • Dalian Excellent Young Project
  • National Natural Science Foundation of China
  • Natural Science Foundation of Liaoning Province
