DOI: 10.1145/3387940.3391491
research-article

Predicting Stack Overflow Question Tags: A Multi-Class, Multi-Label Classification

Published: 25 September 2020

Abstract

This work proposes a method for predicting the tags assigned to posts on the Stack Overflow platform. The raw data, obtained from stackexchange.com, comprises more than 50K posts together with the tags their users assigned. The posts' questions and titles are pre-processed, and the sentences in each post are then transformed into features via Latent Dirichlet Allocation. Because the problem is a multi-class, multi-label classification, we propose (1) one-against-all models for the 15 most frequently used tags and (2) a combined multi-tag classifier that finds the top K tags for a single post. Three algorithms are used to train the one-against-all classifiers, each of which estimates the degree to which a post belongs to a given tag. The per-tag probabilities produced by the best-performing algorithm are then combined to yield the multi-tag classifier's output. Performance is compared with a k-nearest-neighbors (kNN) baseline. Our multi-tag classifier achieves 55% recall and a 39% F1-score.
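The one-against-all scheme and the top-K combination step can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: each post is assumed to be already reduced to an LDA topic-proportion vector, logistic regression trained by plain gradient descent stands in for the three algorithms (which are not named here), and recall and F1 are assumed to be micro-averaged over (post, tag) pairs. All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Assumption: a post is represented by a precomputed LDA topic-proportion vector.
Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias) of one tag-vs-rest classifier


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tag_probability(model: Model, x: Vector) -> float:
    """Probability, under one tag-vs-rest model, that post x carries that tag."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_binary_logreg(X: List[Vector], y: List[int],
                        lr: float = 0.5, epochs: int = 500) -> Model:
    """Hypothetical stand-in learner: logistic regression fit by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = tag_probability((w, b), xi) - yi  # derivative of log loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_one_vs_all(X: List[Vector], post_tags: List[Set[str]],
                     tag_vocab: List[str]) -> Dict[str, Model]:
    """One binary model per tag: posts with the tag are positives, all others negatives."""
    return {tag: train_binary_logreg(X, [1 if tag in tags else 0 for tags in post_tags])
            for tag in tag_vocab}


def predict_top_k(models: Dict[str, Model], x: Vector, k: int) -> List[str]:
    """Combine the per-tag probabilities and keep the k most probable tags."""
    scores = {tag: tag_probability(m, x) for tag, m in models.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def micro_recall_f1(predicted: List[List[str]],
                    actual: List[Set[str]]) -> Tuple[float, float]:
    """Micro-averaged recall and F1 over all (post, tag) pairs (averaging is an assumption)."""
    tp = sum(len(set(p) & a) for p, a in zip(predicted, actual))
    n_pred = sum(len(p) for p in predicted)
    n_true = sum(len(a) for a in actual)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


# Hypothetical toy data: 3 LDA topics and 3 tags (not the paper's data).
X = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.05, 0.90, 0.05],
     [0.10, 0.85, 0.05], [0.05, 0.05, 0.90], [0.10, 0.05, 0.85],
     [0.50, 0.05, 0.45]]
tags = [{"python"}, {"python"}, {"java"}, {"java"}, {"sql"}, {"sql"},
        {"python", "sql"}]
models = train_one_vs_all(X, tags, ["python", "java", "sql"])
print(predict_top_k(models, [0.88, 0.06, 0.06], k=1))  # expected: ['python']
```

Keeping one independent binary model per tag is what makes the scheme multi-label: each tag's probability is computed separately, so a post can score highly for several tags at once, and K controls how many of them are returned.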




    Information

    Published In

    ACM Conferences
    ICSEW'20: Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops
    June 2020
    831 pages
    ISBN:9781450379632
    DOI:10.1145/3387940
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Latent Dirichlet Allocation
    2. Stack Overflow
    3. tag prediction

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICSE '20
    ICSE '20: 42nd International Conference on Software Engineering
    June 27 - July 19, 2020
    Seoul, Republic of Korea

    Article Metrics

    • Downloads (last 12 months): 26
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 26 Jan 2025

    Cited By

    • (2024) Exploring GitHub Topics: Unveiling Their Content and Potential. 2024 IEEE International Conference on Software Services Engineering (SSE), pp. 25-35. DOI: 10.1109/SSE62657.2024.00017. Online publication date: 7-Jul-2024.
    • (2024) Predicting Software Energy Consumption Using Time Series-Based Recurrent Neural Network with Natural Language Processing on Stack Overflow Data. 2024 Asian Conference on Communication and Networks (ASIANComNet), pp. 1-6. DOI: 10.1109/ASIANComNet63184.2024.10811023. Online publication date: 24-Oct-2024.
    • (2024) Advanced Automated Tagging for Stack Overflow: A Multi-Stage Approach Using Deep Learning and NLP Techniques. 2024 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), pp. 1-6. DOI: 10.1109/AISP61396.2024.10475258. Online publication date: 21-Feb-2024.
    • (2023) DENT: A Tool for Tagging Stack Overflow Posts with Deep Learning Energy Patterns. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 2157-2161. DOI: 10.1145/3611643.3613092. Online publication date: 30-Nov-2023.
    • (2021) TagNN. Wireless Communications & Mobile Computing, vol. 2021. DOI: 10.1155/2021/9956207. Online publication date: 1-Jan-2021.
