Research article. DOI: 10.1145/2808797.2809318

The Good, the Bad and their Kins: Identifying Questions with Negative Scores in StackOverflow

Published: 25 August 2015

Abstract

A rapid increase in the number of questions posted on community question answering (CQA) forums is creating a need for automated question quality moderation to improve the effectiveness of such forums in terms of response time and quality. Such automated approaches should classify a question as good or bad for a particular forum as soon as it is posted, based on the guidelines and quality standards defined by the forum: a question that meets the forum's standard is classified as good; otherwise it is classified as bad. In this paper, we propose a method to address this question classification problem by retrieving similar questions previously asked in the same forum, and then using the text of these similar questions to predict the quality of the current question. We empirically validate our approach on data from StackOverflow, a massive CQA forum for programmers, comprising about 8M questions. Using the additional text retrieved from similar questions, we improve question quality prediction accuracy by about 2.8% and the recall of negatively scored questions by about 4.2%. This improvement in recall would help in automatically flagging questions as bad (unsuitable) for the forum and would speed up the moderation process, saving time and human effort.
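
The retrieve-then-classify idea described in the abstract can be illustrated with a small sketch. The abstract does not specify the retrieval model, the classifier, or the number of neighbours, so everything below is an assumption made for illustration: TF-IDF cosine similarity for retrieval, a multinomial naive Bayes classifier with add-one smoothing, k = 2, and a four-question toy archive. The function names (`expand`, `train_nb`, `predict_nb`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()

def idf_table(docs):
    """Smoothed inverse document frequency for every term in the archive."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the archive get weight 0."""
    return {t: f * idf.get(t, 0.0) for t, f in Counter(tokens).items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) \
        * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def expand(question, archive_docs, k=2):
    """Append the text of the k most similar archived questions."""
    idf = idf_table(archive_docs)
    q_tokens = tokenize(question)
    q_vec = tfidf(q_tokens, idf)
    scores = [cosine(q_vec, tfidf(doc, idf)) for doc in archive_docs]
    ranked = sorted(range(len(archive_docs)),
                    key=lambda i: scores[i], reverse=True)
    neighbours = [i for i in ranked[:k] if scores[i] > 0.0]
    expanded = q_tokens + [t for i in neighbours for t in archive_docs[i]]
    return expanded, neighbours

def train_nb(labelled):
    """Collect per-class term counts for multinomial naive Bayes."""
    term_counts, class_counts, vocab = {}, Counter(), set()
    for tokens, label in labelled:
        class_counts[label] += 1
        term_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return term_counts, class_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest add-one-smoothed log posterior."""
    term_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        counts = term_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(class_counts[label] / total)
        score += sum(math.log((counts[t] + 1) / denom)
                     for t in tokens if t in vocab)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy archive of previously asked questions (invented for illustration).
archive = [
    ("how to reverse a list in python with slicing", "good"),
    ("why does my java loop throw nullpointerexception here is the stack trace", "good"),
    ("plz help urgent code not working", "bad"),
    ("urgent homework do it for me plz", "bad"),
]
archive_docs = [tokenize(text) for text, _ in archive]
model = train_nb([(doc, lab) for doc, (_, lab) in zip(archive_docs, archive)])

expanded, neighbours = expand("urgent plz fix my code", archive_docs, k=2)
label = predict_nb(model, expanded)
```

On this toy archive the new question is expanded with the text of its two nearest neighbours before classification. Whether expansion helps on real data depends on the retrieval quality and the classifier, which this sketch does not attempt to reproduce.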


Published In

ASONAM '15: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015
August 2015
835 pages
ISBN:9781450338547
DOI:10.1145/2808797

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ASONAM '15

Acceptance Rates

Overall Acceptance Rate 116 of 549 submissions, 21%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 28
  • Downloads (last 6 weeks): 2
Reflects downloads up to 13 Jan 2025

Cited By

  • (2024) MR²-KG: A multi-relation multi-rationale knowledge graph for modeling software engineering knowledge on Stack Overflow. IEEE Transactions on Software Engineering, pp. 1-20. DOI: 10.1109/TSE.2024.3403108. Online publication date: 2024
  • (2024) Automatic bi-modal question title generation for Stack Overflow with prompt learning. Empirical Software Engineering 29:3. DOI: 10.1007/s10664-024-10466-4. Online publication date: 3-May-2024
  • (2023) A novel hybrid CNN-LSTM approach for assessing StackOverflow post quality. Journal of Intelligent Systems 32:1. DOI: 10.1515/jisys-2023-0057. Online publication date: 28-Nov-2023
  • (2023) QTC4SO: Automatic Question Title Completion for Stack Overflow. 2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC), pp. 1-12. DOI: 10.1109/ICPC58990.2023.00011. Online publication date: May-2023
  • (2022) Are tags 'it?' Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, pp. 1483-1490. DOI: 10.1145/3477314.3506985. Online publication date: 25-Apr-2022
  • (2022) A Mixed-Method Approach to Recommend Corrections and Correct REST Antipatterns. IEEE Transactions on Software Engineering 48:11, pp. 4319-4338. DOI: 10.1109/TSE.2021.3117023. Online publication date: 1-Nov-2022
  • (2022) SOTitle: A Transformer-based Post Title Generation Approach for Stack Overflow. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 577-588. DOI: 10.1109/SANER53432.2022.00075. Online publication date: Mar-2022
  • (2022) Generating High Quality Titles in StackOverflow via Data Denoising Method. 2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), pp. 1-6. DOI: 10.1109/PAAP56126.2022.10010656. Online publication date: 25-Nov-2022
  • (2022) Multi-view approach to suggest moderation actions in community question answering sites. Information Sciences: an International Journal 600:C, pp. 144-154. DOI: 10.1016/j.ins.2022.03.085. Online publication date: 1-Jul-2022
  • (2022) A comparative study and analysis of developer communications on Slack and Gitter. Empirical Software Engineering 27:2. DOI: 10.1007/s10664-021-10095-1. Online publication date: 13-Jan-2022
