research-article

The Good, the Bad and their Kins: Identifying Questions with Negative Scores in StackOverflow

Authors:

Debasis Ganguly,

Gareth J. F. Jones

ASONAM '15: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015

Pages 1232 - 1239

https://doi.org/10.1145/2808797.2809318

Published: 25 August 2015

Abstract

A rapid increase in the number of questions posted on community question answering (CQA) forums is creating a need for automated methods of question quality moderation, to improve the effectiveness of such forums in terms of response time and quality. Such automated approaches should classify a question as good or bad for a particular forum as soon as it is posted, based on the guidelines and quality standards defined by that forum: a question that meets the forum's standards is classified as good, and otherwise it is classified as bad. In this paper, we address this question classification problem by retrieving similar questions previously asked in the same forum and then using the text of these questions to predict the quality of the current question. We empirically validate the proposed approach on StackOverflow, a massive CQA forum for programmers, using a dataset of about 8M questions. Using the additional text retrieved from similar questions, we improve question quality prediction accuracy by about 2.8% and the recall of negatively scored questions by about 4.2%. This gain in recall should help to automatically flag questions as bad (unsuitable) for the forum, speeding up the moderation process and saving human time and effort.
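The abstract describes a two-step pipeline: retrieve previously asked questions that resemble a new one, then use their text when predicting the new question's quality. The sketch below shows one way such a pipeline could look, assuming TF-IDF cosine retrieval, plain concatenation of the top-k neighbours' text (document expansion) and a multinomial naive Bayes classifier. These choices, the value of k, the names `QuestionArchive`, `expand`, `MultinomialNB` and `recall`, and the toy data are all illustrative assumptions, not the paper's exact method or settings. Recall is computed for the "bad" class, which the abstract associates with negatively scored questions.

```python
# Hypothetical sketch: TF-IDF retrieval, k=2 and naive Bayes are assumptions, not the paper's settings.
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (real posts contain code and markup).
    return text.lower().split()


class QuestionArchive:
    """Previously asked questions with known quality labels ("good"/"bad")."""

    def __init__(self, texts, labels):
        self.texts, self.labels = list(texts), list(labels)
        counts = [Counter(tokenize(t)) for t in self.texts]
        df = Counter(term for c in counts for term in c)
        n = len(counts)
        # Smoothed IDF: terms that occur in every question keep a small positive weight.
        self.idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}
        self.vectors = [self._weights(c) for c in counts]
        self.norms = [self._norm(v) for v in self.vectors]

    def _weights(self, counts):
        # TF-IDF weights; terms never seen in the archive are dropped.
        return {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}

    @staticmethod
    def _norm(vec):
        return math.sqrt(sum(w * w for w in vec.values())) or 1.0

    def top_k(self, text, k):
        """Indices of the k archived questions most cosine-similar to `text`."""
        q = self._weights(Counter(tokenize(text)))
        q_norm = self._norm(q)
        scored = []
        for i, (vec, norm) in enumerate(zip(self.vectors, self.norms)):
            dot = sum(w * vec.get(t, 0.0) for t, w in q.items())
            scored.append((dot / (q_norm * norm), i))
        scored.sort(reverse=True)
        return [i for _, i in scored[:k]]


def expand(archive, text, k=2):
    # Document expansion: append the text of the k retrieved similar questions.
    tokens = tokenize(text)
    for i in archive.top_k(text, k):
        tokens += tokenize(archive.texts[i])
    return tokens


class MultinomialNB:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, token_lists, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.counts, self.totals, self.vocab = {}, {}, {}, set()
        for c in self.classes:
            docs = [toks for toks, y in zip(token_lists, labels) if y == c]
            self.log_prior[c] = math.log(len(docs) / len(labels))
            self.counts[c] = Counter(t for toks in docs for t in toks)
            self.totals[c] = sum(self.counts[c].values())
            self.vocab.update(self.counts[c])
        return self

    def predict(self, tokens):
        known = [t for t in tokens if t in self.vocab]  # ignore out-of-vocabulary terms
        v = len(self.vocab)

        def log_posterior(c):
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v)) for t in known)

        return max(self.classes, key=log_posterior)


def recall(y_true, y_pred, positive="bad"):
    # Fraction of truly bad questions that the classifier also labels bad.
    preds_for_bad = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_bad) / len(preds_for_bad) if preds_for_bad else 0.0


if __name__ == "__main__":
    # Toy data invented for illustration only.
    archive = QuestionArchive(
        ["how do i reverse a list in python", "python list comprehension with condition",
         "plz do my homework urgent", "urgent help code not working plz"],
        ["good", "good", "bad", "bad"])
    clf = MultinomialNB().fit([tokenize(t) for t in archive.texts], archive.labels)
    preds = [clf.predict(expand(archive, q)) for q in ["plz urgent my code", "reverse list python"]]
    print(preds)  # ['bad', 'good']
    print(recall(["bad", "good"], preds))  # 1.0
```

For brevity the classifier is trained on the unexpanded archive texts. Expanding the training questions in the same way (excluding each question from its own neighbour list) would keep training and test representations consistent, at the cost of one retrieval per training question.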

Index Terms
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction paradigms
2. Information systems
  1. Information retrieval
  2. Information systems applications

Recommendations

Empirical Software Engineering Research - The Good, The Bad, The Ugly
ESEM '11: Proceedings of the 2011 International Symposium on Empirical Software Engineering and Measurement

The Software Engineering Research community has slowly recognized that empirical studies are an important way of validating ideas and increasingly our community has stopped accepting the sufficiency of arguing that a smart person has come up with the ...
Bad Users or Bad Content?: Breaking the Vicious Cycle by Finding Struggling Students in Community Question-Answering
CHIIR '17: Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval

Community Question Answering (CQA) services have become popular methods to seek and share information. In CQA, users with an information need, or askers, post a question that community members can answer. This question-answering process allows both ...
From bad to good: an investigation of question quality and transformation
ASIST '13: Proceedings of the 76th ASIS&T Annual Meeting: Beyond the Cloud: Rethinking Information Boundaries

Social question answering (SQA) services are a popular way for people to exchange information. Unfortunately, the quality of information exchanged can be variable and few studies focus on the quality of questions asked. To address this, we explored the ...

Information

Published In

ASONAM '15: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015

August 2015

835 pages

ISBN:9781450338547

DOI:10.1145/2808797

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 August 2015

Qualifiers

Research-article
Research
Refereed limited

Conference

ASONAM '15

Sponsor:

SIGKDD

ASONAM '15: Advances in Social Networks Analysis and Mining 2015

August 25 - 28, 2015

Paris, France

Acceptance Rates

Overall Acceptance Rate 116 of 549 submissions, 21%

Bibliometrics & Citations

Article Metrics

Total Citations: 33
Total Downloads: 387
Downloads (last 12 months): 28
Downloads (last 6 weeks): 2

Reflects downloads up to 13 Jan 2025

Cited By

Gong L, Zhang H (2024). MR²-KG: A multi-relation multi-rationale knowledge graph for modeling software engineering knowledge on Stack Overflow. IEEE Transactions on Software Engineering, pp. 1-20. Online publication date: 2024.
https://doi.org/10.1109/TSE.2024.3403108
Yang S, Chen X, Liu K, Yang G, Yu C (2024). Automatic bi-modal question title generation for Stack Overflow with prompt learning. Empirical Software Engineering 29(3). Online publication date: 3 May 2024.
https://doi.org/10.1007/s10664-024-10466-4
Anwar Z, Afzal H, Ahsan A, Iltaf N, Maqbool A (2023). A novel hybrid CNN-LSTM approach for assessing StackOverflow post quality. Journal of Intelligent Systems 32(1). Online publication date: 28 Nov 2023.
https://doi.org/10.1515/jisys-2023-0057
Zhou Y, Yang S, Chen X, Zhang Z, Pei J (2023). QTC4SO: Automatic Question Title Completion for Stack Overflow. 2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC), pp. 1-12. Online publication date: May 2023.
https://doi.org/10.1109/ICPC58990.2023.00011
Ithipathachai V, Azizi M, Hong J, Bures M, Park J, Cerny T (2022). Are tags 'it?'. Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, pp. 1483-1490. Online publication date: 25 Apr 2022.
https://dl.acm.org/doi/10.1145/3477314.3506985
Sabir F, Gueheneuc Y, Palma F, Moha N, Rasool G, Akhtar H (2022). A Mixed-Method Approach to Recommend Corrections and Correct REST Antipatterns. IEEE Transactions on Software Engineering 48(11), pp. 4319-4338. Online publication date: 1 Nov 2022.
https://doi.org/10.1109/TSE.2021.3117023
Liu K, Yang G, Chen X, Yu C (2022). SOTitle: A Transformer-based Post Title Generation Approach for Stack Overflow. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 577-588. Online publication date: Mar 2022.
https://doi.org/10.1109/SANER53432.2022.00075
Guo S, Ping B, Song Z, Li H, Chen R (2022). Generating High Quality Titles in StackOverflow via Data Denoising Method. 2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), pp. 1-6. Online publication date: 25 Nov 2022.
https://doi.org/10.1109/PAAP56126.2022.10010656
Annamoradnejad I, Habibi J, Fazli M (2022). Multi-view approach to suggest moderation actions in community question answering sites. Information Sciences: an International Journal 600(C), pp. 144-154. Online publication date: 1 Jul 2022.
https://dl.acm.org/doi/10.1016/j.ins.2022.03.085
Parra E, Alahmadi M, Ellis A, Haiduc S (2022). A comparative study and analysis of developer communications on Slack and Gitter. Empirical Software Engineering 27(2). Online publication date: 13 Jan 2022.
https://doi.org/10.1007/s10664-021-10095-1
