research-article

By the Community & For the Community: A Deep Learning Approach to Assist Collaborative Editing in Q&A Sites

Authors:

Zhenchang Xing,

Yang Liu

Proceedings of the ACM on Human-Computer Interaction, Volume 1, Issue CSCW

Article No.: 32, Pages 1 - 21

https://doi.org/10.1145/3134667

Published: 06 December 2017

Abstract

Community edits to questions and answers (called post edits) play an important role in improving content quality in Stack Overflow. Our study of post edits in Stack Overflow shows that a large number of edits are about formatting, grammar and spelling. These post edits usually involve small-scale sentence edits, and our survey of trusted contributors suggests that most of them care much or very much about such small sentence edits. To assist users in making small sentence edits, we develop an edit-assistance tool for identifying minor textual issues in posts and recommending sentence edits for correction. We formulate the sentence editing task as a machine translation problem, in which an original sentence is "translated" into an edited sentence. Our tool implements a character-level Recurrent Neural Network (RNN) encoder-decoder model, trained with about 6.8 million original-edited sentence pairs from Stack Overflow post edits. We evaluate our edit-assistance tool using large-scale archival post edits, a field study of assisting a novice post editor, and a survey of trusted contributors. Our evaluation demonstrates the feasibility of training a deep learning model with post edits by the community and then using the trained model to assist post editing for the community.
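To make the "translation" framing concrete, the sketch below shows one way an original-edited sentence pair could be turned into character-level integer sequences, the kind of input and target an encoder-decoder model consumes. It is only an illustration: the special symbols, the function names (`build_char_vocab`, `encode`, `decode`) and the example sentence pair are assumptions made here, not details taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical special symbols; the vocabulary actually used by the tool is not stated here.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_char_vocab(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Assign an integer id to every character seen in the sentence pairs."""
    vocab = {PAD: 0, SOS: 1, EOS: 2, UNK: 3}
    for original, edited in pairs:
        for ch in original + edited:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Encode a sentence as character ids wrapped in start and end markers."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in sentence]
    return [vocab[SOS]] + ids + [vocab[EOS]]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Turn character ids back into text, dropping pad/start/end markers."""
    inverse = {i: ch for ch, i in vocab.items()}
    specials = {vocab[PAD], vocab[SOS], vocab[EOS]}
    return "".join(inverse[i] for i in ids if i not in specials)


# A made-up (original, edited) pair in the spirit of small formatting,
# grammar and spelling edits.
pairs = [("i dont know how to use javascript",
          "I don't know how to use JavaScript")]

vocab = build_char_vocab(pairs)
source_ids = encode(pairs[0][0], vocab)  # what the encoder would read
target_ids = encode(pairs[0][1], vocab)  # what the decoder would learn to emit
```

Working at the character level keeps the vocabulary small and lets capitalization, punctuation and spelling changes appear as short differences between the source and target sequences, which fits the small-scale sentence edits described above.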

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document management
      1. Text editing
2. Human-centered computing
  1. Collaborative and social computing

Published In

Proceedings of the ACM on Human-Computer Interaction Volume 1, Issue CSCW

November 2017

2095 pages

EISSN:2573-0142

DOI:10.1145/3171581

Editors:
Clifford Lampe (University of Michigan),
Jeff Nichols (Google),
Karrie Karahalios (University of Illinois Urbana-Champaign),
Geraldine Fitzpatrick (Vienna University of Technology),
Uichin Lee (KAIST),
Andres Monroy-Hernandez (Snap Inc.),
Wolfgang Stuerzlinger (Simon Fraser University)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
