research-article

By the Community & For the Community: A Deep Learning Approach to Assist Collaborative Editing in Q&A Sites

Published: 06 December 2017

Abstract

Community edits to questions and answers (called post edits) play an important role in improving content quality in Stack Overflow. Our study of post edits in Stack Overflow shows that a large number of edits concern formatting, grammar, and spelling. These post edits usually involve small-scale sentence edits, and our survey of trusted contributors suggests that most of them care much or very much about such small sentence edits. To assist users in making small sentence edits, we develop an edit-assistance tool for identifying minor textual issues in posts and recommending sentence edits for correction. We formulate the sentence editing task as a machine translation problem, in which an original sentence is "translated" into an edited sentence. Our tool implements a character-level Recurrent Neural Network (RNN) encoder-decoder model, trained with about 6.8 million original-edited sentence pairs from Stack Overflow post edits. We evaluate our edit-assistance tool using large-scale archival post edits, a field study of assisting a novice post editor, and a survey of trusted contributors. Our evaluation demonstrates the feasibility of training a deep learning model with post edits by the community and then using the trained model to assist post editing for the community.
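As a rough illustration of the "translation" framing, the sketch below shows how one original-edited sentence pair might be turned into character-level id sequences suitable for an encoder-decoder model. It is a minimal sketch under stated assumptions, not the authors' pipeline: the special symbols, the function names (`build_char_vocab`, `encode`, `decode`), and the example sentence pair are all hypothetical, and the abstract does not describe the actual vocabulary or preprocessing.

```python
# Hypothetical special symbols; the abstract does not specify the real vocabulary.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_char_vocab(sentences):
    """Map the special symbols and every character seen in the sentences to an integer id."""
    chars = sorted({ch for s in sentences for ch in s})
    return {sym: i for i, sym in enumerate([PAD, SOS, EOS, UNK] + chars)}


def encode(sentence, vocab, add_markers=False):
    """Convert a sentence to a list of character ids; unseen characters map to UNK."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in sentence]
    if add_markers:
        # Decoder targets are wrapped in start/end markers.
        ids = [vocab[SOS]] + ids + [vocab[EOS]]
    return ids


def decode(ids, vocab):
    """Convert character ids back to text, dropping padding and start/end markers."""
    inverse = {i: sym for sym, i in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in (PAD, SOS, EOS))


# Illustrative (made-up) original-edited pair: the "source" is the original
# sentence and the "target" is the community-edited sentence.
pairs = [("i want to use javascript", "I want to use JavaScript")]
vocab = build_char_vocab([s for pair in pairs for s in pair])
examples = [
    (encode(src, vocab), encode(tgt, vocab, add_markers=True))
    for src, tgt in pairs
]
```

Working at the character level keeps the vocabulary small and lets edits such as capitalization or a one-letter spelling fix be expressed directly, without needing a word-level entry for every misspelling.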

      Published In

      Proceedings of the ACM on Human-Computer Interaction  Volume 1, Issue CSCW
      November 2017
      2095 pages
      EISSN:2573-0142
      DOI:10.1145/3171581
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Q&A sites
      2. collaborative editing
      3. deep learning
