DOI: 10.1145/2488388.2488419
research-article

Attributing authorship of revisioned content

Published: 13 May 2013

Abstract

A considerable portion of web content, from wikis to collaboratively edited documents, to code posted online, is revisioned. We consider the problem of attributing authorship to such revisioned content, and we develop scalable attribution algorithms that can be applied to very large bodies of revisioned content, such as the English Wikipedia.
Since content can be deleted, only to be later re-inserted, we introduce a notion of authorship that requires comparing each new revision with the entire set of past revisions. For each portion of content in the newest revision, we search the entire history for content matches that are statistically unlikely to occur spontaneously, thus denoting common origin. We use these matches to compute the earliest possible attribution of each word (or each token) of the new content. We show that this "earliest plausible attribution" can be computed efficiently via compact summaries of the past revision history. This leads to an algorithm that runs in time proportional to the sum of the size of the most recent revision and the total amount of change (edit work) in the revision history. This amount of change is typically much smaller than the total size of all past revisions. The resulting algorithm can scale to very large repositories of revisioned content, as we show via experimental data over the English Wikipedia.
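
To make the idea of attributing re-inserted content to its earliest origin concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes that a match of at least `n` consecutive tokens is long enough to count as "unlikely to occur spontaneously", and it uses a plain dictionary of n-grams as a stand-in for the compact summaries of past history. The function name `attribute_tokens` and the parameter `n` are hypothetical, and the sketch rescans every revision in full, so it does not reproduce the running-time bound stated in the abstract.

```python
from typing import Dict, List, Tuple


def attribute_tokens(revisions: List[List[str]], n: int = 3) -> List[List[int]]:
    """Return, for each revision, the earliest revision index credited for each token.

    Assumption (not from the paper): any run of n consecutive tokens that
    already appeared in an earlier revision is treated as sharing its origin.
    """
    # Summary of the history: n-gram -> index of the first revision containing it.
    earliest: Dict[Tuple[str, ...], int] = {}
    origins: List[List[int]] = []

    for rev_id, tokens in enumerate(revisions):
        # By default every token is credited to the current revision.
        origin = [rev_id] * len(tokens)
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

        # Tokens covered by a previously seen n-gram inherit its first revision.
        for i, gram in enumerate(grams):
            first = earliest.get(gram)
            if first is not None:
                for j in range(i, i + n):
                    origin[j] = min(origin[j], first)

        # Record n-grams that are new to the history.
        for gram in grams:
            earliest.setdefault(gram, rev_id)

        origins.append(origin)

    return origins


# Revision 1 deletes the text of revision 0; revision 2 re-inserts part of it.
history = [
    "the quick brown fox".split(),
    "the lazy dog".split(),
    "a quick brown fox jumps".split(),
]
print(attribute_tokens(history)[2])  # prints [2, 0, 0, 0, 2]
```

In this example "quick brown fox" is credited to revision 0 even though it is absent from revision 1, which is why the comparison has to cover the whole history rather than only the immediately preceding revision.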


Information & Contributors

Information

Published In

WWW '13: Proceedings of the 22nd international conference on World Wide Web
May 2013
1628 pages
ISBN: 9781450320351
DOI: 10.1145/2488388

Sponsors

  • NICBR: Núcleo de Informação e Coordenação do Ponto BR
  • CGIBR: Comitê Gestor da Internet no Brasil

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. authorship
  2. revisioned content
  3. wikipedia

Qualifiers

  • Research-article

Conference

WWW '13: 22nd International World Wide Web Conference
May 13 - 17, 2013
Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 Paper Acceptance Rate: 125 of 831 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 3
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 04 Oct 2024

Citations

Cited By

  • (2018) Mind Your POV. Proceedings of the ACM on Human-Computer Interaction 2:CSCW (1-23). DOI: 10.1145/3274406. Online publication date: 1-Nov-2018
  • (2018) Spacetime Characterization of Real-Time Collaborative Editing. Proceedings of the ACM on Human-Computer Interaction 2:CSCW (1-19). DOI: 10.1145/3274310. Online publication date: 1-Nov-2018
  • (2017) Measuring contribution in collaborative writing: An adaptive NMF topic modelling approach. 2017 Fourth International Conference on eDemocracy & eGovernment (ICEDEG) (63-70). DOI: 10.1109/ICEDEG.2017.7962514. Online publication date: Apr-2017
  • (2014) WikiWho. Proceedings of the 23rd international conference on World wide web (843-854). DOI: 10.1145/2566486.2568026. Online publication date: 7-Apr-2014
