DOI: 10.1145/2766462.2767799
short-paper

Time-Aware Authorship Attribution for Short Text Streams

Published: 09 August 2015

Abstract

Identifying the authors of short texts on Internet and social media communication systems is an important tool against fraud and cybercrime. Besides the challenge raised by the limited length of these messages, the evolving language and writing styles of their authors make authorship attribution difficult. Most current short-text authorship attribution approaches address only the challenge of limited text length. Neglecting the second challenge, however, may lead to poor attribution performance for authors who change their writing styles.
In this paper, we analyse the temporal changes in word usage by authors of tweets and emails and, based on this analysis, propose an approach to estimate the dynamicity of authors' word usage. The proposed approach is inspired by time-aware language models and can be employed in any time-unaware authorship attribution method. Our experiments on tweets and the Enron email dataset show that the proposed time-aware authorship attribution approach significantly outperforms baselines that neglect the dynamicity of authors.
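
The abstract does not give the estimator itself, so the following is only a rough sketch of the general idea of a time-aware language model for attribution, not the method of the paper. It assumes an exponential decay on message age, Jelinek-Mercer smoothing against a background model, and hypothetical names (`decayed_counts`, `log_likelihood`, `attribute`, `decay`, `lam`); all of these are illustrative choices.

```python
import math
from collections import Counter


def decayed_counts(messages, now, decay):
    """Build weighted word counts from (timestamp, tokens) pairs.

    Each token is weighted by exp(-decay * age), so recent messages
    contribute more than old ones. The decay form is an assumption.
    """
    counts = Counter()
    for timestamp, tokens in messages:
        weight = math.exp(-decay * (now - timestamp))
        for token in tokens:
            counts[token] += weight
    return counts


def log_likelihood(tokens, counts, background, lam=0.5):
    """Log-probability of tokens under a smoothed unigram author model.

    Jelinek-Mercer smoothing mixes the author model with a background
    model; words missing from the background get a small floor value.
    """
    total = sum(counts.values())
    score = 0.0
    for token in tokens:
        p_author = counts[token] / total if total > 0 else 0.0
        p_background = background.get(token, 1e-9)
        score += math.log((1 - lam) * p_author + lam * p_background)
    return score


def attribute(tokens, now, history, background, decay=0.1, lam=0.5):
    """Return the candidate author whose decayed model scores tokens highest."""
    scores = {
        author: log_likelihood(
            tokens, decayed_counts(messages, now, decay), background, lam
        )
        for author, messages in history.items()
    }
    return max(scores, key=scores.get)


# Toy data (invented): the two authors swap vocabularies over time.
history = {
    "alice": [(0, ["foo", "foo"]), (10, ["bar", "bar"])],
    "bob": [(0, ["bar", "bar"]), (10, ["foo", "foo"])],
}
background = {"foo": 0.5, "bar": 0.5}

print(attribute(["foo"], now=10, history=history, background=background, decay=1.0))
# prints "bob"
```

In this toy setting both authors have identical overall word counts, so a time-unaware model (`decay=0`) cannot separate them; weighting recent messages more heavily is what makes the attribution possible. Setting `decay=0` recovers the time-unaware baseline, which is one way such a weighting could be plugged into an existing method.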

    Published In

    SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
    August 2015
    1198 pages
    ISBN:9781450336215
    DOI:10.1145/2766462
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. authorship attribution
    2. short text analysis
    3. time-aware language models

    Qualifiers

    • Short-paper

    Funding Sources

    • European Community's Seventh Framework Program
    • ExPoSe project (Netherlands Organization for Scientific Research)
    • DiLiPaD project (Netherlands Organization for Scientific Research)

    Conference

    SIGIR '15

    Acceptance Rates

    SIGIR '15 paper acceptance rate: 70 of 351 submissions (20%)
    Overall acceptance rate: 792 of 3,983 submissions (20%)
