Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles

Zhu, Jian; Jurgens, David

arXiv:2109.03158 (cs)

[Submitted on 7 Sep 2021 (v1), last revised 10 Sep 2021 (this version, v3)]

Abstract: An individual's variation in writing style is often a function of both social and personal attributes. While structured social variation, such as gender-based variation, has been extensively studied, far less is known about how to characterize individual styles because of their idiosyncratic nature. We introduce a new approach to studying idiolects through a massive cross-author comparison to identify and encode stylistic features. The neural model achieves strong performance at authorship identification on short texts, and an analogy-based probing task shows that the learned representations exhibit surprising regularities that encode qualitative and quantitative shifts of idiolectal styles. Through text perturbation, we quantify the relative contributions of different linguistic elements to idiolectal variation. Furthermore, we describe idiolects by measuring inter- and intra-author variation, showing that variation in idiolects is often distinctive yet consistent.

Comments:	EMNLP 2021 main conference
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2109.03158 [cs.CL]
	(or arXiv:2109.03158v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2109.03158

Submission history

From: Jian Zhu
[v1] Tue, 7 Sep 2021 15:49:23 UTC (5,586 KB)
[v2] Wed, 8 Sep 2021 22:10:06 UTC (5,584 KB)
[v3] Fri, 10 Sep 2021 13:06:46 UTC (5,584 KB)
