Labelled network subgraphs reveal stylistic subtleties in written texts

Marinho, Vanessa Q.; Hirst, Graeme; Amancio, Diego R.

doi:10.1093/comnet/cnx047


arXiv:1705.00545 (cs)

[Submitted on 1 May 2017 (v1), last revised 8 Nov 2017 (this version, v3)]

Title: Labelled network subgraphs reveal stylistic subtleties in written texts

Authors:Vanessa Q. Marinho, Graeme Hirst, Diego R. Amancio


Abstract: The vast amount of available data and the increase in computational capacity have allowed texts to be analyzed from several perspectives, including their representation as complex networks. The nodes of such a network represent words, and the edges represent some relationship between them, usually word co-occurrence. Even though networked representations have been applied to some tasks, they are not usually combined with traditional models relying on statistical paradigms. Because networked models are able to capture textual patterns, we devised a hybrid classifier, called labelled subgraphs, that combines the frequency of common words with small structures found in the topology of the network, known as motifs. Our approach is illustrated in two contexts: authorship attribution and translationese identification. In the former, a set of novels written by different authors is analyzed. To identify translationese, texts from the Canadian Hansard and the European Parliament were classified as original or translated. Our results suggest that labelled subgraphs are able to represent texts and should be further explored in other tasks, such as the analysis of text complexity, language proficiency, and machine translation.
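The abstract describes the labelled-subgraph representation only at a high level. The following is a minimal sketch of how such features could be computed; it is not the authors' implementation. It assumes (none of this is specified in the abstract) that edges link words adjacent in the text, that the network is undirected, that the motifs are the two connected three-node subgraphs (open path and triangle), and that each node is labelled with the word itself when it is among the k most frequent words and with the wildcard `*` otherwise. The function names `tokenize`, `cooccurrence_graph` and `labelled_subgraph_counts` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-case word tokens; punctuation and digits are dropped (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_graph(tokens):
    """Undirected word-adjacency network (assumed construction): one node per
    word type, and an edge between two words that appear next to each other."""
    adj = {w: set() for w in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops such as "very very"
            adj[a].add(b)
            adj[b].add(a)
    return adj


def labelled_subgraph_counts(tokens, k=10):
    """Count connected 3-node subgraphs (open paths and triangles) whose nodes
    are labelled with the word itself if it is among the k most frequent
    words (hypothetical labelling scheme), and with '*' otherwise."""
    adj = cooccurrence_graph(tokens)
    common = {w for w, _ in Counter(tokens).most_common(k)}

    def label(w):
        return w if w in common else "*"

    counts = Counter()
    for centre, neighbours in adj.items():
        # Every pair of neighbours of `centre` forms a connected 3-node subgraph.
        for u, v in combinations(sorted(neighbours), 2):
            if v in adj[u]:
                # A triangle is reached from each of its three nodes;
                # count it only from its alphabetically smallest node.
                if centre < u:
                    key = ("triangle", tuple(sorted(map(label, (centre, u, v)))))
                    counts[key] += 1
            else:
                # An open path u-centre-v has a unique centre, so it is counted once.
                key = ("path", label(centre), tuple(sorted((label(u), label(v)))))
                counts[key] += 1
    return counts


if __name__ == "__main__":
    text = "The cat saw the dog and the dog saw the cat."
    for key, n in sorted(labelled_subgraph_counts(tokenize(text), k=1).items()):
        print(key, n)
    # ('path', '*', ('*', '*')) 2
    # ('path', 'the', ('*', '*')) 3
    # ('triangle', ('*', '*', 'the')) 3
```

Each text yields a sparse count vector keyed by labelled subgraph. Normalizing these counts by their total and passing them to a standard classifier would be one way to use them for authorship attribution or translationese identification; the abstract does not say which classifier, window size, or motif size the authors used, so those choices remain assumptions here.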

Comments:	To appear in Journal of Complex Networks (JCN cnx047). The paper is available at this https URL
Subjects:	Computation and Language (cs.CL); Data Analysis, Statistics and Probability (physics.data-an)
Cite as:	arXiv:1705.00545 [cs.CL]
	(or arXiv:1705.00545v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1705.00545
Related DOI:	https://doi.org/10.1093/comnet/cnx047

Submission history

From: Diego Amancio Dr.
[v1] Mon, 1 May 2017 14:36:21 UTC (618 KB)
[v2] Tue, 7 Nov 2017 17:09:50 UTC (618 KB)
[v3] Wed, 8 Nov 2017 02:16:49 UTC (618 KB)