Representation of texts as complex networks: a mesoscopic approach

de Arruda, Henrique F.; Silva, Filipi N.; Marinho, Vanessa Q.; Amancio, Diego R.; Costa, Luciano da F.

doi:10.1093/comnet/cnx023


arXiv:1606.09636 (cs)

[Submitted on 30 Jun 2016 (v1), last revised 25 Feb 2017 (this version, v2)]

Authors: Henrique F. de Arruda, Filipi N. Silva, Vanessa Q. Marinho, Diego R. Amancio, Luciano da F. Costa


Abstract: Statistical techniques for analyzing texts, referred to as text analytics, have moved away from simple word-count statistics towards a new paradigm. Text mining now hinges on a more sophisticated set of methods, including representations in terms of complex networks. While well-established word-adjacency (co-occurrence) methods successfully capture syntactic features of written texts, they are unable to represent important aspects of textual data, such as its topical structure, i.e. the sequence of subjects developing at a mesoscopic level along the text. Such aspects are often overlooked by current methodologies. To capture the mesoscopic characteristics of semantic content in written texts, we devised a network model that can analyze documents in a multi-scale fashion. In the proposed model, each node represents a limited number of adjacent paragraphs, and two nodes are connected whenever they share a minimum amount of semantic content. To illustrate the capabilities of our model, we present, as a case example, a qualitative analysis of "Alice's Adventures in Wonderland". We show that the mesoscopic structure of a document, modeled as a network, reveals many semantic traits of texts. Such an approach paves the way to a myriad of semantic-based applications. In addition, our approach is illustrated in a machine learning context, in which texts are classified as either real texts or randomized instances.
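The construction outlined in the abstract (windows of adjacent paragraphs as nodes, linked when they share enough content) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how semantic content is measured, so the bag-of-words cosine similarity, the function names `window_bags`, `cosine`, and `mesoscopic_edges`, and the `window` and `threshold` parameters are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def window_bags(paragraphs, window):
    """Merge each run of `window` consecutive paragraphs into one bag of words.

    Each bag corresponds to one node of the network (assumed representation).
    """
    bags = []
    for start in range(len(paragraphs) - window + 1):
        text = " ".join(paragraphs[start:start + window]).lower()
        bags.append(Counter(re.findall(r"[a-z']+", text)))
    return bags


def cosine(a, b):
    """Cosine similarity between two word-count bags (assumed similarity measure)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mesoscopic_edges(paragraphs, window=2, threshold=0.3):
    """Return the set of node pairs (i, j), i < j, whose similarity meets the threshold.

    `window` and `threshold` are hypothetical parameters, not values from the paper.
    """
    bags = window_bags(paragraphs, window)
    edges = set()
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            if cosine(bags[i], bags[j]) >= threshold:
                edges.add((i, j))
    return edges
```

With `window` greater than 1, consecutive nodes overlap in the paragraphs they cover, so they tend to be linked; varying `window` is one way to probe the text at different scales under these assumptions.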

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1606.09636 [cs.CL]
	(or arXiv:1606.09636v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1606.09636
Journal reference:	Journal of Complex Networks 6(1), 125-144, 2018
Related DOI:	https://doi.org/10.1093/comnet/cnx023

Submission history

From: Dr. Diego Amancio
[v1] Thu, 30 Jun 2016 19:47:17 UTC (846 KB)
[v2] Sat, 25 Feb 2017 00:06:48 UTC (1,508 KB)
