Discourse in Multimedia: A Case Study in Information Extraction

Sachan, Mrinmaya; Dubey, Kumar Avinava; Hovy, Eduard H.; Mitchell, Tom M.; Roth, Dan; Xing, Eric P.

arXiv:1811.05546 (cs)

[Submitted on 13 Nov 2018]

Abstract: To ensure readability, text is often written and presented with appropriate formatting. These formatting devices help the writer convey the narrative effectively. At the same time, they help readers pick up the structure of the discourse and comprehend the conveyed information. There have been a number of linguistic theories on the discourse structure of text. However, these theories consider only unformatted text. Multimedia text contains rich formatting features that can be leveraged for various NLP tasks. In this paper, we study some of these discourse features in multimedia text and the communicative functions they fulfil in context. We examine how these multimedia discourse features can be used to improve an information extraction system. We show that the discourse and text layout features provide information that is complementary to the lexical semantic information commonly used for information extraction. As a case study, we use these features to harvest structured subject knowledge of geometry from textbooks. We show that the harvested structured knowledge can be used to improve an existing solver for geometry problems, making it both more accurate and more explainable.
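The idea that layout features carry information complementary to lexical features can be illustrated with a minimal Python sketch. This is not the authors' feature set or extraction system. The `TextSpan` record, its layout flags (bold, heading, bulleted list, boxed region), the `LEXICAL_CUES` word list, and the helper functions are all hypothetical, chosen only for illustration. The sketch shows that two spans with identical wording get identical lexical features but different layout features.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class TextSpan:
    """Hypothetical span of textbook text with a few layout attributes."""
    text: str
    is_bold: bool = False
    is_heading: bool = False
    in_bulleted_list: bool = False
    in_boxed_region: bool = False


# Hypothetical lexical cue words (illustrative only, not from the paper).
LEXICAL_CUES = ("theorem", "axiom", "if", "then", "equal", "angle")


def lexical_features(span: TextSpan) -> dict[str, float]:
    """Bag-of-cue-words features computed from the wording alone."""
    tokens = re.findall(r"[a-z]+", span.text.lower())
    feats = {f"lex:{cue}": float(cue in tokens) for cue in LEXICAL_CUES}
    feats["lex:num_tokens"] = float(len(tokens))
    return feats


def layout_features(span: TextSpan) -> dict[str, float]:
    """Features computed from formatting alone, ignoring the wording."""
    return {
        "layout:bold": float(span.is_bold),
        "layout:heading": float(span.is_heading),
        "layout:bullet": float(span.in_bulleted_list),
        "layout:boxed": float(span.in_boxed_region),
    }


def combined_features(span: TextSpan) -> dict[str, float]:
    """Merge both families; distinct prefixes keep their keys from colliding."""
    feats = lexical_features(span)
    feats.update(layout_features(span))
    return feats


if __name__ == "__main__":
    boxed = TextSpan("Vertical angles are equal.", in_boxed_region=True)
    plain = TextSpan("Vertical angles are equal.")
    print(combined_features(boxed))
    print(combined_features(plain))
```

Because the lexical features of the two spans match exactly, only the layout features can tell a boxed statement apart from the same sentence in running prose. Prefixing each feature name by family (`lex:`, `layout:`) lets a downstream classifier weight the two sources independently.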

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1811.05546 [cs.CL]
	(or arXiv:1811.05546v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1811.05546

Submission history

From: Mrinmaya Sachan
[v1] Tue, 13 Nov 2018 22:08:39 UTC (1,930 KB)