LIPIcs.COSIT.2024.11.pdf
- Filesize: 1.45 MB
- 20 pages
Extracting spatial details from historical texts can be difficult, which hinders our understanding of past landscapes. This study addresses the challenge by analyzing the Corpus of the Lake District Writing, focusing on the English Lake District region. We systematically link theoretical notions from the core concepts of spatial information to provide a basis for the problem domain. This conceptual foundation is complemented by a spatial ontology and a custom gazetteer, allowing a formal and insightful semantic exploration of the massive unstructured corpus. The other, contrasting side of the framework is the use of LLMs for spatial relation extraction. We formulate prompts that leverage the LLMs' understanding of the intended task, curate a list of spatial relations representing the most frequently recurring proximity or vicinity relation terms, and extract semantic triples for the top five place names appearing in the corpus. We compare the extraction capabilities of three benchmark LLMs on a historical archive of scholarly significance, demonstrating their potential in a challenging, interdisciplinary research problem. Finally, the network comprising the semantic triples is enhanced by incorporating a gazetteer-based classification of the objects involved, thus improving their spatial profiling.
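The final step of the pipeline, filtering extracted triples against a curated relation list and enriching their objects with gazetteer classes, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the relation terms, the gazetteer entries, the example triples, and the function name `build_network` are hypothetical and are not taken from the paper, its corpus, or its results.

```python
from collections import defaultdict

# Hypothetical proximity/vicinity terms; the paper's curated list is not reproduced here.
RELATIONS = ("near", "close to", "beside", "adjacent to")

# Hypothetical gazetteer mapping a place name to a feature class.
GAZETTEER = {
    "Keswick": "town",
    "Derwentwater": "lake",
    "Skiddaw": "mountain",
}

# Illustrative (subject, relation, object) triples, standing in for LLM output.
# They are made up for this sketch and are not results from the corpus.
TRIPLES = [
    ("Keswick", "near", "Derwentwater"),
    ("Keswick", "close to", "Skiddaw"),
    ("Keswick", "north of", "Grasmere"),  # relation not in the curated list
    ("Keswick", "near", "Greta Hall"),    # object missing from the gazetteer
]


def build_network(triples, relations, gazetteer):
    """Group triples by subject, keeping only curated relations and
    tagging each object with its gazetteer class (or "unknown")."""
    network = defaultdict(list)
    for subject, relation, obj in triples:
        if relation not in relations:
            continue  # drop relations outside the curated list
        network[subject].append(
            {
                "relation": relation,
                "object": obj,
                "object_class": gazetteer.get(obj, "unknown"),
            }
        )
    return dict(network)


network = build_network(TRIPLES, RELATIONS, GAZETTEER)
for edge in network["Keswick"]:
    print(edge["relation"], edge["object"], edge["object_class"])
```

Keeping unclassified objects with an explicit "unknown" class, rather than discarding them, is one possible design choice: it leaves gaps in the gazetteer visible in the resulting network.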