Authors:
Pablo Suarez; Valentín Moreno; Anabel Fraga and Juan Llorens
Affiliation:
Carlos III of Madrid University, Spain
Keyword(s):
Patterns, corpus, lexical analysis, parsing, semantic analysis, tokenization, information reuse, indexing, automatic generation, grammatical categories, RSHP model, natural language processing.
Abstract:
Within the discipline of natural language processing there are different approaches to analyzing large text corpora. Identifying patterns with semantic elements in a text lets us classify and examine the corpus, which facilitates the interpretation and management of information by computers. This paper proposes the development of a software tool that automatically generates index patterns using various algorithms for lexical, syntactic and semantic analysis of text, and that integrates the results into other projects in the research area and into other ontological formats. The algorithms implemented in the system perform various types of analysis in the context of natural language processing, so they can identify the grammatical categories and semantic characteristics of the words that make up index patterns. The results are a list of patterns sorted by frequency of occurrence that takes into account optional intermediate elements, which determine the relevance and usefulness of the patterns to other projects. The developed system proposes a model for generating and storing patterns, and a control interface for specifying parameters and running reports.
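The idea of a pattern list sorted by frequency of occurrence can be illustrated with a minimal sketch. This is not the authors' tool: it assumes the text has already been tokenized and tagged with grammatical categories, treats a pattern simply as a contiguous sequence of categories, and leaves out the optional intermediate elements and the semantic analysis. The names `extract_patterns` and `rank_patterns` and the tag labels are hypothetical.

```python
from collections import Counter


def extract_patterns(tagged_sentences, min_len=2, max_len=3):
    """Count contiguous sequences of grammatical categories (hypothetical helper).

    tagged_sentences: list of sentences, each a list of (word, category) pairs.
    Returns a Counter mapping each category tuple to its number of occurrences.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _word, tag in sentence]
        for n in range(min_len, max_len + 1):
            for i in range(len(tags) - n + 1):
                counts[tuple(tags[i:i + n])] += 1
    return counts


def rank_patterns(counts):
    """Sort patterns by descending frequency; ties are broken by the pattern itself."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy corpus with hand-assigned categories (an assumption for illustration only).
corpus = [
    [("the", "DET"), ("system", "NOUN"), ("generates", "VERB"), ("patterns", "NOUN")],
    [("the", "DET"), ("tool", "NOUN"), ("stores", "VERB"), ("results", "NOUN")],
    [("patterns", "NOUN"), ("help", "VERB")],
]

counts = extract_patterns(corpus)
ranked = rank_patterns(counts)

for pattern, frequency in ranked:
    print(frequency, " ".join(pattern))
```

In this toy corpus the sequence NOUN VERB occurs in all three sentences, so it ranks first. A real system would obtain the categories from a tagger and would also need a rule for matching patterns that contain optional elements.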