Jorge Morato
Universidad Carlos III de Madrid, Computer Science, Faculty Member
- Information Extraction, Knowledge Organization Systems, Semantic Web technology - Ontologies, Natural Language Processing, Information Retrieval, Search Engine Optimisation, Artificial Intelligence, Metadata, Web search, Computer Science, Research Methodology, Software Engineering, Ontology, Software Development, Curricula, Automatic Thesaurus Construction, Traceability, Requirements, Thesaurus, Software Reuse, Music Information Retrieval, Education for Library and Information Science, Crowdsourcing, Computer Science Education, IR curricula, Similarity of Symbolic Music, Music retrieval, UML retrieval, and New Media
- I am currently a professor of Information Science in the Department of Computer Science at the Carlos III University of Madrid (Spain). I obtained a PhD in Library Science from the Carlos III University in 1999 on the subject of Knowledge Information Systems and their relationship with linguistics. From 1991 to 1999, I held grants and contracts from the Spanish National Research Council. My current research centers on text mining, information extraction and pattern recognition, NLP, information retrieval, Web positioning, and Knowledge Organization Systems. I have published mainly on the semi-automatic construction of thesauri and ontologies, topic maps, and conceptual and contextualized retrieval of semantic documents. My most recent projects deal with Semantic Metadata Interoperability and Search (SEMSE) and LIS training. I have taught courses on Information Retrieval, Search Engine Optimization, Software Engineering, and Knowledge Modelling Techniques and Management Systems. Some of them are published in the OpenCourseWare project.
This work presents an integrated view of the different tools used to study the connections between documents, publication patterns, content representation, and retrieval optimization. It brings together concepts from Cognitive Psychology, Linguistics, Scientometrics, Documentation, Statistics, Classification, and Computer Science, in the aspects most closely related to the processing, organization, and characterization of textual information. The ultimate goal is to analyze the influence of genre analysis on the characterization of qualitative and quantitative parameters and, specifically, on the tools traditionally used for these studies, such as scientometric indicators and term classification.
Automatic Text Summarization (ATS) is in great demand as a way to cope with the ever-growing volume of text available online and to find relevant information faster. This research proposes an ATS methodology for the Hindi language using a Real Coded Genetic Algorithm (RCGA) over a health corpus available as a Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation is performed on varied feature sets, in which distinguishing features, namely sentence similarity and named entity features, are combined with others to compute the evaluation metrics. The top 14 feature combinations are evaluated with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through strings of features, chromosome selection, and the reproduction operators Simulated Binary Crossover and Polynomial Mutation. To extr...
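As a rough illustration of the ROUGE measure named in this abstract, the following minimal Python sketch computes ROUGE-N recall between a candidate summary and a single reference summary. It assumes plain whitespace tokenization and one reference; the function names `ngram_counts` and `rouge_n_recall` are hypothetical, and nothing here reproduces the paper's actual evaluation setup for Hindi.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-grams.

    Assumption: both texts are tokenized by whitespace only.
    """
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate (clipped counting).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat sat"
    print(rouge_n_recall(candidate, reference, n=1))
    print(rouge_n_recall(candidate, reference, n=2))
```

Recall is used here because ROUGE is recall-oriented by name; published evaluations often report precision and F-scores as well.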
New technologies require an upgrade of LIS professionals’ skills. This study analyzes changes in the field and suggests guidelines to improve LIS curricula. We have carried out a content analysis of 20 curricula from LIS educational programs to identify terms associated with technological skills. In addition, we identified terms related to new competences from 735 job openings published on generic web sites and 170 on specific web sites. These terms include marketing, management, and content management, mainly related to web applications. The results confirm a positive trend in the demand for technological skills related to LIS. Nevertheless, there is a need to increase awareness about the competences and abilities of LIS professionals, because many relevant job openings are listed in other job categories. Therefore, we have collected a list of key technological skills. In conclusion, it is clear that computer science and the Internet are bringing new opportunities to LIS professio...
"Automating the acquisition of knowledge from electronic text documents involves multiple difficulties. One of them is the processing of the text itself, owing to the diversity of formats or the lack of any format. A suitable methodology must therefore specify the guidelines under which these documents are to be processed. In addition, powerful linguistic processing is needed to overcome these difficulties. A methodology has been presented that is proving efficient in different domains and applications. Complete automation has not yet proved feasible. Multiple applications to technology watch and software reuse are under study. According to the experiments carried out, a suitable methodology should be based on a faceted classification that facilitates the reuse and interoperability of these knowledge organization systems. All the experiments conducted indicate that the classifications proposed after applying software tools to the analysis of a domain must be assessed and validated by an expert. For the time being, automation entirely free of human intervention does not seem realistic. Future work should address the problems associated with validation by domain experts, since it causes delays and inconsistencies in obtaining the domain. In conclusion, the tasks assigned to experts should be brief and simple for the methodology to be effective, or else be simplified by means of data mining."
Title: Estructuración y clasificación automática de información: aplicación a una colección de textos médicos [Automatic structuring and classification of information: application to a collection of medical texts]. Authors: Morato, J. Journal: Revista Interamericana de Bibliotecología, 2001 Jan-Jun; 24 (1) Pages: 117-136 ISSN: 01200976 Abstract: Describes a ...
Abstract. The suitability of algorithms for named entity recognition and classification (NERC) is evaluated through competitions such as MUC, CoNLL, or ACE. In general, these competitions are limited to the recognition of predefined entity types in certain languages. ...
Abstract. We describe a new approach for computing the similarity between symbolic musical pieces, based on the differences in shape between the interpolating curves defining the pieces. We outline several requirements for a symbolic musical similarity system, and ...
ABSTRACT In this article we perform a qualitative analysis of well-known generic ontologies according to their retrieval potential, in order to implement a conceptual retrieval system. This retrieval system aims to search metadata schemes. The main problem in implementing the retrieval system has been finding a reference ontology that covers the domain and matches the system's requirements. We evaluated the characteristics of the ontologies for their suitability to represent the semantics of specific metadata schemes. Finally, PROTON was selected as the reference ontology due to its extensibility, adaptability to the domain, adequacy to the retrieval system, and availability. The principal contribution of this study is a set of guidelines for selecting ontologies to be mapped, based on qualitative analysis and experience.
The terminology used in Biomedicine shows lexical peculiarities that have required the elaboration of terminological resources and information retrieval systems with specific functionalities. The main characteristics are the high rates of synonymy and homonymy, due to phenomena such as the proliferation of polysemic acronyms and their interaction with common language. Information retrieval systems in the biomedical domain use techniques oriented to the treatment of these lexical peculiarities. In this paper we review some of the techniques used in this domain, such as the application of Natural Language Processing (BioNLP), the incorporation of lexical-semantic resources, and the application of Named Entity Recognition (BioNER). Finally, we present the evaluation methods adopted to assess the suitability of these techniques for retrieving biomedical resources.
This paper presents the development of a low-cost mobile application for a library network. There is a need for applications that serve library users in line with current usage habits, building user loyalty and simplifying recurrent access to multiple websites. The Red Valenciana de Lectura Pública is used as a case study to illustrate the proposal. The starting point is an analysis of the characteristics of the institution and the requirements of the mobile application. To develop the application, different platforms for building mobile applications are compared. The final product is then evaluated in terms of efficiency and ease of use. The results indicate that using an application that integrates the information in a single point improves performance in terms of search time and error rate. The main contribution of this work highlights apps as ...
Introduction. The objective of this study is to understand the information needs that businesses have when seeking Library and Information Science professionals and to analyse how they formulate those needs. Method. The analysis is performed by examining the professional skills and capabilities demanded in published job offers. A total of 1,020 job offers collected from the Website of a Spanish employment agency have been analysed for the period between 2006 and 2008. Analysis. Knowledge representation techniques using thesauri have been used for the automatic content analysis based on natural language processing. Data extracted from the corpora have been analysed statistically. Results. Results of this study indicate a demand for skills related to technological advances and the management of electronic resources as well as to technical aspects associated with the Informatics domain. The knowledge of languages and the possession of an academic title represent essential factors in the job offers...
Open source software is becoming more popular worldwide due to the quality of its products. Open source repositories are tools for accessing this kind of software, but when searching for a particular component it is not easy to find what is required quickly. This paper studies the feasibility of using other sorting algorithms to improve the results provided by open source software repositories; for this purpose, the use of sorting algorithms based on graphs of relationships between open source software projects is analyzed. The results of four different sorting algorithms have been compared with the opinion of a group of experts in the domain area where the experiment was conducted. The results show that there are slight discrepancies between the ranking provided by the open source repository, the sorting algorithms, and expert opinion. These results underscore the possibility of including new sorting techniques in open source repositories in order to obtain better results i...
In this article we perform a comparison between two approaches to the modeling of the hierarchical structure of the real world: on the one hand, generic and whole-part relationships in a descriptors thesaurus; on the other hand, generalization and aggregation relationships in UML. Trying to shorten the distance between them leads to a new metamodel of relationships that can reflect better the mental habits of modelers when dealing with hierarchical trees.
Organizations today are used to continuous transformation, and technology is one of the main areas of change; new systems must be developed that help old systems evolve. This change entails the study and reuse of existing databases: software developers must take on their design, and to do so they need to study old database designs
Research Interests: Information Retrieval, Informatics, Software Development, Productivity, Software Reuse, Database Management Systems, Programming, Software Quality, Database Design, Data Bases, Indexing, Design Methodology, Structural Transformation, Database Retrieval, Software Systems, Indexation, Relational databases, Software Development Process, Focal Point, and Application Software
Purpose – This paper seeks to analyze and evaluate different types of semantic web retrieval systems, with respect to their ability to manage and retrieve semantic documents. Design/methodology/approach – The authors provide a brief overview of knowledge modeling and semantic retrieval systems in order to identify their major problems. They classify a set of characteristics to evaluate the management of semantic documents. To this end, the authors select 12 retrieval systems, classified according to these features. The evaluation methodology followed in this work is the one used in the DESMET project for the evaluation of qualitative characteristics. Findings – A review of the literature has shown deficiencies in the ability of the current semantic web to cope with known problems. Additionally, semantic retrieval systems show discrepancies in the way they are implemented. The authors analyze the presence of a set of functionalities in different type...
Researchers in indexing and retrieval systems have been advocating the inclusion of more contextual information to improve results. The proliferation of full-text databases and advances in computer storage capacity have made it possible to carry out text analysis by means of linguistic ...
Research Interests: Information Systems, Discourse Analysis, Algorithms, Information Retrieval, Biomedicine, Computational Linguistics, Gender Discourse, Corpus Linguistics and Discourse Analysis, Information Structure, Computational linguistic phylogenetics, Information Processing, Classification, Indexing, Information Analysis, Document Structure, K Means, Filtering, Context analysis, Classification Algorithm, and Contextual Information
This article presents a method for reaching consensus on the semantics of the concepts in metadata vocabularies. The conceptual analysis of ontology mappings and alignments has made it possible to design a system for reaching consensus on the semantics of vocabularies of ...
This short paper describes our five submissions to the 2012 edition of the MIREX Symbolic Melodic Similarity task. All five submissions rely on a geometric model that represents melodies as spline curves in the pitch-time plane. The similarity between two melodies is then computed with a sequence alignment algorithm between sequences of spline spans: the more similar the shape of the curves, the more similar the melodies they represent. As in MIREX 2010 and 2011, our systems ranked first for all effectiveness measures used. However, this year there was only one competing system, so we employ this report mainly to describe and compare results within our systems.
Abstract: The third Information Retrieval Education through EXperimentation track (EIREX 2012) was run at the University Carlos III of Madrid, during the 2012 spring semester. EIREX 2012 is the third in a series of experiments designed to foster new Information Retrieval (IR) education methodologies and resources, with the specific goal of teaching undergraduate IR courses from an experimental perspective. For an introduction to the motivation behind the EIREX experiments, see the first sections of [Urbano et al., 2011a]. For information on ...
This short paper describes our three submissions to the MIREX 2011 Symbolic Melodic Similarity task. All three submissions rely on a geometric model that represents melodies as spline curves in the pitch-time plane. The similarity between two melodies is then computed with a sequence alignment algorithm between sequences of spline spans: the more similar the shape of the curves, the more similar the melodies they represent. As in MIREX 2010, our systems ranked first for all effectiveness measures used.
The goal of the GTI is the semi-automatic generation of thesauri through the analysis of a corpus. After testing different methods of classifying information, from term co-occurrence to neural networks, it proved necessary to create new indicators ...
Abstract: Ontologies are a key element in the development of the Semantic Web. The emergence of ontologies on the Internet is a recent phenomenon, but one of far-reaching importance for the transmission and storage of data in the technological and business spheres. ...
The development of the Semantic Web depends on agreed and unambiguous knowledge representations, on the availability and accessibility of knowledge, as well as on retrieval capabilities. The scarce agreement on knowledge representation and the lack of ...