The use of software agents as Database Management System components leads to database systems that can be configured and extended to support new requirements. We focus here on the self-tuning feature, which demands a degree of intelligent behavior that agents could add to traditional DBMS modules. In this paper, we propose an agent-based database architecture for automatic index creation.
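As a rough illustration of the self-tuning idea, the sketch below models a single "indexing agent" that observes the workload and creates an index once a column has been filtered on often enough. The class `IndexAgent`, the helper `run_query`, the usage-count threshold rule and the use of SQLite are hypothetical choices for illustration, not the architecture proposed in the paper; to stay short, the agent is told which column each query filters on instead of parsing SQL.

```python
import sqlite3
from collections import Counter


class IndexAgent:
    """Hypothetical self-tuning agent: counts filtered columns and creates
    an index when a column reaches a usage threshold."""

    def __init__(self, conn: sqlite3.Connection, threshold: int = 3):
        self.conn = conn
        self.threshold = threshold
        self.usage = Counter()   # (table, column) -> number of filtering queries
        self.created = set()     # indexes this agent has already built

    def observe(self, table: str, column: str) -> None:
        """Record one query that filters `table` on `column`."""
        key = (table, column)
        self.usage[key] += 1
        if self.usage[key] >= self.threshold and key not in self.created:
            self._create_index(table, column)

    def _create_index(self, table: str, column: str) -> None:
        name = f"idx_{table}_{column}"
        # Identifiers cannot be bound as parameters; a real agent would
        # validate them against the catalog before interpolating them.
        self.conn.execute(
            f'CREATE INDEX IF NOT EXISTS "{name}" ON "{table}" ("{column}")')
        self.created.add((table, column))


def run_query(conn, agent, table, column, value):
    """Report an equality query to the agent, then execute it."""
    agent.observe(table, column)
    return conn.execute(
        f'SELECT * FROM "{table}" WHERE "{column}" = ?', (value,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(i, f"c{i % 5}") for i in range(100)])
    agent = IndexAgent(conn, threshold=3)
    for _ in range(3):
        run_query(conn, agent, "orders", "customer", "c1")
    print([r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'")])
```

A counting threshold is the simplest possible decision rule; an agent closer to the paper's goal would weigh the estimated query savings against index maintenance cost before acting.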
Indexing, both human and automatic, has always been a concern of information science (IS). Lack of consistency in human indexing and lack of semantics in automatic indexing are identified as the main drawbacks. One way to give computer systems greater inference capability lies in the use of ontologies. Starting from this proposition, the aim is to identify and analyze the studies in IS that address the contributions of ontologies to automatic indexing. Specifically, we aim (i) to identify the scientific works in the Library & Information Science Source and Library, Information Science & Technology Abstracts databases that address this subject, together with their temporal and geographic distribution; (ii) to identify and describe how central the two concepts (automatic indexing and ontologies) are to each article's thematic approach, as well as the articles' methodological approach; (iii) to identify the contributions, in the articles that make up the corpus, regarding the potential of using the two concepts together. The study is exploratory and based on a systematic literature review. The results point to the following contributions of ontologies to automatic indexing: (i) disambiguating homographic and polysemous terms; (ii) a greater capacity to integrate semantic relations automatically; (iii) navigation and query expansion through semantic relations; (iv) more precise and exhaustive information retrieval. We conclude that the development of systems that exploit the potential of ontologies in automatic indexing seeks to overcome its lack of semantic capability. Despite promising results in this area, we infer that it is still premature and inappropriate to speak of truly effective semantic indexing.
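To make two of the reported contributions concrete (disambiguating homographs and expanding queries through semantic relations), the toy sketch below indexes documents by ontology concepts rather than raw words. An ambiguous label is resolved by counting overlap between the document's words and each candidate concept's context terms, and queries are expanded along narrower-concept links. The mini-ontology, its concept IDs and the overlap heuristic are invented for illustration and are not taken from the reviewed studies.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    cid: str
    labels: set[str]                                  # preferred/alternative labels
    context: set[str] = field(default_factory=set)    # words hinting at this sense
    narrower: set[str] = field(default_factory=set)   # IDs of narrower concepts


# Hypothetical mini-ontology: "jaguar" is a homograph (animal vs. car brand).
ONTOLOGY = {
    "C1": Concept("C1", {"jaguar"}, {"cat", "forest", "prey"}),
    "C2": Concept("C2", {"jaguar"}, {"car", "engine", "brand"}),
    "C3": Concept("C3", {"feline", "big cat"}, narrower={"C1"}),
}


def index_document(text: str) -> set[str]:
    """Map a document to concept IDs, disambiguating shared labels."""
    lowered = text.lower()
    words = set(lowered.split())
    concepts = set()
    for label in {lab for c in ONTOLOGY.values() for lab in c.labels}:
        if label not in lowered:
            continue
        candidates = [c for c in ONTOLOGY.values() if label in c.labels]
        # Choose the sense whose context terms overlap most with the document.
        best = max(candidates, key=lambda c: len(c.context & words))
        concepts.add(best.cid)
    return concepts


def expand_query(cids: set[str]) -> set[str]:
    """Add all (transitively) narrower concepts to the query concepts."""
    result, stack = set(cids), list(cids)
    while stack:
        for n in ONTOLOGY[stack.pop()].narrower:
            if n not in result:
                result.add(n)
                stack.append(n)
    return result


docs = {
    "d1": "the jaguar stalks its prey in the forest",
    "d2": "the new jaguar has a powerful engine",
}
index = {d: index_document(t) for d, t in docs.items()}
query = expand_query({"C3"})          # the user asks for "feline"
hits = sorted(d for d, cs in index.items() if cs & query)
print(hits)  # ['d1']
```

A plain keyword index would return both documents for "jaguar" and neither for "feline"; the concept index separates the two senses and reaches d1 through the narrower relation.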
This talk gives an overview of the linguistic technologies available for the automatic or semi-automatic processing of documents in document management, especially for multilingual processing. The different technologies are presented and evaluated in terms of their maturity. In conclusion, although many tools handle documents in English relatively efficiently, the possibilities for other languages are much more limited.
With the growth of video content on TV and the Internet, efficient techniques for browsing and searching video archives are needed. We propose a method that supports browsing a news video archive with the help of Wikipedia; we focus on news videos because they are an important category of video content. First, videos are automatically indexed by Wikipedia entries by means of evaluating ...
As an innovative and cost-effective way to carry out multiple-axis CNC machining, the -axis CNC machining technique adds an automatic indexing/rotary table with two additional discrete rotations to a regular 3-axis CNC machine, improving its capability and efficiency for machining complex sculptured parts. This work introduces a new tool path generation method that automatically subdivides a complex sculptured surface into a number of easy-to-machine surface patches, identifies a favorable machining set-up/orientation for each patch, and generates effective 3-axis CNC tool paths for each patch. The method and its advantages are illustrated with a sculptured surface machining example. The work contributes to automated multiple-axis CNC tool path generation for sculptured part machining and lays a foundation for further research.
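The set-up selection step can be illustrated with a deliberately simplified accessibility test: a patch is treated as 3-axis machinable from a tool-axis direction if every sampled surface normal has a positive component along that axis, and among the directions the indexing table can reach, the one with the largest worst-case component is preferred. The candidate directions, the margin criterion and the names `best_setup` and `CANDIDATES` are assumptions for this sketch; the paper's actual criteria (for example gouging and collision checks) are not reproduced.

```python
from __future__ import annotations

import math

Vec = tuple[float, float, float]


def normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def best_setup(normals: list[Vec], axes: list[Vec]) -> tuple[Vec, float] | None:
    """Return (tool axis, margin) maximizing the minimum normal-axis dot
    product, or None if no candidate axis 'sees' every normal."""
    best = None
    for axis in axes:
        a = normalize(axis)
        margin = min(dot(normalize(n), a) for n in normals)
        if margin > 0 and (best is None or margin > best[1]):
            best = (a, margin)
    return best


# Hypothetical tool-axis directions (part frame) reachable by the table:
# no rotation (+Z) and 90-degree tilts toward +/-X and +/-Y.
CANDIDATES = [(0, 0, 1), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]

# Sampled unit-ish normals of one patch facing mostly along +X.
patch = [(0.9, 0.0, 0.4), (1.0, 0.2, 0.0), (0.8, -0.1, -0.3)]
choice = best_setup(patch, CANDIDATES)
print(choice)
```

Maximizing the worst-case margin favors orientations in which the whole patch faces the tool, which is one plausible reading of "favorable"; a patch that no candidate can see (returning `None`) would have to be subdivided further.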
Queries defined on data warehouses are complex and involve several join operations, which induce a high computational cost. This cost becomes even more prohibitive when queries access very large volumes of data. To improve response time, data warehouse administrators generally use indexing techniques such as star join indexes or bitmap join indexes. This task is nevertheless complex and tedious. Our solution lies in the field of data warehouse auto-administration. In this framework, we propose an automatic index selection strategy. We exploit a data mining technique, namely frequent itemset mining, to determine a set of candidate indexes from a given workload. We then propose several cost models for building an index configuration composed of the indexes that provide the best profit. These models evaluate the cost of accessing data through bitmap join indexes, as well as the cost of updating and storing these indexes.
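The candidate-generation step can be sketched with a plain Apriori pass: each workload query is reduced to the set of indexable attributes it selects on, and attribute sets used together by at least `min_support` queries become candidate indexes. A greedy loop then keeps the candidates with the best benefit-to-size ratio under a storage budget; that loop is a simple stand-in for the paper's cost models, which are not reproduced. Attribute names, support threshold, benefits and sizes are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(workload: list[set[str]], min_support: int) -> dict[frozenset, int]:
    """Apriori: every attribute set used together by >= min_support queries."""
    counts: dict[frozenset, int] = {}
    for q in workload:
        for a in q:
            key = frozenset([a])
            counts[key] = counts.get(key, 0) + 1
    level = {s for s, c in counts.items() if c >= min_support}
    result = {s: counts[s] for s in level}
    k = 2
    while level:
        # Join step: unions of frequent (k-1)-sets that have exactly k items.
        cands = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in level for s in combinations(c, k - 1))}
        counts = {c: sum(1 for q in workload if c <= q) for c in cands}
        level = {c for c, n in counts.items() if n >= min_support}
        result.update({c: counts[c] for c in level})
        k += 1
    return result


def greedy_select(candidates, benefit, size, budget):
    """Keep candidates by decreasing benefit/size while they fit the budget."""
    chosen, used = [], 0
    for c in sorted(candidates, key=lambda c: benefit[c] / size[c], reverse=True):
        if used + size[c] <= budget:
            chosen.append(c)
            used += size[c]
    return chosen


# Hypothetical workload: selection attributes of four star-join queries.
workload = [
    {"customer.region", "time.year"},
    {"customer.region", "time.year", "product.category"},
    {"customer.region", "product.category"},
    {"time.year"},
]
freq = frequent_itemsets(workload, min_support=2)
print(sorted(sorted(s) for s in freq))
```

Each frequent itemset is read as a candidate bitmap join index on those dimension attributes; `{time.year, product.category}` is dropped because only one query uses it.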
Automatic indexing of web images: trends and challenges in deep learning context
Abstract: The objective of this study is to investigate the extent to which research in Information Science (IS) has drawn on Deep Learning techniques for the representation, description and retrieval of images on the Web, and thus to assess the value of this research when applied to IS methods. Based on an integrative review of the national and international literature in IS, the retrieved documents were analyzed according to the criteria of the integrative review, identifying a set of operations that could be incorporated into the image representation and description methodologies developed and consolidated in the field of IS. We conclude that there is still a gap in IS research on Deep Learning, both nationally and internationally, and that resources from this new programming paradigm can be brought closer to the methods already validated in the field. Keywords: deep learning; image retrieval on the web; indexing of images; machine learning.
We discuss the nature and scope of linguistic (morphological, syntactic and semantic) variation of terms and its impact on two information retrieval tasks: term acquisition and automatic indexing. We review the natural language processing techniques that exist in these two areas and give an in-depth presentation of FASTR, a corpus processor for the recognition, normalization, and acquisition of multi-word terms.
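FASTR's own grammar-based machinery is not reproduced here. As a much cruder illustration of what recognizing term variants involves, the sketch below matches a two-word term against a small token window and labels each occurrence with one of three syntactic variant types assumed for illustration (insertion, permutation, coordination), after a trivial morphological step. The function names, the plural-stripping rule, the window size and the variant labels are all hypothetical.

```python
import re


def stem(word: str) -> str:
    """Crude morphological normalization: lowercase, strip a plural -s."""
    w = word.lower()
    return w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w


def find_variants(text: str, modifier: str, head: str, max_gap: int = 3):
    """Return (variant_type, span) pairs for the term 'modifier head'."""
    tokens = re.findall(r"[a-zA-Z]+", text)
    stems = [stem(t) for t in tokens]
    m, h = stem(modifier), stem(head)
    found = []
    for i, s in enumerate(stems):
        if s == m:
            # Modifier first: exact term, coordination, or insertion.
            for j in range(i + 1, min(i + 2 + max_gap, len(stems))):
                if stems[j] == h:
                    between = stems[i + 1:j]
                    if not between:
                        kind = "exact"
                    elif "and" in between or "or" in between:
                        kind = "coordination"
                    else:
                        kind = "insertion"
                    found.append((kind, " ".join(tokens[i:j + 1])))
                    break
        elif s == h:
            # Head first, linked by "of": permutation.
            for j in range(i + 1, min(i + 2 + max_gap, len(stems))):
                if stems[j] == m and "of" in stems[i + 1:j]:
                    found.append(("permutation", " ".join(tokens[i:j + 1])))
                    break
    return found


variants = find_variants(
    "Information retrieval differs from the retrieval of medical information "
    "and from information and document retrieval.",
    "information", "retrieval")
print(variants)
```

All three occurrences above are conflated to the single index term "information retrieval", which is the effect term normalization aims at in automatic indexing; the window-based matcher will also over-generate on real text, which is why a syntactic approach is needed.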
Background: Knowledge management in the European project Noesis addresses concept-based annotation and multilingual Information Retrieval of documents. Objective: Multilingual enrichment of a concept-based terminology in the medical field. Experience and evaluation in the domain of cardiovascular diseases by enriching a subset of the MeSH thesaurus in six European languages. This terminology, represented in the OWL standard ontology language, has been used for manual semantic annotation of medical texts, for ...
Information filtering (IF) systems usually filter data items by correlating a vector of terms (keywords) that represents the user profile with similar vectors of terms that represent the data items (e.g., documents). The terms that represent the data items can be determined by human experts (e.g., the authors of documents) or by automatic indexing methods. In this study, we employ an artificial neural network (ANN) as an alternative method for both filtering and term selection, and compare its effectiveness with that of “traditional” methods. In an earlier ...
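The "traditional" vector-based filtering described above can be sketched as follows: the profile and each data item become term-frequency vectors, and items whose cosine similarity to the profile reaches a threshold are passed to the user. The tokenizer, raw term-frequency weights (no stop-word removal or tf-idf) and the 0.2 threshold are illustrative assumptions; the study's ANN-based alternative is not shown.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Term-frequency vector over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def filter_items(profile: str, items: dict[str, str], threshold: float = 0.2) -> list[str]:
    """IDs of items whose similarity to the profile meets the threshold."""
    p = vectorize(profile)
    return [i for i, text in items.items() if cosine(p, vectorize(text)) >= threshold]


items = {
    "a": "filtering news with neural networks",
    "b": "cooking pasta at home",
}
print(filter_items("neural networks for information filtering", items))  # ['a']
```

Item "a" shares three of the profile's five terms (cosine 0.6), while "b" shares none; how the term vectors are chosen, by experts or by automatic indexing, directly changes these scores.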
Indexing the Summary of Product Characteristics (SPC) makes it possible to extract all the information needed to analyze a prescription and identify inappropriate medications. We evaluate a French Multi-Terminology Indexer tool (F-MTI) for automatic SPC indexing. This tool uses a dictionary containing the textual forms likely to appear in natural language text for the clinical drug terms contained in the Vidal thesaurus (TUV). We developed a method to generate this dictionary automatically. The evaluation showed a precision of 52.9% and a recall of 46.2%. F-MTI will be integrated into a semi-automatic indexing tool.
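To make the reported figures concrete, the sketch below shows dictionary-based indexing in its simplest form (each textual form maps to a terminology code and is looked up as a whole-word sequence in the normalized text), together with the precision and recall computation used to score an indexer against a manual gold standard. The dictionary entries, the codes `T001` to `T004` and the gold set are invented; F-MTI's actual matching and the TUV content are not reproduced.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep word characters only (splits on - and ')."""
    return " ".join(re.findall(r"\w+", text.lower()))


def index_text(text: str, dictionary: dict[str, str]) -> set[str]:
    """Codes of every dictionary form found as a whole-word sequence."""
    padded = f" {normalize(text)} "
    return {code for form, code in dictionary.items()
            if f" {normalize(form)} " in padded}


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    tp = len(predicted & gold)                 # correctly assigned codes
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical dictionary: textual forms -> terminology codes.
TOY_DICTIONARY = {
    "insuffisance rénale": "T001",
    "insuffisance hépatique": "T002",
    "grossesse": "T003",
    "allaitement": "T004",
}
spc = "Contre-indiqué en cas d'insuffisance rénale sévère et pendant la grossesse."
predicted = index_text(spc, TOY_DICTIONARY)
print(sorted(predicted), precision_recall(predicted, {"T001", "T004"}))
```

Precision penalizes codes the indexer adds wrongly and recall penalizes codes it misses, so the dictionary's coverage of textual forms mainly drives recall.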
Automatic text analysis has widened the perspective of work on document contents by opening up the study of linguistic productions. Here, we use annotation as a case study. In our approach, annotation is defined as textual, graphic or sound information attached to a source document (text, photo, audio sequence or video sequence, i.e., multimedia). Our corpus comes from the INA databases (Institut National de l'Audiovisuel, Paris). Our research task consisted of identifying the appropriate characteristics of a multimedia document, its context and information retrieval in the context of natural language processing (NLP), automatic indexing and knowledge representation. We discuss the crucial role of the annotation process in knowledge extraction and management tools, as well as in the design of Information Retrieval Systems. More specifically, we focus on a new approach to information system design dedicated to “economic intelligence”.
The notion of information interpretation has made it possible to integrate the complexity of the economic intelligence (EI) process, and to refine intelligence at the “frontiers” of multidisciplinary models and problems: information systems (IS) and information retrieval systems (IRS), user models, strategic information and economic intelligence (EI). This multidisciplinary dimension makes it possible to modulate the complex interactions (needs, queries, answers) between the system dedicated to the EI process and its users, whether they are actors, watchers or decision-makers in an organization. Models and tools are proposed for implementing the complex EI process, such as the EQuA2te model for managing and exploiting an information base (or a domain data warehouse) and the METIORE prototype for managing bibliographic references in a cooperative search environment ...
Abstract: This study presents partial results of the master's research project entitled "Utilização de Técnicas de Indexação Automática para a Representação do Conteúdo Semântico de Documentos Acadêmicos" (Use of Automatic Indexing Techniques for Representing the Semantic Content of Academic Documents), which aims to evaluate the contribution of specific automatic indexing techniques to the semantic representation of the content of theses and dissertations. Manual and automatic indexing processes are described, and the application of syntactic-semantic criteria to the automatic extraction of terms relevant for representing the content of academic documents is addressed. The theoretical frameworks drawn from semantics and computational linguistics are discussed. To implement the automatic indexing process, the Tropes parser is presented for automatic term extraction, together with the Information Science Taxonomy developed by Hawkins, Larson and Caton in 2003, which serves as the semantic scenario embedded in the software. Keywords: Automatic indexing. Information representation. Semantics. Syntax. Computational linguistics.
CISMeF (acronym for Catalog and Index of French Language Health Resources on the Internet) is a quality-controlled health gateway designed to catalog and index the most important quality-controlled sources of institutional health information in French. The goal of this study is to compare the relevance of the results provided by this gateway, drawn from a small set of documents selected and described by human experts, with those provided by a search engine over a large set of automatically indexed and ranked resources. The Google-Customized search engine (CSE) was used. The evaluation was based on the first 10 results of 15 queries, judged by two blinded physician evaluators. There was no significant difference in retrieval relevance between CISMeF and Google CSE. In conclusion, automatic indexing does not lead to lower relevance than manual MeSH indexing and may help cope with the increasing number of references to be indexed in a quality-controlled health gateway.
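The evaluation protocol (relevance judgments on the first 10 results of each query) amounts to computing precision at 10 per query and averaging it over the queries, separately for each system. The sketch below shows that computation on made-up judgment lists; it does not reproduce the study's data or its significance test, and `toy_judgments` is a hypothetical name.

```python
from statistics import mean


def precision_at_k(judgments: list[bool], k: int = 10) -> float:
    """Share of relevant results among the first k; missing results
    count as non-relevant because the denominator is always k."""
    return sum(judgments[:k]) / k


def mean_precision_at_k(runs: dict[str, list[bool]], k: int = 10) -> float:
    """Average precision at k over all queries of one system."""
    return mean(precision_at_k(j, k) for j in runs.values())


# Hypothetical relevance judgments (True = relevant) for two queries.
toy_judgments = {
    "q1": [True] * 7 + [False] * 3,
    "q2": [True] * 5 + [False] * 5,
}
print(mean_precision_at_k(toy_judgments))  # 0.6
```

Comparing two systems then means computing this score for each on the same queries and testing whether the per-query differences are significant.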
Abstract: This paper describes an intelligent forms processing system (IFPS) which provides capabilities for automatically indexing form documents for storage/retrieval to/from a document library and for capturing information from scanned form images using intelligent character ...
Abstract: Some natural language processing applications, e.g. machine translation, require a knowledge base with conceptual representations that can reflect the structure of the human cognitive system. By contrast, tasks such as automatic indexing or information extraction can be performed with shallow semantics. In any case, the construction of a ...