DOI: 10.1145/3103010.3121040
short-paper

Distributing Text Mining tasks with librAIry

Published: 31 August 2017

Abstract

We present librAIry, a novel architecture to store, process and analyze large collections of textual resources, integrating existing algorithms and tools into a common, distributed, high-performance workflow. Available text mining techniques can be incorporated into the framework as independent plug-and-play modules that work collaboratively. Rather than following a pre-defined flow, librAIry relies on the aggregation of operations that different components execute in response to an emergent chain of events. The framework makes extensive use of Linked Data (LD) and Representational State Transfer (REST) principles to provide individually addressable resources from textual documents. We describe the architecture design and its implementation, and test its effectiveness in real-world scenarios such as collections of research papers, patents or ICT aids, with the objective of providing solutions for decision makers and experts in those domains. We report the major advantages of the framework and the lessons learned from these experiments.
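
The event-driven idea in the abstract can be illustrated with a minimal sketch. It is not librAIry's actual code or API: the `EventBus` class, the event names (`document.created`, `document.tokenized`, `document.analyzed`), the two modules and the `/documents/1` URI are all hypothetical, and an in-process queue stands in for whatever messaging layer the real system uses. The point is only that no module knows the overall flow; each reacts to one event type and emits another, so the processing chain emerges from the subscriptions.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Event = Tuple[str, dict]                 # (event type, resource payload)
Handler = Callable[[dict], List[Event]]  # a module returns follow-up events


class EventBus:
    """Toy in-process bus (assumption: a real deployment would use a broker)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []  # order in which event types were delivered

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Deliver the event, then any events the handlers emit in response.
        queue: Deque[Event] = deque([(event_type, payload)])
        while queue:
            etype, data = queue.popleft()
            self.log.append(etype)
            for handler in self._handlers[etype]:
                queue.extend(handler(data))


def tokenizer(doc: dict) -> List[Event]:
    """Hypothetical module: splits text into lowercase tokens."""
    doc["tokens"] = doc["text"].lower().split()
    return [("document.tokenized", doc)]


def term_counter(doc: dict) -> List[Event]:
    """Hypothetical module: counts term frequencies once tokens exist."""
    counts: Dict[str, int] = {}
    for tok in doc["tokens"]:
        counts[tok] = counts.get(tok, 0) + 1
    doc["counts"] = counts
    return [("document.analyzed", doc)]


def run_demo() -> Tuple[List[str], dict]:
    bus = EventBus()
    # Modules plug in independently; neither references the other.
    bus.subscribe("document.created", tokenizer)
    bus.subscribe("document.tokenized", term_counter)
    doc = {"uri": "/documents/1", "text": "Text mining mines text"}
    bus.publish("document.created", doc)
    return bus.log, doc
```

Adding a further technique in this sketch would mean subscribing another handler to an existing event type, without touching the modules already registered.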

Published In

DocEng '17: Proceedings of the 2017 ACM Symposium on Document Engineering
August 2017
242 pages
ISBN:9781450346894
DOI:10.1145/3103010
In-Cooperation

  • SIGDOC: ACM Special Interest Group on Systems Documentation

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data integration
  2. large-scale text analysis
  3. nlp
  4. scholarly data
  5. text mining

Qualifiers

  • Short-paper

Funding Sources

  • supported by project Datos 4.0 with reference TIN2016-78011-C4-4-R, financed by the Spanish Ministry MINECO and co-financed by FEDER

Conference

DocEng '17: ACM Symposium on Document Engineering 2017
September 4 - 7, 2017
Valletta, Malta

Acceptance Rates

DocEng '17 Paper Acceptance Rate 13 of 71 submissions, 18%;
Overall Acceptance Rate 194 of 564 submissions, 34%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0

Reflects downloads up to 03 Oct 2024

Cited By

  • (2023) Automatic Topic Label Generation using Conversational Models. Proceedings of the 12th Knowledge Capture Conference 2023, pp. 17-24. DOI: 10.1145/3587259.3627574. Online publication date: 5-Dec-2023
  • (2023) Lessons learned to enable question answering on knowledge graphs extracted from scientific publications: A case study on the coronavirus literature. Journal of Biomedical Informatics, 142 (104382). DOI: 10.1016/j.jbi.2023.104382. Online publication date: Jun-2023
  • (2023) Towards an Automatic Easy-to-Read Adaptation of Morphological Features in Spanish Texts. Human-Computer Interaction – INTERACT 2023, pp. 176-198. DOI: 10.1007/978-3-031-42280-5_12. Online publication date: 25-Aug-2023
  • (2022) Validation of scientific topic models using graph analysis and corpus metadata. Scientometrics, 127:9, pp. 5441-5458. DOI: 10.1007/s11192-022-04318-5. Online publication date: 30-Mar-2022
  • (2020) Large-scale semantic exploration of scientific literature using topic-based hashing algorithms. Semantic Web, 11:5, pp. 735-750. DOI: 10.3233/SW-200373. Online publication date: 1-Jan-2020
  • (2019) Scalable Cross-lingual Document Similarity through Language-specific Concept Hierarchies. Proceedings of the 10th International Conference on Knowledge Capture, pp. 147-153. DOI: 10.1145/3360901.3364444. Online publication date: 23-Sep-2019
  • (2017) Efficient Clustering from Distributions over Topics. Proceedings of the 9th Knowledge Capture Conference, pp. 1-8. DOI: 10.1145/3148011.3148019. Online publication date: 4-Dec-2017
