SmartPub: A Platform for Long-Tail Entity Extraction from Scientific Publications

Authors: Sepideh Mesbah, Alessandro Bozzon, Christoph Lofi, Geert-Jan Houben

WWW '18: Companion Proceedings of the The Web Conference 2018

Pages 191 - 194

https://doi.org/10.1145/3184558.3186976

Published: 23 April 2018

Abstract

This demo presents SmartPub, a novel web-based platform that supports the exploration and visualization of shallow meta-data (e.g., author list, keywords) and deep meta-data (long-tail named entities, which are rare and often relevant only in a specific knowledge domain) from scientific publications. The platform collects documents from different sources (e.g., DBLP and arXiv) and extracts domain-specific named entities from the text of the publications using Named Entity Recognizers (NERs), which we can train with minimal human supervision even for rare entity types. The platform further enables interaction with the crowd, either for filtering or for generating training data, and provides extended visualization and exploration capabilities. SmartPub will be demonstrated using a sample collection of scientific publications from the computer science domain and will address the entity types Dataset (i.e., a dataset presented or used in a publication) and Methods (i.e., algorithms used to create, enrich, or analyse a dataset).
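
The abstract does not say how the NERs are trained with minimal supervision. One common way to obtain training data cheaply is to label sentences automatically from a small list of seed terms; the sketch below illustrates that general idea only and is not SmartPub's actual implementation. The function name `bio_tag`, the seed terms, and the labels `DATASET` and `METHOD` are all hypothetical, chosen to mirror the two entity types named above.

```python
from typing import Dict, List, Tuple


def bio_tag(tokens: List[str], seeds: Dict[str, str]) -> List[Tuple[str, str]]:
    """Assign BIO tags to tokens by exact, case-insensitive seed-term matching.

    `seeds` maps a seed term (one or more whitespace-separated words) to a
    hypothetical entity label such as "DATASET" or "METHOD".
    """
    # Split each seed term into lower-cased words; try longer terms first so
    # that a multi-word term wins over a shorter term starting at the same token.
    seed_words = sorted(
        ((term.lower().split(), label) for term, label in seeds.items() if term.strip()),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)

    i = 0
    while i < len(tokens):
        for words, label in seed_words:
            n = len(words)
            if lowered[i:i + n] == words:
                tags[i] = f"B-{label}"          # first token of the entity
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"      # continuation tokens
                i += n
                break
        else:
            i += 1  # no seed term starts at this token
    return list(zip(tokens, tags))


if __name__ == "__main__":
    sentence = "We evaluate LDA on the TREC corpus".split()
    seeds = {"LDA": "METHOD", "TREC corpus": "DATASET"}
    for token, tag in bio_tag(sentence, seeds):
        print(f"{token}\t{tag}")
```

Sentences labelled this way could be passed to any sequence tagger as noisy training data. Exact matching misses spelling variants and unseen entities, which is one reason a crowd-based filtering step like the one mentioned in the abstract can be useful.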

Cited By

M. Zhang, R. Yang, and J. Xu. Construction and Analysis of Scientific Research Knowledge Graph in the Field of Hydrogen Energy Technology. In Artificial Intelligence in China, pages 392--402, 2023. Online publication date: 2 April 2023. https://doi.org/10.1007/978-981-99-1256-8_46

Z. Lv, K. Yi, W. Zhou, M. Fei, and Y. Shen. A review of the knowledge extraction technology in knowledge graph. In 2022 41st Chinese Control Conference (CCC), pages 4211--4218, 2022. Online publication date: 25 July 2022. https://doi.org/10.23919/CCC55666.2022.9901677

S. Mesbah, C. Lofi, M. Torre, A. Bozzon, and G. Houben. TSE-NER: An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications. In The Semantic Web – ISWC 2018, pages 127--143, 2018. Online publication date: 8 October 2018. https://dl.acm.org/doi/10.1007/978-3-030-00671-6_8

Index Terms

SmartPub: A Platform for Long-Tail Entity Extraction from Scientific Publications
1. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
      2. Document structure
  2. Information systems applications
    1. Digital libraries and archives

Published In

WWW '18: Companion Proceedings of the The Web Conference 2018

April 2018

2023 pages

ISBN: 9781450356404

General Chairs:
Pierre-Antoine Champin (Université Claude Bernard Lyon 1, France)
Fabien Gandon (Inria, Université Côte d'Azur, CNRS, I3S, France)
Lionel Médini (Université Claude Bernard Lyon 1, CNRS, LIRIS, France)

Program Chairs:
Mounia Lalmas (Spotify, UK)
Panagiotis G. Ipeirotis (New York University, USA)

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Qualifiers

Demonstration

Conference

WWW '18

Sponsor:

IW3C2

WWW '18: The Web Conference 2018

April 23 - 27, 2018

Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
