DOI: 10.1145/3184558.3186976
demonstration

SmartPub: A Platform for Long-Tail Entity Extraction from Scientific Publications

Published: 23 April 2018

Abstract

This demo presents SmartPub, a novel web-based platform that supports the exploration and visualization of shallow meta-data (e.g., author list, keywords) and deep meta-data from scientific publications. Deep meta-data here means long-tail named entities, which are rare and often relevant only in a specific knowledge domain. The platform collects documents from different sources (e.g., DBLP and arXiv) and extracts domain-specific named entities from the text of the publications using Named Entity Recognizers (NERs), which can be trained with minimal human supervision even for rare entity types. The platform further enables interaction with the crowd for filtering or for training data generation, and provides extended visualization and exploration capabilities. SmartPub will be demonstrated on a sample collection of scientific publications from the computer science domain and will address the entity types Dataset (i.e., a dataset presented or used in a publication) and Methods (i.e., algorithms used to create, enrich, or analyse a dataset).
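
The abstract does not spell out how training data for the NERs is produced. As an illustration only, the following minimal sketch shows one common way to generate NER training data with little human supervision: a handful of seed terms are matched against sentences and turned into BIO tags. The function name `bio_tag`, the label `DATASET`, the seed terms, and the example sentence are all assumptions made for this sketch and are not taken from SmartPub.

```python
import re
from typing import List, Tuple


def bio_tag(sentence: str, seeds: List[str], label: str) -> List[Tuple[str, str]]:
    """Tag tokens matching a seed term with B-/I- labels; all others get O.

    Hypothetical helper for illustration; not part of SmartPub.
    """
    # Simple tokenizer: words (hyphenated words kept whole) and punctuation.
    tokens = re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]

    # Try longer seeds first so multi-word terms win over shorter ones.
    for seed in sorted(seeds, key=lambda s: -len(s.split())):
        seed_tokens = seed.lower().split()
        n = len(seed_tokens)
        for i in range(len(tokens) - n + 1):
            window_free = all(t == "O" for t in tags[i:i + n])
            if lowered[i:i + n] == seed_tokens and window_free:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
    return list(zip(tokens, tags))


# Assumed seed terms and sentence, chosen only to exercise the function.
sentence = "We evaluate on the Reuters corpus and on MNIST."
tagged = bio_tag(sentence, ["Reuters corpus", "MNIST"], "DATASET")
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

Sentences labeled this way could serve as input to any sequence tagger that accepts BIO-formatted data. Exact string matching is a deliberate simplification: it misses unseen entity names, which is precisely the gap a trained recognizer is meant to close.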

Cited By

  • (2023) Construction and Analysis of Scientific Research Knowledge Graph in the Field of Hydrogen Energy Technology. Artificial Intelligence in China, pp. 392-402. DOI: 10.1007/978-981-99-1256-8_46. Online publication date: 2-Apr-2023.
  • (2022) A review of the knowledge extraction technology in knowledge graph. 2022 41st Chinese Control Conference (CCC), pp. 4211-4218. DOI: 10.23919/CCC55666.2022.9901677. Online publication date: 25-Jul-2022.
  • (2018) TSE-NER: An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications. The Semantic Web – ISWC 2018, pp. 127-143. DOI: 10.1007/978-3-030-00671-6_8. Online publication date: 8-Oct-2018.

Published In

WWW '18: Companion Proceedings of the The Web Conference 2018
April 2018
2023 pages
ISBN: 9781450356404

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. information extraction. document metadata
  2. long-tail entity types
  3. named entity recognition
  4. training data generation

Qualifiers

  • Demonstration

Conference

WWW '18
Sponsor:
  • IW3C2
WWW '18: The Web Conference 2018
April 23 - 27, 2018
Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
