DOI: 10.1145/2928294.2928301
research-article

An unsupervised classification process for large datasets using web reasoning

Published: 26 June 2016 Publication History

Abstract

Determining which data is valuable among large volumes of data is one of the main challenges in Big Data. We aim to extract knowledge from these sources using a Hierarchical Multi-Label Classification process called Semantic HMC. This process automatically learns a label hierarchy and classifies items from very large data sources. The Semantic HMC process consists of five steps: Indexation, Vectorization, Hierarchization, Resolution and Realization. The first three steps automatically construct the label hierarchy from the data sources. The last two steps classify new items according to that label hierarchy. This paper focuses on the last two steps and presents a new, highly scalable process for classifying items from huge sets of unstructured text using ontologies and rule-based reasoning. The process is implemented on a scalable, distributed platform for processing Big Data, and some results are discussed.
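
As a rough illustration of the last two steps, the following toy sketch classifies a text item with simple rules and then adds every ancestor label from a hierarchy. It is not the paper's implementation: the abstract does not describe the rule format, the ontology, or the distributed platform, so the `PARENT` hierarchy, the `RULES` table, and the `classify` function are all hypothetical names chosen for this example.

```python
# Toy sketch only: rule-based labelling followed by propagation up a label
# hierarchy. All names and data below are assumptions made for illustration.

# Hypothetical label hierarchy: child label -> parent label (None for a root).
PARENT = {"science": None, "physics": "science", "biology": "science"}

# Hypothetical rules: a label applies when all of its terms occur in the item.
RULES = {
    "physics": {"quantum", "particle"},
    "biology": {"cell", "gene"},
}


def classify(text, rules=RULES, parent=PARENT):
    """Return the set of labels for `text`, including ancestor labels."""
    terms = set(text.lower().split())
    labels = set()
    for label, required in rules.items():
        if required <= terms:  # every required term is present
            node = label
            while node is not None:  # walk up to the root of the hierarchy
                labels.add(node)
                node = parent.get(node)
    return labels


if __name__ == "__main__":
    print(sorted(classify("Quantum particle collisions")))
```

Because each item is labelled independently of the others, a step of this shape could in principle be spread across many workers, which is one plausible reading of why the process is described as scalable.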


Published In

SBD '16: Proceedings of the International Workshop on Semantic Big Data
June 2016
83 pages
ISBN:9781450342995
DOI:10.1145/2928294
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • GRAPHIQ: Graphiq Inc.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big-data
  2. classification
  3. machine learning
  4. ontology

Qualifiers

  • Research-article

Funding Sources

  • French agency ANRT
  • Actualis SARL
  • Portuguese COMPETE Program

Conference

SIGMOD/PODS'16
Sponsor:
  • GRAPHIQ
SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 30 of 54 submissions, 56%
