DOI: 10.1145/2457317.2457328
research-article

Structure inference for linked data sources using clustering

Published: 18 March 2013
    Abstract

    Linked Data (LD) is supplementing the World Wide Web of documents with a Web of data, as is apparent from the number of LD repositories available as part of the Linked Open Data (LOD) cloud. At the instance level, LD sources use a combination of terms from various vocabularies, expressed in RDFS/OWL, to describe their data and publish them on the Web. However, LD sources do not organise their data under a specific structure analogous to a relational schema; instead, data can adhere to multiple vocabularies. Expressing SPARQL queries over LD sources, usually through a SPARQL endpoint presented to the user, requires knowledge of the predicates used, so that queries can express user requirements as graph patterns. Although LD lowers the barrier to data publication by using a homogeneous language (i.e., RDF), sources organise their data with different structures and terminologies. We would like to have a synopsis of how data are organised in LD sources to inform the formulation of queries over such sources. In this paper we make the case that structural summaries over LD sources can inform query formulation and support data integration and query processing over multiple LD sources. To this end, we propose an approach that builds on a hierarchical clustering algorithm for inferring structural summaries over LD sources. We have conducted an experimental evaluation using various LD sources to ascertain the extent to which our technique can successfully infer structural summaries from them.
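    The abstract does not detail the clustering algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each subject is described by the set of predicates it uses, groups subjects with average-linkage agglomerative clustering over Jaccard distance, and summarises each cluster by the union of its members' predicates. The sample triples, the 0.5 stopping threshold and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical sample triples (subject, predicate, object); not taken from the paper.
TRIPLES = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:mbox", "<mailto:alice@example.org>"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:mbox", "<mailto:bob@example.org>"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:paper1", "dc:title", '"Structure inference"'),
    ("ex:paper1", "dc:creator", "ex:alice"),
    ("ex:paper2", "dc:title", '"Another paper"'),
    ("ex:paper2", "dc:creator", "ex:bob"),
    ("ex:paper2", "dc:date", '"2013"'),
]


def predicate_sets(triples):
    """Map each subject to the set of predicates it uses (assumed feature)."""
    sets = {}
    for s, p, _ in triples:
        sets.setdefault(s, set()).add(p)
    return sets


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0.0 means identical predicate sets."""
    return 1.0 - len(a & b) / len(a | b)


def average_linkage(c1, c2, sets):
    """Mean pairwise Jaccard distance between members of two clusters."""
    pairs = [(x, y) for x in c1 for y in c2]
    return sum(jaccard_distance(sets[x], sets[y]) for x, y in pairs) / len(pairs)


def cluster(sets, threshold=0.5):
    """Agglomerative clustering: repeatedly merge the closest pair of
    clusters until the smallest distance exceeds the hypothetical threshold."""
    clusters = [[s] for s in sorted(sets)]
    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), average_linkage(clusters[i], clusters[j], sets))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        # combinations yields i < j, so deleting j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def summarise(clusters, sets):
    """Describe each cluster by the sorted union of its members' predicates."""
    return [(sorted(c), sorted(set().union(*(sets[s] for s in c))))
            for c in clusters]


if __name__ == "__main__":
    sets = predicate_sets(TRIPLES)
    for members, preds in summarise(cluster(sets), sets):
        print(members, "->", preds)
```

    On the sample data this yields two groups, one of people using FOAF predicates and one of papers using Dublin Core predicates; such per-cluster predicate lists are the kind of synopsis that could help a user choose predicates for SPARQL graph patterns. Average linkage and Jaccard distance are illustrative choices; other linkage criteria or similarity measures could be substituted.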

    Information

    Published In

    ACM Other conferences
    EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 Workshops
    March 2013
    423 pages
    ISBN:9781450315999
    DOI:10.1145/2457317
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Conference

    EDBT/ICDT '13

    Acceptance Rates

    EDBT '13 paper acceptance rate: 7 of 10 submissions (70%)
    Overall acceptance rate: 7 of 10 submissions (70%)

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 0

    Cited By

    • (2021) A survey on semantic schema discovery. The VLDB Journal 31:4, pp. 675-710. doi:10.1007/s00778-021-00717-x. Online publication date: 27-Nov-2021.
    • (2020) Materialization of OWL Ontologies from Relational Databases: A Practical Approach. Computer Science – CACIC 2019, pp. 285-301. doi:10.1007/978-3-030-48325-8_19. Online publication date: 14-May-2020.
    • (2018) Scaling Up Schema Discovery for RDF Datasets. 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW), pp. 84-89. doi:10.1109/ICDEW.2018.00021. Online publication date: Apr-2018.
    • (2018) Clustering of Propositions Equipped with Uncertainty. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications, pp. 715-726. doi:10.1007/978-3-319-91479-4_59. Online publication date: 18-May-2018.
    • (2016) HIEDS. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 3705-3711. doi:10.5555/3061053.3061137. Online publication date: 9-Jul-2016.
    • (2015) Combining Syntactic and Semantic Evidence for Improving Matching over Linked Data Sources. Proceedings, Part I, of the 16th International Conference on Web Information Systems Engineering (WISE 2015), Volume 9418, pp. 200-215. doi:10.1007/978-3-319-26190-4_14. Online publication date: 1-Nov-2015.
    • (2015) Discovering Types in RDF Datasets. The Semantic Web: ESWC 2015 Satellite Events, pp. 77-81. doi:10.1007/978-3-319-25639-9_15. Online publication date: 31-May-2015.
    • (2015) Schema Discovery in RDF Data Sources. Conceptual Modeling, pp. 481-495. doi:10.1007/978-3-319-25264-3_36. Online publication date: 8-Dec-2015.
    • (2014) Modelling of Experienced-Based Data in Linked Data Environment. Proceedings of the 2014 International Conference on Intelligent Networking and Collaborative Systems, pp. 731-736. doi:10.1109/INCoS.2014.122. Online publication date: 10-Sep-2014.
    • (2014) Learning Categories from Linked Open Data. Information Processing and Management of Uncertainty in Knowledge-Based Systems, pp. 396-405. doi:10.1007/978-3-319-08852-5_41. Online publication date: 2014.
