research-article

Link key candidate extraction with relational concept analysis

Published: 15 February 2020

Abstract

    Linked data aims at publishing data expressed in RDF (Resource Description Framework) at the scale of the World Wide Web. These datasets interoperate by publishing links which identify the same individuals across heterogeneous datasets. Such links may be found by using a generalisation of database keys, called link keys, which apply across datasets. They specify the pairs of properties to compare for linking individuals belonging to different classes of the datasets. Here, we show how to recast the proposed link key extraction techniques for RDF datasets in the framework of formal concept analysis. We define a formal context, where objects are pairs of resources and attributes are pairs of properties, and show that formal concepts correspond to link key candidates. We extend this characterisation to the full RDF model, including non-functional properties and interdependent link keys. We show how to use relational concept analysis for dealing with cyclic dependencies across classes and hence across link keys. Finally, we discuss an implementation of this framework.
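The formal context described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes two toy datasets with functional properties only, and every name in it (`D1`, `D2`, `build_context`, `concept_intents`, `extent`) is hypothetical. Objects are pairs of resources, attributes are pairs of properties, and a resource pair has a property pair when the two values agree. Each closed set of property pairs (a concept intent) is read here as a link key candidate, and its extent as the set of links that candidate would generate.

```python
from itertools import product

# Hypothetical toy datasets (assumption): resource -> {property: value},
# restricted to functional properties for simplicity.
D1 = {
    "a1": {"name": "Ada", "born": "1815"},
    "a2": {"name": "Alan", "born": "1912"},
}
D2 = {
    "b1": {"label": "Ada", "year": "1815"},
    "b2": {"label": "Alan", "year": "1954"},
}


def build_context(d1, d2):
    """Map each resource pair (a, b) to the property pairs (p, q) on which it agrees."""
    context = {}
    for (a, props_a), (b, props_b) in product(d1.items(), d2.items()):
        context[(a, b)] = frozenset(
            (p, q)
            for p, v in props_a.items()
            for q, w in props_b.items()
            if v == w
        )
    return context


def concept_intents(context):
    """Return all concept intents: the full attribute set plus every
    intersection of object rows (a naive closure, fine for toy data)."""
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for row in context.values():
        # Intersect the new row with every intent found so far.
        intents |= {row & intent for intent in intents}
    return intents


def extent(context, intent):
    """Resource pairs that agree on every property pair in the intent."""
    return {pair for pair, row in context.items() if intent <= row}


if __name__ == "__main__":
    ctx = build_context(D1, D2)
    for intent in sorted(concept_intents(ctx), key=len):
        print(sorted(intent), "->", sorted(extent(ctx, intent)))
```

The naive closure is exponential in the worst case and is only meant to make the correspondence concrete; non-functional properties and dependencies between classes, which the abstract says are handled in the full framework, are deliberately left out of this sketch.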



          Published In

          Discrete Applied Mathematics, Volume 273, Issue C
          Feb 2020
          252 pages

          Publisher

          Elsevier Science Publishers B. V.

          Netherlands


          Author Tags

          1. Formal concept analysis
          2. Relational concept analysis
          3. Linked data
          4. Link key
          5. Data interlinking
          6. Resource description framework
