DOI: 10.1145/2479787.2479867
research-article

Guidelines for multilingual linked data

Published: 12 June 2013

    Abstract

    In this article, we argue that there is a growing number of linked datasets in different natural languages, and that there is a need for guidelines and mechanisms to ensure the quality and organic growth of this emerging multilingual data network. However, we have little knowledge regarding the actual state of this data network, its current practices, and the open challenges that it poses. Questions regarding the distribution of natural languages, the links that are established across data in different languages, or how linguistic features are represented, remain mostly unanswered. Addressing these and other language-related issues can help to identify existing problems, propose new mechanisms and guidelines or adapt the ones in use for publishing linked data including language-related features, and, ultimately, provide metrics to evaluate quality aspects. In this article we review, discuss, and extend current guidelines for publishing linked data by focusing on those methods, techniques and tools that can help RDF publishers to cope with language barriers. Whenever possible, we will illustrate and discuss each of these guidelines, methods, and tools on the basis of practical examples that we have encountered in the publication of the datos.bne.es dataset.
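
    The abstract asks how natural languages are distributed across linked datasets. As a minimal sketch, and not the authors' method, that question can be approximated for a single N-Triples dump by counting the language tags attached to literals. The sample triples, the example.org URIs, and the function name `language_distribution` are assumptions made for illustration; the regular expression only covers the simple case of one triple per line.

```python
import re
from collections import Counter

# Matches the object of an N-Triples statement when it is a language-tagged
# literal, e.g. "Don Quijote"@es . Escaped quotes inside the literal are allowed.
LANG_LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)\s*\.\s*$'
)


def language_distribution(ntriples_lines):
    """Count language tags (lower-cased) over language-tagged literals."""
    counts = Counter()
    for line in ntriples_lines:
        match = LANG_LITERAL.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Hypothetical sample triples (assumed for illustration, not from datos.bne.es).
LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
SAMPLE = [
    f'<http://example.org/work/1> {LABEL} "Don Quijote"@es .',
    f'<http://example.org/work/1> {LABEL} "Don Quixote"@en .',
    f'<http://example.org/work/2> {LABEL} "La Celestina"@es .',
    f'<http://example.org/work/2> {LABEL} "untagged title" .',
    "<http://example.org/work/1> <http://www.w3.org/2002/07/owl#sameAs> "
    "<http://example.org/other/1> .",
]

distribution = language_distribution(SAMPLE)
for tag, count in distribution.most_common():
    print(tag, count)
```

    Literals without a language tag and triples whose object is a URI are skipped, so the counts describe only explicitly tagged text. A real survey would more likely rely on a full RDF parser than on a regular expression.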



    Reviews

    Jolanta Mizera-Pietraszko

    In multilingualism, the goal is always the same: to overcome language barriers. The research problem in this paper relates to methodological guidelines for publishing linked data based on the Spanish datos.bne.es dataset. The authors review existing technologies, tools, and resources with the goal of providing high-quality data to users working in multilingual environments. The distribution of natural languages across datasets in a resource description framework (RDF) indicates that less than a quarter of them operate in more than one language. The guidelines are analyzed for the specification of data sources, the modeling of vocabulary, the generation of a resource discovery tool (RDT), interlinking between the datasets, the publication of multilingual resources, and, eventually, the exploitation of RDF datasets. In the conclusion, the authors outline the main lessons learned. I would recommend this study specifically to users browsing the web for information in languages other than their mother tongue, as well as to the developers of web applications with links in more than one language. In my opinion, the approach seems interesting, although, if I were to pursue such research, I would extend it to more languages to make the conclusions more reliable and representative.

    Online Computing Reviews Service


    Published In

    ACM Other conferences
    WIMS '13: Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics
    June 2013
    408 pages
    ISBN: 9781450318501
    DOI: 10.1145/2479787
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • UAM: Autonomous University of Madrid

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. linked data
    2. multilingual
    3. semantic web

    Qualifiers

    • Research-article

    Conference

    WIMS '13
    Sponsor:
    • UAM

    Acceptance Rates

    WIMS '13 Paper Acceptance Rate 28 of 72 submissions, 39%;
    Overall Acceptance Rate 140 of 278 submissions, 50%

    Cited By

    • (2024) Digital Humanities. Encyclopedia of Information Science and Technology, Sixth Edition (1-19). DOI: 10.4018/978-1-6684-7366-5.ch052. Online publication date: 1-Jul-2024
    • (2018) Analysis of Editors' Languages in Wikidata. Proceedings of the 14th International Symposium on Open Collaboration (1-5). DOI: 10.1145/3233391.3233965. Online publication date: 22-Aug-2018
    • (2018) Models to represent linguistic linked data. Natural Language Engineering 24:6 (811-859). DOI: 10.1017/S1351324918000347. Online publication date: 4-Oct-2018
    • (2018) Domain adaptation for ontology localization. Web Semantics: Science, Services and Agents on the World Wide Web 36:C (23-31). DOI: 10.1016/j.websem.2015.12.001. Online publication date: 20-Dec-2018
    • (2018) The Human Face of the Web of Data: A Cross-sectional Study of Labels. Procedia Computer Science 137 (66-77). DOI: 10.1016/j.procs.2018.09.007. Online publication date: 2018
    • (2017) Methodological Guidelines for Publishing Library Data as Linked Data. 2017 International Conference on Information Systems and Computer Science (INCISCOS) (241-246). DOI: 10.1109/INCISCOS.2017.17. Online publication date: Nov-2017
    • (2016) JRC-Names: Multilingual entity name variants and titles as Linked Data. Semantic Web 8:2 (283-295). DOI: 10.3233/SW-160228. Online publication date: 6-Dec-2016
    • (2016) Translating Ontologies in Real-World Settings. The Semantic Web – ISWC 2016 (241-256). DOI: 10.1007/978-3-319-46547-0_25. Online publication date: 23-Sep-2016
    • (2016) ESSOT: An Expert Supporting System for Ontology Translation. Natural Language Processing and Information Systems (60-73). DOI: 10.1007/978-3-319-41754-7_6. Online publication date: 17-Jun-2016
    • (2014) Publishing Linked Data on the Web: The Multilingual Dimension. Towards the Multilingual Semantic Web (101-117). DOI: 10.1007/978-3-662-43585-4_7. Online publication date: 19-Aug-2014
