Linking the Open
         Petko Valtchev
   (Assoc. Prof., Dept. of CS, UQAM)
          Montreal, April 6th


Why Link The Data
                                                     “I want you to put your data on the Web.”
                                                                      Sir T. Berners-Lee (TED’09)

• Original Web (1990s):
   • network of linked documents
• Web of Data (2000s):
   • network of interlinked data items
• Linked Open Data: publish data on the Web
   • maximal reuse and interconnection, minimal redundancy, network effect
                     Data is most useful when it is shared and combined with other data.


Linking Data?
•   But how should one produce such data?
    1. Global identification: every data item should be identified by a URL.

    2. Reachability via HTTP: looking up (dereferencing) the URL should retrieve the data.

    3. Linked structure: the data should contain outgoing (typed!) links pointing to
       additional data identified by URLs.


•   THE language: Resource Description Framework (RDF)
    •   benefit: links provide context
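Principle 2 (reachability via HTTP) can be illustrated with a minimal sketch using Python's standard `urllib.request`: the client asks for an RDF serialization of a resource through the HTTP `Accept` header (content negotiation). The DBpedia URI and the choice of Turtle (`text/turtle`) are assumptions for illustration; the request is only built here, not sent.

```python
import urllib.request

# Example resource URI (assumption: the server can return RDF for it).
MONTREAL = "http://dbpedia.org/resource/Montreal"


def linked_data_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build an HTTP GET request asking for an RDF serialization of `uri`.

    The Accept header performs content negotiation: the same URI can be
    answered with RDF for machines instead of an HTML page for people.
    """
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


req = linked_data_request(MONTREAL)
print(req.full_url, req.get_method(), req.get_header("Accept"))

# Actually dereferencing the URI requires network access:
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("Content-Type"))
#     print(resp.read()[:200])
```

Other RDF media types (for example `application/rdf+xml`) can be requested by passing a different `media_type`.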


A Graph?
[Figure: RDF graph; the resource pd:tedstr is linked to foaf:Person and to the name "Ted Strauss".]


A Graph?
[Figure, next build: the edge labels dbpedia-owl:country and population are added to the same graph.]


A Graph? Global?
[Figure, next build: a second resource, pd:linguo, is also linked to foaf:Person and to the name "Linkun Guo" (foaf:name); foaf:based_near edges lead to dbpedia:Montreal, which connects to dbpedia:Canada; other edge labels shown are rdf:type, dbpedia-owl:country and dpprop:population.]


A Graph? Global? Giant?
[Figure, next build: dbpedia:Quebec is added, with a dbpedia-owl:country edge and the population value 20,693,000.]
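The "Global?" and "Giant?" builds can be sketched with a toy triple store: because a personal dataset and DBpedia use the same URI for Montreal, merging the two graphs is a plain set union, and a query can then follow links across both sources. In this minimal Python sketch, triples are tuples of prefixed names taken from the figure labels; which triples belong to which source, and the exact edges (for instance pd:linguo foaf:based_near dbpedia:Montreal), are assumptions, not the original data.

```python
# Triples as (subject, predicate, object); literals keep their quotes.
PERSONAL = {
    ("pd:tedstr", "rdf:type", "foaf:Person"),
    ("pd:tedstr", "foaf:name", '"Ted Strauss"'),
    ("pd:tedstr", "foaf:based_near", "dbpedia:Montreal"),
    ("pd:linguo", "rdf:type", "foaf:Person"),
    ("pd:linguo", "foaf:name", '"Linkun Guo"'),
    ("pd:linguo", "foaf:based_near", "dbpedia:Montreal"),  # assumed edge
}

DBPEDIA = {
    ("dbpedia:Montreal", "dbpedia-owl:country", "dbpedia:Canada"),
    ("dbpedia:Quebec", "dbpedia-owl:country", "dbpedia:Canada"),
}


def objects(graph, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def people_in_country(graph, country):
    """People (rdf:type foaf:Person) based near a place whose country is `country`."""
    people = {s for s, p, o in graph if p == "rdf:type" and o == "foaf:Person"}
    return {
        person
        for person in people
        for place in objects(graph, person, "foaf:based_near")
        if country in objects(graph, place, "dbpedia-owl:country")
    }


# Shared URIs (dbpedia:Montreal) make merging a plain set union.
merged = PERSONAL | DBPEDIA

print(people_in_country(PERSONAL, "dbpedia:Canada"))        # set(): country link missing
print(sorted(people_in_country(merged, "dbpedia:Canada")))  # ['pd:linguo', 'pd:tedstr']
```

Neither dataset alone answers "who is based in Canada?"; the answer only appears once the graphs are joined through the shared identifier.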


How is it Open?
•   “If you want to start interlinking data then you can only do that if the data is licensed
    in a way that allows such interlinking.”
                                                                               Rufus Pollock

•   But why is open data on the Web not ‘linked’?
    •   CSV, XML, relational databases (RDBs)
        •   no easy integration

    •   Web 2.0 mashups?
        •   data sources are fixed in advance

•   Linked Open Data (LOD) cloud: a global data space


The LOD cloud family picture
Sept. 2011


What for?
•   Linking Open Drug Data (LODD), since 2008
    •   Publish and interlink publicly available data about drugs

    •   Answer non-trivial questions over the LODD data

        •   For physicians
            •   Which drugs are equivalent for a given condition?

            •   Which drugs are currently in clinical trials?

        •   For patients
            •   What alternatives exist to a given drug?

            •   What are the contraindications for a drug?


Supplemental Slides

Main Entry Points into the LOD cloud
•   DBpedia - a large multi-domain dataset extracted from Wikipedia; it describes
    about 3.77M concepts with 400M+ facts, and provides abstracts in 11 languages.

•   YAGO - a high-precision knowledge base with 1.7M entities and 15M facts derived
    from Wikipedia and WordNet.

•   FOAF (Friend Of A Friend) - describes people, the links between them and
    the things they create and do.

•   GoodRelations - a vocabulary for eCommerce, enabling web sites to publish
    details of their products and services in a machine-readable way.

•   GeoNames - provides RDF descriptions of more than 6.5M geographical
    features worldwide.
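The slides abbreviate resources from these vocabularies and datasets as prefixed names (e.g. `foaf:Person`, `dbpedia:Montreal`). The sketch below shows how such names expand to full URIs and back. The namespace URIs are the ones these vocabularies commonly publish, and the `gr`/`gn` prefix labels are conventions rather than requirements; `pd:` and YAGO are left out because their namespaces are not given here. Check each dataset's documentation before relying on them.

```python
# Namespace URIs for prefixes used in these slides (pd: is omitted: its namespace is not given).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "gr": "http://purl.org/goodrelations/v1#",   # GoodRelations
    "gn": "http://www.geonames.org/ontology#",   # GeoNames ontology
}


def expand(name: str, prefixes=PREFIXES) -> str:
    """Turn a prefixed name like 'foaf:Person' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


def compact(uri: str, prefixes=PREFIXES) -> str:
    """Inverse direction: use the longest matching namespace, else return the URI unchanged."""
    matches = [(len(ns), p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri
    _, prefix = max(matches)
    return f"{prefix}:{uri[len(prefixes[prefix]):]}"


print(expand("dbpedia:Montreal"))                       # http://dbpedia.org/resource/Montreal
print(compact("http://xmlns.com/foaf/0.1/based_near"))  # foaf:based_near
```

Choosing the longest matching namespace in `compact` avoids ambiguity when one namespace is a prefix of another.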


Cross-Media Cultural Heritage Management with LOD
•   Simon is a Maths student visiting Montreal. He is fond of reading, cinema, music and history. His friends
    recommended the flourishing Mile End district to him, where many cafés serve espresso and European pastry.
•   Once settled in a bar, he opens his iPad to see what is interesting in the surroundings. Knowing his
    preferences, the mobile app suggests an excerpt from a novel by the local "enfant du quartier",
    Mordecai Richler: "The Apprenticeship of Duddy Kravitz". The excerpt describes the life of the Jewish
    community on two of the area's principal streets, St. Urbain Street and "The Main", in the 1930s.
•   When he has finished, Simon is intrigued and accepts the suggestion to go for a short walk looking for
    remains from that period. While sipping his coffee, Simon checks the author's biography and finds that he
    also wrote "Barney's Version".
•   After reading a summary, he is prompted to look at the eponymous film directed by Richard J. Lewis. While
    watching a trailer, he notices the youthful red-haired actress playing the main character's first wife and,
    after querying the app's knowledge base, learns that she is Rachelle Lefevre, who was born in Montreal.
•   Before walking out, he checks the availability of "Barney's Version" and discovers that a copy is
    available at the local municipal library.
•   When on the go, the system plays "I'm Your Man", a song by Leonard Cohen, another literary celebrity from


The Semantic Annotations: RDFa
•   RDFa embeds RDF statements in HTML pages through attributes

     •   similar to microformats

     •   @resource, @property, @href, @typeof, @rel, etc.
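To show how RDF can be read back out of such annotations, here is a minimal sketch built on Python's standard `html.parser`. It handles only a simplified subset of RDFa: @about sets the subject (inherited by child elements), @typeof yields an rdf:type triple, @rel links to @resource or @href, and @property takes its value from @content or from the element's text. It assumes well-formed XHTML input and keeps prefixed names unexpanded; a real RDFa processor follows many more rules. The example page and its `example.org` URI are hypothetical.

```python
from html.parser import HTMLParser


class MiniRDFaParser(HTMLParser):
    """Extract (subject, predicate, object) triples from a simplified RDFa subset.

    Assumptions: well-formed XHTML (every element closed, void elements
    self-closed); only @about starts a new subject; no CURIE expansion.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self.stack = []  # one frame per open element: subject, pending property, text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        parent = self.stack[-1]["subject"] if self.stack else None
        subject = a.get("about") or parent
        prop = a.get("property")
        if subject:
            if a.get("typeof"):
                self.triples.append((subject, "rdf:type", a["typeof"]))
            target = a.get("resource") or a.get("href")
            if a.get("rel") and target:
                self.triples.append((subject, a["rel"], target))
            if prop and a.get("content") is not None:
                self.triples.append((subject, prop, a["content"]))
                prop = None  # value already taken from @content
        self.stack.append({"subject": subject, "property": prop, "text": []})

    def handle_data(self, data):
        # Text counts towards every enclosing element still awaiting a @property value.
        for frame in self.stack:
            if frame["property"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        frame = self.stack.pop()
        if frame["property"] and frame["subject"]:
            value = " ".join("".join(frame["text"]).split())  # normalise whitespace
            self.triples.append((frame["subject"], frame["property"], value))


def extract_triples(html: str) -> list:
    parser = MiniRDFaParser()
    parser.feed(html)
    parser.close()
    return parser.triples


PAGE = """
<div about="http://example.org/people#ted" typeof="foaf:Person">
  <span property="foaf:name">Ted Strauss</span>
  <a rel="foaf:based_near" href="http://dbpedia.org/resource/Montreal">Montreal</a>
</div>
"""

for triple in extract_triples(PAGE):
    print(triple)
```

The same page stays readable for people ("Ted Strauss", a link to Montreal) while carrying three machine-readable statements.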


Cool applications of semantic annotations

    •   Semantic query answering:
        •   Where do my colleagues live?
            •   Possible answers from their own web pages (via Trudat HP)

                •   dbpedia:Montreal

                •   dbpedia:Laval

                •   dbpedia:Toronto

        •   What are their dietary restrictions?


Practical take on OD vs LOD
•   Open data for social justice in the US (say, Atlanta)?
    •   Dataset 1: census data
        •   Focus on a particular area, with houses distinguished by whether they are
            •   inhabited by black people or by white people

    •   Dataset 2: water supply data (whether each house is connected to the water lines)
•   Overlaying datasets 1 and 2, the analysis uncovered discrimination:
    •   ~83% of the unconnected houses were inhabited by black people!

•   How was it done? (a guess)
    •   addresses matched by comparing them as strings :-(

•   LOD format - simpler and more reliable processing:
    •   finding paths in the graph
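The contrast between string matching and path finding can be sketched in Python. All data here is hypothetical toy data (house URIs, addresses, and property names such as `ex:inhabitedBy` and `ex:connectedTo`); the point is only that once both datasets refer to the same house URI, joining them becomes a graph traversal (here a breadth-first search) rather than a fragile comparison of address strings.

```python
from collections import deque

# Naive join on address strings: the same house written two ways does not match.
census_address = "12 Main Street"
water_address = "12 Main St."
print(census_address == water_address)  # False

# Hypothetical merged graph in which both datasets use the same house URI.
TRIPLES = [
    ("ex:house12", "ex:inhabitedBy", "ex:household7"),  # from dataset 1 (census)
    ("ex:house12", "ex:connectedTo", "ex:waterLine3"),  # from dataset 2 (water supply)
    ("ex:house14", "ex:inhabitedBy", "ex:household8"),  # no water connection recorded
]


def neighbours(triples, node):
    """Yield (edge, next_node); edges followed backwards are prefixed with '^'."""
    for s, p, o in triples:
        if s == node:
            yield p, o
        elif o == node:
            yield "^" + p, s


def find_path(triples, start, goal):
    """Breadth-first search: shortest list of (node, edge, node) steps, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while previous[node] is not None:
                prev_node, edge = previous[node]
                path.append((prev_node, edge, node))
                node = prev_node
            return path[::-1]
        for edge, nxt in neighbours(triples, node):
            if nxt not in previous:
                previous[nxt] = (node, edge)
                queue.append(nxt)
    return None


print(find_path(TRIPLES, "ex:household7", "ex:waterLine3"))
print(find_path(TRIPLES, "ex:household8", "ex:waterLine3"))  # None: no connection
```

A household with no path to any water line is exactly an "unconnected house" in the slide's analysis, found without comparing a single address string.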


Data about the Data
•   Reasoning about the dataset:
    •   Metadata:
        •   e.g., the Dublin Core vocabulary

•   Notion of provenance
    •   The problem of trust: anybody can publish anything
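As a minimal sketch of metadata and provenance, the function below describes a dataset with Dublin Core terms (namespace `http://purl.org/dc/terms/`) and writes the result as N-Triples. The dataset, creator and source URIs and the date are hypothetical placeholders, and the selection of properties (title, creator, modified, source) is one possible choice, not a prescribed profile.

```python
DCTERMS = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal (escaping backslash, quote, newline)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def describe_dataset(dataset_uri, title, creator_uri, source_uris, modified):
    """Return N-Triples lines describing a dataset with Dublin Core terms."""
    statements = [
        ("title", literal(title)),
        ("creator", f"<{creator_uri}>"),
        ("modified", f"{literal(modified)}^^<{XSD_DATE}>"),
    ]
    # dcterms:source records where the data came from (provenance).
    statements += [("source", f"<{src}>") for src in source_uris]
    return "\n".join(f"<{dataset_uri}> <{DCTERMS}{term}> {obj} ." for term, obj in statements)


print(describe_dataset(
    "http://example.org/dataset/water",  # hypothetical URIs and date throughout
    "Water supply connections",
    "http://example.org/org/water-utility",
    ["http://example.org/raw/connections.csv"],
    "2012-01-01",
))
```

Such statements do not make a dataset trustworthy by themselves, but they let a consumer check who published it, when, and from which sources.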

