Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo
SAnno: a unifying framework for semantic annotation

Davide Eynard
IDSIA, 01/06/2010
2

Introduction

  •   S(emantic)Anno(tations)
        • … in Italy, “sanno” also means “they know”


  •   Basic principle: anyone should be able to say anything about
      anything else
        • Well, this should hold in general :-)
        • Actually, in our case it is “anything about any URI”
        • And we would like everyone to say that in a formal way


  •   But first, a little step back in time...




        IDSIA, 01/06/2010   Davide Eynard
3

Participation and semantics




                                               Data




                          Structure


      IDSIA, 01/06/2010        Davide Eynard
4

Sanno's grandfather: Speakinabout [1]

  •   Purpose: produce semantic annotations about named entities
        • When you read “Harry Potter”, is it the book or the movie?


  •   Plays with user gratifications
        • When users annotate a string as matching a specific concept, they
          are shown a list of services/search engines which are related to it

  •   Relies on user provided data:
        • Freebase types
        • User generated search templates, built inside a wiki system




        IDSIA, 01/06/2010   Davide Eynard
5

Sanno's grandfather: Speakinabout [1]




      IDSIA, 01/06/2010   Davide Eynard
6

Sanno's grandfather: Speakinabout [1]




      IDSIA, 01/06/2010   Davide Eynard
7

Sanno's father: RDFMonkey [2]

  •   Purpose: augment browsing experience by providing
      information/services related to the visited URL

  •   Relies on Freebase types
       • … as in SpeakinAbout, but without requiring user interaction
       • Types are found by searching backlinks in Freebase (which topics
         are linking the visited page)

  •   Related services as widgets inside a browser extension
       • The app could load widgets at runtime (from Freebase itself or
         another collaborative system)




       IDSIA, 01/06/2010   Davide Eynard
8

    Sanno's father: RDFMonkey [2]
                                                 Cities
Musical Artists




                                                 Books




             IDSIA, 01/06/2010   Davide Eynard
9

The problem

  •   We already have semantics on the annotation (i.e. Annotea), but
      how can we have semantics within the annotation?

  •   Good starting points:
        • Some participative systems already provide semi-structured
          information (i.e. infoboxes in Wikipedia)
        • Some communities of practice already built their own bottom-up
          way to structure information (i.e. machine tags)
        • Some (relatively new) systems allow, with some additional effort, to
          save information in a structured way almost without requiring users
          to know that (i.e. semantic wikis)


  •   Challenges
        • Provide a shared way to describe annotations coming from
          heterogeneous systems
        • Aggregate this information to provide something new and useful



        IDSIA, 01/06/2010   Davide Eynard
10

SAnno as a framework

  •   Sanno is built up of many different parts, which all together
      provide something (we consider) new and useful
        • An ontology to describe annotations (the “shells” that contain
          metadata about a resource)
        • An ontology describing the types of properties we are already able
          to aggregate
        • A set of conversion tools which are able to translate existing
          annotations from other systems into our notation
        • A system to show the results of the aggregation of different
          annotations
        • A system to manage provenance, authorship, and filters on
          incoming annotations




        IDSIA, 01/06/2010   Davide Eynard
11

The annotations ontology

  •   Every annotation can be considered as a “Post-it”, a piece of
      paper where something is written about something else
        • … you can say things about what is written there, but also about the
          Post-it itself


  •   The annotation is about a resource, it is created by someone in a
      specific date, it comes from a particular annotation system and
      might be connected to a specific community

  •   Main goal: do not reinvent everything from scratch
        • Reuse well-known ontologies such as DC, SIOC, etc.
        • Use named graphs as an alternative to reifications

  •   Start in an easy way: restriction to URLs
        • Also a way to provide instant gratification to users: show
          annotations while they are browsing a website


        IDSIA, 01/06/2010   Davide Eynard
12

The aggregation ontology

  •    Aggregation deals with the contents of the annotation (i.e. The
       triples found in the NG)

  •    Objectives
         • Avoid constraining users to a specific vocabulary for annotations
         • Find a way to collect different annotations and provide something
           new and interesting by aggregating them

       Our approach
         • Properties used inside annotations could be described as belonging
           to families we already know how to deal with
         • Examples: very specific (tags, ratings), more general (transitive
           relations)
         • Properties inside some external vocabulary are mapped as
           subproperties of ours
         • … by whom? High-experience users who have incentives to do this
           (think about users building templates in Wikipedia...)

         IDSIA, 01/06/2010   Davide Eynard
13

Conversion tools

  •     Our worst enemy: the bootstrap
          • who is going to annotate the first resources? I don't have time!


  •     Our best friends: already existing annotation systems
          • why don't we convert existing data to our notation and show the
            advantages of our approach?

       Different families of conversion tools
          • Easy: already existing APIs, with realtime search functionalities
            (i.e. del.icio.us)
          • Medium: conversions from existing structured repositories such as
            SPARQL endpoints (advantage: the conversion is very clean, you
            just need one tool and different CONSTRUCTs)
          • A little harder: Web scraping when no other sources are available




          IDSIA, 01/06/2010   Davide Eynard
14

Annotation client

  •   Actually, two possible clients in our mind:
        • a browser extension which shows annotations while users are
          browsing the Web
        • an independent service which is able to aggregate heterogeneous
          information related to similar resources (i.e. URLs marked as being
          MP3 files)

  •   Filter annotations according to author, date, originating system,
      and community
        • Users should be able to “subscribe” to some annotating
          communities and ignore others
        • System is thought as distributed, as data can come from different,
          unrelated sources




        IDSIA, 01/06/2010   Davide Eynard
15

The prototype

  •   Early annotation ontology
  •   Property families: tag, rating, generically related URI
  •   Conversions from SMW, Delicious
  •   Visualization as a web service + Firefox extension
  •   No subscriptions yet




        IDSIA, 01/06/2010   Davide Eynard
16

The prototype




      IDSIA, 01/06/2010   Davide Eynard
17

The end




                              Thank you! Questions?



 References:
 •   [0] D.Laniado, D.Eynard and M.Colombetti. Using WordNet to turn a folksonomy into a
     hierarchy of concepts. Semantic Web Application and Perspectives 192–201, 2007.
 •   [1] D.Eynard and M.Colombetti. Exploiting User Gratification for Collaborative Semantic
     Annotation. Proceedings of SWUI 2008. April 2008.
 •   [2] D.Eynard. Using semantics and user participation to customize personalization. HP Labs
     Technical Report HPL-2008-197. September 2008.
 •   [3] L.Mazzola, D.Eynard and R.Mazza. GVIS: a framework for graphical mashups of
     heterogeneous sources to support data interpretation. HSI 2010. May 2010.

          IDSIA, 01/06/2010      Davide Eynard
Contact                        Davide Eynard

                                 Tel. 02 2399 4010
                                 Fax 02 2399 3411

                               eynard@elet.polimi.it
                               http://davide.eynard.it
                 Project page @AIRLab: http://airwiki.elet.polimi.it

          Back

More Related Content

SAnno: a unifying framework for semantic annotation

  • 1. SAnno: a unifying framework for semantic annotation Davide Eynard IDSIA, 01/06/2010
  • 2. 2 Introduction • S(emantic)Anno(tations) • … in Italy, “sanno” also means “they know” • Basic principle: anyone should be able to say anything about anything else • Well, this should hold in general :-) • Actually, in our case it is “anything about any URI” • And we would like everyone to say that in a formal way • But first, a little step back in time... IDSIA, 01/06/2010 Davide Eynard
  • 3. 3 Participation and semantics Data Structure IDSIA, 01/06/2010 Davide Eynard
  • 4. 4 Sanno's grandfather: Speakinabout [1] • Purpose: produce semantic annotations about named entities • When you read “Harry Potter”, is it the book or the movie? • Plays with user gratifications • When users annotate a string as matching a specific concept, they are shown a list of services/search engines which are related to it • Relies on user provided data: • Freebase types • User generated search templates, built inside a wiki system IDSIA, 01/06/2010 Davide Eynard
  • 5. 5 Sanno's grandfather: Speakinabout [1] IDSIA, 01/06/2010 Davide Eynard
  • 6. 6 Sanno's grandfather: Speakinabout [1] IDSIA, 01/06/2010 Davide Eynard
  • 7. 7 Sanno's father: RDFMonkey [2] • Purpose: augment browsing experience by providing information/services related to the visited URL • Relies on Freebase types • … as in SpeakinAbout, but without requiring user interaction • Types are found by searching backlinks in Freebase (which topics are linking the visited page) • Related services as widgets inside a browser extension • The app could load widgets at runtime (from Freebase itself or another collaborative system) IDSIA, 01/06/2010 Davide Eynard
  • 8. 8 Sanno's father: RDFMonkey [2] Cities Musical Artists Books IDSIA, 01/06/2010 Davide Eynard
  • 9. 9 The problem • We already have semantics on the annotation (i.e. Annotea), but how can we have semantics within the annotation? • Good starting points: • Some participative systems already provide semi-structured information (i.e. infoboxes in Wikipedia) • Some communities of practice already built their own bottom-up way to structure information (i.e. machine tags) • Some (relatively new) systems allow, with some additional effort, to save information in a structured way almost without requiring users to know that (i.e. semantic wikis) • Challenges • Provide a shared way to describe annotations coming from heterogeneous systems • Aggregate this information to provide something new and useful IDSIA, 01/06/2010 Davide Eynard
  • 10. 10 SAnno as a framework • Sanno is built up of many different parts, which all together provide something (we consider) new and useful • An ontology to describe annotations (the “shells” that contain metadata about a resource) • An ontology describing the types of properties we are already able to aggregate • A set of conversion tools which are able to translate existing annotations from other systems into our notation • A system to show the results of the aggregation of different annotations • A system to manage provenance, authorship, and filters on incoming annotations IDSIA, 01/06/2010 Davide Eynard
  • 11. 11 The annotations ontology • Every annotation can be considered as a “Post-it”, a piece of paper where something is written about something else • … you can say things about what is written there, but also about the Post-it itself • The annotation is about a resource, it is created by someone in a specific date, it comes from a particular annotation system and might be connected to a specific community • Main goal: do not reinvent everything from scratch • Reuse well-known ontologies such as DC, SIOC, etc. • Use named graphs as an alternative to reifications • Start in an easy way: restriction to URLs • Also a way to provide instant gratification to users: show annotations while they are browsing a website IDSIA, 01/06/2010 Davide Eynard
  • 12. 12 The aggregation ontology • Aggregation deals with the contents of the annotation (i.e. The triples found in the NG) • Objectives • Avoid constraining users to a specific vocabulary for annotations • Find a way to collect different annotations and provide something new and interesting by aggregating them  Our approach • Properties used inside annotations could be described as belonging to families we already know how to deal with • Examples: very specific (tags, ratings), more general (transitive relations) • Properties inside some external vocabulary are mapped as subproperties of ours • … by whom? High-experience users who have incentives to do this (think about users building templates in Wikipedia...) IDSIA, 01/06/2010 Davide Eynard
  • 13. 13 Conversion tools • Our worst enemy: the bootstrap • who is going to annotate the first resources? I don't have time! • Our best friends: already existing annotation systems • why don't we convert existing data to our notation and show the advantages of our approach?  Different families of conversion tools • Easy: already existing APIs, with realtime search functionalities (i.e. del.icio.us) • Medium: conversions from existing structured repositories such as SPARQL endpoints (advantage: the conversion is very clean, you just need one tool and different CONSTRUCTs) • A little harder: Web scraping when no other sources are available IDSIA, 01/06/2010 Davide Eynard
  • 14. 14 Annotation client • Actually, two possible clients in our mind: • a browser extension which shows annotations while users are browsing the Web • an independent service which is able to aggregate heterogeneous information related to similar resources (i.e. URLs marked as being MP3 files) • Filter annotations according to author, date, originating system, and community • Users should be able to “subscribe” to some annotating communities and ignore others • System is thought as distributed, as data can come from different, unrelated sources IDSIA, 01/06/2010 Davide Eynard
  • 15. 15 The prototype • Early annotation ontology • Property families: tag, rating, generically related URI • Conversions from SMW, Delicious • Visualization as a web service + Firefox extension • No subscriptions yet IDSIA, 01/06/2010 Davide Eynard
  • 16. 16 The prototype IDSIA, 01/06/2010 Davide Eynard
  • 17. 17 The end Thank you! Questions? References: • [0] D.Laniado, D.Eynard and M.Colombetti. Using WordNet to turn a folksonomy into a hierarchy of concepts. Semantic Web Application and Perspectives 192–201, 2007. • [1] D.Eynard and M.Colombetti. Exploiting User Gratification for Collaborative Semantic Annotation. Proceedings of SWUI 2008. April 2008. • [2] D.Eynard. Using semantics and user participation to customize personalization. HP Labs Technical Report HPL-2008-197. September 2008. • [3] L.Mazzola, D.Eynard and R.Mazza. GVIS: a framework for graphical mashups of heterogeneous sources to support data interpretation. HSI 2010. May 2010. IDSIA, 01/06/2010 Davide Eynard
  • 18. Contact Davide Eynard Tel. 02 2399 4010 Fax 02 2399 3411 eynard@elet.polimi.it http://davide.eynard.it Project page @AIRLab: http://airwiki.elet.polimi.it Back