This document introduces SAnno, a framework for semantic annotation that allows anything to be annotated about any URI. It summarizes SAnno's predecessors like Speakinabout and RDFMonkey that relied on user-generated data and Freebase types. SAnno aims to provide a shared way to describe annotations from different systems and aggregate this information using an ontology for annotations and their properties. The framework includes tools to convert existing annotations into its notation and a prototype client to visualize aggregated annotations from a web service or browser extension.
1 of 18
More Related Content
SAnno: a unifying framework for semantic annotation
1. SAnno: a unifying framework for semantic annotation
Davide Eynard
IDSIA, 01/06/2010
2. 2
Introduction
• S(emantic)Anno(tations)
• … in Italy, “sanno” also means “they know”
• Basic principle: anyone should be able to say anything about
anything else
• Well, this should hold in general :-)
• Actually, in our case it is “anything about any URI”
• And we would like everyone to say that in a formal way
• But first, a little step back in time...
IDSIA, 01/06/2010 Davide Eynard
4. 4
Sanno's grandfather: Speakinabout [1]
• Purpose: produce semantic annotations about named entities
• When you read “Harry Potter”, is it the book or the movie?
• Plays with user gratifications
• When users annotate a string as matching a specific concept, they
are shown a list of services/search engines which are related to it
• Relies on user provided data:
• Freebase types
• User generated search templates, built inside a wiki system
IDSIA, 01/06/2010 Davide Eynard
7. 7
Sanno's father: RDFMonkey [2]
• Purpose: augment browsing experience by providing
information/services related to the visited URL
• Relies on Freebase types
• … as in SpeakinAbout, but without requiring user interaction
• Types are found by searching backlinks in Freebase (which topics
are linking the visited page)
• Related services as widgets inside a browser extension
• The app could load widgets at runtime (from Freebase itself or
another collaborative system)
IDSIA, 01/06/2010 Davide Eynard
9. 9
The problem
• We already have semantics on the annotation (i.e. Annotea), but
how can we have semantics within the annotation?
• Good starting points:
• Some participative systems already provide semi-structured
information (i.e. infoboxes in Wikipedia)
• Some communities of practice already built their own bottom-up
way to structure information (i.e. machine tags)
• Some (relatively new) systems allow, with some additional effort, to
save information in a structured way almost without requiring users
to know that (i.e. semantic wikis)
• Challenges
• Provide a shared way to describe annotations coming from
heterogeneous systems
• Aggregate this information to provide something new and useful
IDSIA, 01/06/2010 Davide Eynard
10. 10
SAnno as a framework
• Sanno is built up of many different parts, which all together
provide something (we consider) new and useful
• An ontology to describe annotations (the “shells” that contain
metadata about a resource)
• An ontology describing the types of properties we are already able
to aggregate
• A set of conversion tools which are able to translate existing
annotations from other systems into our notation
• A system to show the results of the aggregation of different
annotations
• A system to manage provenance, authorship, and filters on
incoming annotations
IDSIA, 01/06/2010 Davide Eynard
11. 11
The annotations ontology
• Every annotation can be considered as a “Post-it”, a piece of
paper where something is written about something else
• … you can say things about what is written there, but also about the
Post-it itself
• The annotation is about a resource, it is created by someone in a
specific date, it comes from a particular annotation system and
might be connected to a specific community
• Main goal: do not reinvent everything from scratch
• Reuse well-known ontologies such as DC, SIOC, etc.
• Use named graphs as an alternative to reifications
• Start in an easy way: restriction to URLs
• Also a way to provide instant gratification to users: show
annotations while they are browsing a website
IDSIA, 01/06/2010 Davide Eynard
12. 12
The aggregation ontology
• Aggregation deals with the contents of the annotation (i.e. The
triples found in the NG)
• Objectives
• Avoid constraining users to a specific vocabulary for annotations
• Find a way to collect different annotations and provide something
new and interesting by aggregating them
Our approach
• Properties used inside annotations could be described as belonging
to families we already know how to deal with
• Examples: very specific (tags, ratings), more general (transitive
relations)
• Properties inside some external vocabulary are mapped as
subproperties of ours
• … by whom? High-experience users who have incentives to do this
(think about users building templates in Wikipedia...)
IDSIA, 01/06/2010 Davide Eynard
13. 13
Conversion tools
• Our worst enemy: the bootstrap
• who is going to annotate the first resources? I don't have time!
• Our best friends: already existing annotation systems
• why don't we convert existing data to our notation and show the
advantages of our approach?
Different families of conversion tools
• Easy: already existing APIs, with realtime search functionalities
(i.e. del.icio.us)
• Medium: conversions from existing structured repositories such as
SPARQL endpoints (advantage: the conversion is very clean, you
just need one tool and different CONSTRUCTs)
• A little harder: Web scraping when no other sources are available
IDSIA, 01/06/2010 Davide Eynard
14. 14
Annotation client
• Actually, two possible clients in our mind:
• a browser extension which shows annotations while users are
browsing the Web
• an independent service which is able to aggregate heterogeneous
information related to similar resources (i.e. URLs marked as being
MP3 files)
• Filter annotations according to author, date, originating system,
and community
• Users should be able to “subscribe” to some annotating
communities and ignore others
• System is thought as distributed, as data can come from different,
unrelated sources
IDSIA, 01/06/2010 Davide Eynard
15. 15
The prototype
• Early annotation ontology
• Property families: tag, rating, generically related URI
• Conversions from SMW, Delicious
• Visualization as a web service + Firefox extension
• No subscriptions yet
IDSIA, 01/06/2010 Davide Eynard
17. 17
The end
Thank you! Questions?
References:
• [0] D.Laniado, D.Eynard and M.Colombetti. Using WordNet to turn a folksonomy into a
hierarchy of concepts. Semantic Web Application and Perspectives 192–201, 2007.
• [1] D.Eynard and M.Colombetti. Exploiting User Gratification for Collaborative Semantic
Annotation. Proceedings of SWUI 2008. April 2008.
• [2] D.Eynard. Using semantics and user participation to customize personalization. HP Labs
Technical Report HPL-2008-197. September 2008.
• [3] L.Mazzola, D.Eynard and R.Mazza. GVIS: a framework for graphical mashups of
heterogeneous sources to support data interpretation. HSI 2010. May 2010.
IDSIA, 01/06/2010 Davide Eynard