
SNSW Unit-1


SOCIAL

NETWORKS
&
SEMANTIC WEB
CO 1: The Semantic Web
WHAT IS MEANT BY SEMANTIC WEB?
 The Semantic Web is the application of advanced knowledge
technologies to the Web and distributed systems in general.
 The Semantic Web is the knowledge graph formed by combining
connected, Linked Data with intelligent content to facilitate
machine understanding and processing of content, metadata, and
other information objects at scale.
 The Semantic Web leads to smarter, more effortless customer
experiences by giving content the ability to understand and present
itself in the most useful forms matched to a customer’s need.
 Semantic standards unlock a crucial evolution of the web towards
intelligence that allows the content we post online to be presented
in a way that can be understood, connected, and remixed by
machines.
SEMANTIC WEB–CONTENT ENGINEERS
 Content engineers are creating a more powerful and agile web of
content and data by first parsing and structuring the discrete
elements of content that constitute websites, such as people,
events, ideas, concepts, products.
 These elements are then assigned a “label” describing their meaning
in a standardized language.
 When such machine-readable descriptions are present, they can be
linked to build a more robust web of data where computers can
find, read, and even reason about a unit of content.
SO WHAT IS THE NEED?
 But why would the Web need any extension or fixing? We will
argue that the reason we do not often raise this question is that we
got used to the limitations of accessing the vast information on the
Web.
LIMITATIONS OF THE CURRENT WEB
 To understand the limitations of the current Web, let’s see how a
search engine responds to four types of queries.
 Who is Frank van Harmelen?
 Show me photos of Paris
 Find new music that I (might) like
 Tell me about music players with a capacity of at least 4GB.
LIMITATIONS
WHO IS FRANK VAN HARMELEN?

 Only some of the results returned by Google for the search keyword
“Harmelen” are about the person we are looking for, because:
 It’s the name of a number of people, including the (unrelated) Frank van
Harmelen and Mark van Harmelen.
 Harmelen is also a small town in the Netherlands (one hit) and the site
of a tragic train accident (one hit).
 The keyword “Harmelen” (and even the full name “Frank van Harmelen”) is
polysemous, i.e. it has more than one meaning.
 Search engines are programmed in such a way that the first page
shows a diversity of the most relevant links related to the keyword.

 This allows the user to quickly realize the ambiguity of the query
and to make it more specific.
 But will making the query more specific solve the issue? Maybe not.
LIMITATIONS
SHOW ME PHOTOS OF PARIS

 The most straightforward solution to this search task is typing in “paris
photos” in the search bar of our favorite search engine. Most advanced
search engines, however, have specific facilities for image search where
we can drop the term photo from the query.
 What we immediately notice from the results is that the search engine
fails to discriminate two categories of images: those related to the city of
Paris and those showing Paris Hilton, the heiress to the Hilton fortune
whose popularity on the Web could hardly be disputed.
 Only about half of the photos on the first page, a quarter of the photos on
the second page and a fifth on the third page are directly related to our
concept of Paris.
 The problem is that associating photos with keywords is a much more
difficult task than simply looking for keywords in the texts of documents.
LIMITATIONS
FIND NEW MUSIC THAT I (MIGHT) LIKE

 This query is at an even higher level of difficulty, so much so that
most of us wouldn’t even think of posing it to a search engine.
 First, from the perspective of automation, music retrieval is just as
problematic as image search.
 As in the previous case, a search engine could avoid the problem of
understanding the content of music and look at the filename and the text
of the web page for clues about the performer or the genre.
 The reason we would not attempt to pose this query has mostly to
do with the difficulty of formulating what music we like.
 A description of our musical taste is something that we might list
on our homepage but it is not something that we would like to keep
typing in again for accessing different music-related services on the
internet.
 Ideally, we would like the search engine to take this information
from our homepage or to grab it —with our permission— from
some other service that is aware of our musical taste such as our
online music store, internet radio stations we listen to or the music
player of our own mp3 device.
MUSIC PLAYERS WITH A CAPACITY
OF AT LEAST 4GB.
 This is a typical e-commerce query: we are looking for a product with
certain characteristics.
 One of the immediate concerns is that translating this query from
natural language to the Boolean language of search engines is (almost)
impossible.
 We could try the search “music player” “4GB” but it is clear that the
search engine will not know that 4GB is the capacity of the music player
and we are interested in all players with at least that much memory.
 Such a query would return only pages where these terms occur as they
are.
 The problem is that general-purpose search engines do not know anything
about music players or their properties, nor how to compare such
properties.
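The gap between literal keyword matching and a typed query can be sketched in a few lines of Python. The product pages, the `type` and `capacity_gb` fields and both functions are hypothetical; the sketch only shows that the keyword query finds the page that literally says "4GB" and misses the 8 GB player, while the structured query compares capacities numerically and also checks the product type.

```python
# Hypothetical product pages (free text) and the same facts as structured records.
PAGES = {
    "p1": "Compact music player with 4GB of storage",
    "p2": "Music player, 8 GB memory, FM radio",
    "p3": "4GB USB stick",
}
RECORDS = {
    "p1": {"type": "MusicPlayer", "capacity_gb": 4},
    "p2": {"type": "MusicPlayer", "capacity_gb": 8},
    "p3": {"type": "USBStick", "capacity_gb": 4},
}


def keyword_search(terms):
    """Boolean AND search: a page matches only if every term occurs literally."""
    return [pid for pid, text in PAGES.items()
            if all(t.lower() in text.lower() for t in terms)]


def structured_search(min_gb):
    """Typed query: music players whose capacity is at least min_gb."""
    return [pid for pid, r in RECORDS.items()
            if r["type"] == "MusicPlayer" and r["capacity_gb"] >= min_gb]


print(keyword_search(["music player", "4GB"]))  # only the literal match
print(structured_search(4))                     # every player with >= 4 GB
```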
 An even bigger problem is the one our machines face when trying
to collect and aggregate product information from the Web. Again,
one possibility would be to extract this information from the content
of web pages. The information extraction methods used for this
purpose have a very difficult task, and it is easy to see why if we
consider what a typical product description page looks like to the
eyes of the computer.
 Further, what one vendor calls “capacity”, another may call
“memory”.
THE CURRENT WEB HAS ITS LIMITATIONS WHEN IT COMES TO:

 Finding relevant information
 Extracting relevant information
 Combining and reusing information
 Handling ambiguous (polysemous) keywords
 Security
DIAGNOSIS FOR THESE LIMITATIONS

 The questions above are arbitrary in their specificity but they illustrate a general
problem in accessing the vast amounts of information on the Web.
 Namely, in all four cases we deal with a knowledge gap: what the computer
understands and is able to work with is much more limited than the knowledge of
the user.
 The handicap of the computer is mostly due to technological difficulties in getting
our computers to understand natural language or to “see” the content of images
and other multimedia.
 In most cases, however, the knowledge gap is due to the lack of some kind of
background knowledge that only the human possesses.
 The background knowledge is often completely missing from the context of the
Web page.
 Further, answering a query may require knowledge aggregated from several sources.
THE SEMANTIC SOLUTION
 The idea of the Semantic Web is to apply advanced knowledge technologies in
order to fill the knowledge gap between human and machine.
 This means providing knowledge in forms that computers can readily process and
reason with.
 This knowledge can either be information that is already described in the content
of the Web pages but difficult to extract or additional background knowledge that
can help to answer queries in some way.
 User profiles also play a very important role in the Semantic Web.
 Increasing automatic linking among data
 Increasing recall and precision in search
 Increasing automation in data integration
 Increasing automation in the service life cycle
THE SEMANTIC SOLUTION
WHO IS FRANK VAN HARMELEN?
 The situation can be greatly improved by providing personal
information in a semantic format.
 For example, a semantic profile can be added to personal web pages,
describing the same information that appears in the text of the page
but in a machine-processable format.
 Assuming that all van Harmelens on the Web provided
similar information, the confusion among them could also be easily
avoided.
 In particular, the search engine could alert us to the ambiguity of
our question and ask for some extra information about the person
we are looking for.
 Also, a better understanding of the user profile, the search query
and the content of the web pages makes it possible to more
accurately select and customize the advertisements appearing
alongside the query results.
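To make the disambiguation step concrete, here is a minimal Python sketch, assuming profiles are published as simple (subject, property, value) triples. `rdf:type`, `rdfs:label` and `foaf:Person` are real RDF, RDFS and FOAF terms, but the `ex:` identifiers, the data and the helper functions are hypothetical. Knowing the *type* of each matching resource lets a search engine notice that a query still matches several people and ask the user to choose.

```python
# Hypothetical semantic profiles as (subject, property, value) triples.
# The ex: identifiers are made up; the property and class names are RDF/RDFS/FOAF terms.
TRIPLES = [
    ("ex:frank", "rdf:type", "foaf:Person"),
    ("ex:frank", "rdfs:label", "Frank van Harmelen"),
    ("ex:mark", "rdf:type", "foaf:Person"),
    ("ex:mark", "rdfs:label", "Mark van Harmelen"),
    ("ex:harmelen", "rdf:type", "ex:Town"),
    ("ex:harmelen", "rdfs:label", "Harmelen"),
]


def value(subject, prop):
    """Return the first value of prop for subject, or None."""
    for s, p, o in TRIPLES:
        if s == subject and p == prop:
            return o
    return None


def candidates(keyword, wanted_type=None):
    """Subjects whose label contains the keyword, optionally filtered by type."""
    hits = []
    for s, p, o in TRIPLES:
        if p == "rdfs:label" and keyword.lower() in o.lower():
            if wanted_type is None or value(s, "rdf:type") == wanted_type:
                hits.append(s)
    return hits


def answer(keyword, wanted_type=None):
    """Return a single match, or ask the user to resolve the ambiguity."""
    hits = candidates(keyword, wanted_type)
    if len(hits) > 1:
        labels = [value(h, "rdfs:label") for h in hits]
        return "Ambiguous, did you mean: " + ", ".join(labels) + "?"
    return hits[0] if hits else None


print(answer("Harmelen", "foaf:Person"))
print(answer("Frank van Harmelen", "foaf:Person"))
```

Restricting the query to `foaf:Person` already removes the town; the remaining ambiguity between the two people is reported back rather than mixed into one result list.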
THE SEMANTIC SOLUTION
SHOW ME PHOTOS OF PARIS
 Here, too, the solution is to attach metadata to the images in question.
 For example, the online photo sharing site Flickr allows users to
annotate images with geographic coordinates.
 After uploading some photos users can add keywords to describe
their images (e.g. “Paris, Eiffel-tower”) and drag and drop the
images on a geographic map to indicate the location where the
photo was taken.
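Once photos carry coordinates, "photos of Paris" can be answered by location rather than by keyword. The Python sketch below assumes hypothetical photo records with a `geo` (latitude, longitude) field and an arbitrary 15 km radius; the distance is the standard haversine great-circle formula with a mean Earth radius of 6371 km.

```python
import math

PARIS = (48.8566, 2.3522)  # approximate city-centre latitude/longitude


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Hypothetical photos: all are tagged "paris", but only one was taken in Paris.
PHOTOS = [
    {"id": "eiffel", "tags": {"paris", "eiffel-tower"}, "geo": (48.8584, 2.2945)},
    {"id": "hilton", "tags": {"paris", "hilton"}, "geo": (34.0522, -118.2437)},
    {"id": "no-geotag", "tags": {"paris"}, "geo": None},
]


def photos_of_city(photos, centre, radius_km=15):
    """Keep photos whose geotag lies within radius_km of the city centre."""
    return [p["id"] for p in photos
            if p["geo"] is not None and haversine_km(p["geo"], centre) <= radius_km]


print(photos_of_city(PHOTOS, PARIS))
```

A keyword filter on the tag "paris" would return all three photos; the location filter keeps only the one whose coordinates fall in the city.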
 The same technique of annotating images with metadata is used in the
MultiMediaN research project to create a joint catalogue of artworks
housed in different collections.
 The knowledge provided extends to the content of images as well as other
attributes such as the creator, style or material of the work.
 When these annotations are combined with existing background knowledge
about the artists, art periods, styles etc., it becomes possible to aggregate
information and to search all collections simultaneously while avoiding the
pitfalls of keyword based search.
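A minimal Python sketch of searching several collections at once with shared background knowledge. The collection records and the artist-to-style table are illustrative assumptions (the attributions themselves are well known); none of the records mentions a style, so a keyword search for "Impressionism" would find nothing, while the combined query does.

```python
# Two hypothetical museum catalogues, annotated only with creator metadata.
COLLECTION_A = [{"title": "Water Lilies", "creator": "Claude Monet"}]
COLLECTION_B = [
    {"title": "The Night Watch", "creator": "Rembrandt"},
    {"title": "Dance at Le Moulin de la Galette", "creator": "Pierre-Auguste Renoir"},
]

# Background knowledge shared across collections: artist -> style.
STYLE_OF = {
    "Claude Monet": "Impressionism",
    "Pierre-Auguste Renoir": "Impressionism",
    "Rembrandt": "Baroque",
}


def works_in_style(style, *collections):
    """Search all collections together, using knowledge the records themselves lack."""
    return [w["title"] for c in collections for w in c
            if STYLE_OF.get(w["creator"]) == style]


print(works_in_style("Impressionism", COLLECTION_A, COLLECTION_B))
```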
FIND NEW MUSIC THAT I (MIGHT)
LIKE

 The background knowledge required for recommending music is already at
work behind the online radio called Pandora.
 Pandora is based on the Music Genome Project, an attempt to create a
vocabulary to describe characteristics of music from melody, harmony and
rhythm, to instrumentation, orchestration, arrangement, lyrics, and the
rich world of singing and vocal harmony.
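How Pandora actually scores and matches songs is not described here, so the following Python sketch only illustrates the general idea of a descriptive music vocabulary under stated assumptions: each song is a vector of made-up attribute scores (the attribute names are not the Music Genome Project's own), and unheard songs are ranked by cosine similarity to a song the user likes.

```python
import math

# Hypothetical songs described by made-up attribute scores in [0, 1].
SONGS = {
    "song_a": {"acoustic": 0.9, "tempo": 0.3, "vocal_harmony": 0.8},
    "song_b": {"acoustic": 0.8, "tempo": 0.4, "vocal_harmony": 0.7},
    "song_c": {"acoustic": 0.1, "tempo": 0.9, "vocal_harmony": 0.2},
}


def cosine(u, v):
    """Cosine similarity of two sparse attribute vectors."""
    keys = u.keys() | v.keys()
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(liked, songs, k=1):
    """Return the k songs most similar to a song the user likes."""
    others = [s for s in songs if s != liked]
    return sorted(others, key=lambda s: cosine(songs[liked], songs[s]), reverse=True)[:k]


print(recommend("song_a", SONGS))
```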
THE SEMANTIC SOLUTION
TELL ME ABOUT MUSIC PLAYERS WITH A CAPACITY OF AT LEAST 4GB.

 As we have seen, the problem in this case is the difficulty of maintaining a
unified catalogue in a way that does not require an exclusive commitment
from the providers of product information.
 Further, we would like to keep the catalogue open to data providers adding
new, emerging categories of products and their descriptions.
 The Semantic Web solution is to create a minimal, shared, top-level
schema in one of the ontology languages defined for the
Semantic Web.
 An ontology describes the semantics of the data, providing a uniform
way for different parties to communicate and understand each other.
 Such a schema is designed for extensibility in the distributed
environment of the Web.
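In the spirit of RDFS `rdfs:subClassOf` and `rdfs:subPropertyOf`, the following Python sketch shows how terms introduced independently by vendors can still be queried through a small shared schema. The `shop:`, `vendorX:` and `vendorY:` terms, the products and the helpers are all hypothetical; note that the two vendors call the same attribute "memory" and "capacity", yet both map to one shared property.

```python
# Hypothetical shared top-level schema (shop:) plus independent vendor extensions.
SUBCLASS_OF = {
    "shop:MusicPlayer": "shop:Product",
    "vendorX:FlashPlayer": "shop:MusicPlayer",    # new subcategory from vendor X
    "vendorY:PocketJukebox": "shop:MusicPlayer",  # new subcategory from vendor Y
}
SUBPROPERTY_OF = {
    "vendorX:memory": "shop:capacity_gb",    # vendor X calls it "memory"
    "vendorY:capacity": "shop:capacity_gb",  # vendor Y calls it "capacity"
}
PRODUCTS = [
    {"id": "x1", "type": "vendorX:FlashPlayer", "vendorX:memory": 2},
    {"id": "x2", "type": "vendorX:FlashPlayer", "vendorX:memory": 8},
    {"id": "y1", "type": "vendorY:PocketJukebox", "vendorY:capacity": 4},
]


def is_a(cls, target):
    """True if cls equals target or reaches it via subclass links (assumes no cycles)."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def shared_value(product, shared_prop):
    """Read a vendor-specific property through its (one-level) mapping to the shared one."""
    for prop, val in product.items():
        if prop == shared_prop or SUBPROPERTY_OF.get(prop) == shared_prop:
            return val
    return None


def query(cls, prop, minimum):
    """Products of class cls (or a subclass) whose prop is at least minimum."""
    return [p["id"] for p in PRODUCTS
            if is_a(p["type"], cls) and (shared_value(p, prop) or 0) >= minimum]


print(query("shop:MusicPlayer", "shop:capacity_gb", 4))
```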
 This means that new characteristics and entire subcategories of products
can be independently introduced by vendors.
 These vendor-specific extensions will be understood to the extent that they
are described in terms of the existing elements of the shared schema.
 For example, mappings between the product classifications of independent
vendors allow products classified under one classification to be retrieved
according to an alternative classification.
 The mappings themselves can be developed and maintained by either of the
two parties or even by an independent third party.
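A classification mapping can be as simple as a table from one vendor's categories to another's. In this Python sketch the two category schemes, the product codes and the `A_TO_B` table are hypothetical; the table could be maintained by either vendor or by a third party, and it lets vendor A's products be retrieved with vendor B's categories.

```python
# Hypothetical classifications used by two independent vendors: product -> category.
VENDOR_A = {"a-101": "Portable Audio", "a-102": "Headphones"}
VENDOR_B = {"b-7": "MP3 Players", "b-9": "Earphones"}

# Mapping from categories in A's scheme to equivalent categories in B's scheme.
A_TO_B = {"Portable Audio": "MP3 Players", "Headphones": "Earphones"}


def retrieve_by_b_category(b_category):
    """Find products of both vendors under one of B's categories."""
    from_a = [pid for pid, cat in VENDOR_A.items() if A_TO_B.get(cat) == b_category]
    from_b = [pid for pid, cat in VENDOR_B.items() if cat == b_category]
    return from_a + from_b


print(retrieve_by_b_category("MP3 Players"))
```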
 In summary, in all the scenarios we have sketched above, the addition
of knowledge in machine-processable languages has the potential to
improve access to information by clarifying the meaning of the content.
 Besides information retrieval, understanding the meaning of information is
also an important step towards aggregating information from multiple
heterogeneous sources.
 Aggregation is in turn necessary for performing queries, analysis and
reasoning across several information sources as if they formed a single
uniform database.
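A minimal Python sketch of aggregation, assuming two hypothetical sources that describe the same products with shared identifiers in triple form. Because both use the same data model, merging them is just a set union, and the merged graph can then answer a query that neither source can answer alone (the best-rated player within a budget).

```python
# Two hypothetical, independently maintained sources using shared identifiers.
SHOP = [
    ("ex:player1", "ex:price_eur", 99),
    ("ex:player2", "ex:price_eur", 149),
]
REVIEWS = [
    ("ex:player1", "ex:rating", 3),
    ("ex:player2", "ex:rating", 5),
]


def merge(*sources):
    """Aggregate triple sets by taking their union."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def best_rated_under(graph, budget):
    """Join price (from one source) with rating (from the other)."""
    price = {s: o for s, p, o in graph if p == "ex:price_eur"}
    rating = {s: o for s, p, o in graph if p == "ex:rating"}
    affordable = [s for s in price if price[s] <= budget and s in rating]
    return max(affordable, key=lambda s: rating[s], default=None)


print(best_rated_under(merge(SHOP, REVIEWS), 200))
```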
RESEARCH, DEVELOPMENT AND
STANDARDIZATION
 The vision of extending the current human-focused Web with
machine-processable descriptions of web content was first
formulated in 1996 by Tim Berners-Lee, the original inventor of
the Web.
 The Semantic Web has since been actively promoted by the World
Wide Web Consortium (also led by Berners-Lee), the organization
that is chiefly responsible for setting technical standards on the
Web.
 As a result of this initial impetus and the expected benefits of a
more intelligent Web, the Semantic Web has quickly attracted
significant interest from funding agencies on both sides of the
Atlantic, reshaping much of the AI research agenda in a
relatively short period of time.
 In particular, the field of Knowledge Representation and
Reasoning took center stage, but outcomes from other fields of
AI have also been put to use to support the move towards the
Semantic Web: for example, Natural Language Processing and
Information Retrieval have been applied to acquiring knowledge
from the World Wide Web.
 As the Semantic Web is a relatively new, dynamic field of
investigation, it is difficult to precisely delineate the boundaries of
its research community.
 By 2004, more than 600 researchers had worked in this area.

 The core technology of the Semantic Web, logic-based languages
for knowledge representation and reasoning, has been
developed in the research field of Artificial Intelligence.
 As the potential for connecting information sources on a Web-scale
emerged, the languages that have been used in the past to describe
the content of the knowledge bases of stand-alone expert systems
have been adapted to the open, distributed environment of the
Web.
 Since the exchange of knowledge in standard languages is crucial
for the interoperability of tools and services on the Semantic Web,
these languages have been standardized by the W3C as a layered
set of languages.
 Tools for creating, storing and reasoning with ontologies have been
primarily developed by university-affiliated technology startups
and at research labs of large corporations.
 Most of these tools are available as open source.
 The World Wide Web Consortium still plays a key role in
standardization where the interoperability of tools necessitates
mediation between various developer and user communities, as in
the case of the development of a standard query language and
protocol to access ontology stores across the Web.
WEB -
TECHNOLOGY ADOPTION
 The Semantic Web was originally conceptualized as an extension of
the current Web, i.e. as the application of metadata for describing
Web content.
 The content that is already on the Web (text, but also multimedia)
would be enriched in a collaborative effort by the users of the Web.
 Is it enough?

 The designers have to think beyond this.


 The alternative view predicted that the Semantic Web would first
break through behind the scenes, not among ordinary users but
among large providers of data and services.
 The second vision predicts that the Semantic Web will be primarily
a “web of data” operated by data and service providers largely
unknown to the average user.
WEB -
TECHNOLOGY ADOPTION: ISSUES
 As a technology for developers, users of the Web never experience the
Semantic Web directly, which makes it difficult to convey Semantic Web
technology to stakeholders.
 Most of the time, the gains for developers are only achieved over the long term.

 Like many other modern technologies, the Semantic Web suffers from what
the writer Kevin Kelly calls the fax effect: the value of a fax machine
increases as more of them are purchased, unlike traditional goods such as
land or precious metals like gold.
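One common way to quantify such network effects is to count the possible pairwise connections among n participants, n(n-1)/2; this is the intuition behind Metcalfe's law, not a formula taken from Kelly. The short Python sketch below shows why a technology like the fax machine, or the Semantic Web, is worth little to early adopters and much more once many others have adopted it.

```python
def possible_connections(n):
    """Distinct pairs among n fax machines that could exchange messages."""
    return n * (n - 1) // 2


for n in (1, 2, 10, 100):
    print(n, possible_connections(n))
```

A single machine connects to nobody, while a hundred machines already allow 4,950 distinct pairs.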
 With the Semantic Web, too, the cost of the technological investment is
very high at the beginning.
 Formal knowledge on the Semantic Web typically captures only a small
part of the intended meaning, so there needs to be a common
grounding in an external reality that is shared by those at either end of
the line.
 While the research effort behind the Semantic Web is immense and
growing dynamically, Semantic Web technology has yet to see mainstream
use on the Web and in the enterprise.
THE EMERGENCE OF SOCIAL WEB
 The web of the 1990s was much like the combination of a phone
book and the yellow pages and despite the connecting power of
hyperlinks it instilled little sense of community among its users.
 This passive attitude toward the Web was broken by a series of
changes in usage patterns and technology that are now referred
to as Web 2.0.
 The first wave of socialization on the Web was due to the
appearance of blogs, wikis and other forms of web-based
communication and collaboration.
 Blogs and wikis attracted mass popularity from around 2003.
 What they have in common is that they both significantly lowered the
requirements for adding content to the Web: editing blogs and wikis
no longer required any knowledge of HTML.
 Blogs and wikis allowed individuals and groups to claim their
personal space on the Web and fill it with content with relative ease.
 Even more importantly, although weblogs were first seen as purely
personal publishing (similar to diaries), the blogosphere is now widely
recognized as a densely interconnected social network through which
news, ideas and influences travel rapidly as bloggers reference and
reflect on each other’s postings.
 The first online social networks (also referred to as social
networking services) entered the field at the same time as
blogging and wikis started to take off.
 In 2003, the first-mover “Friendster” attracted over five million
registered users in the span of a few months, which was
followed by Google and Microsoft starting or announcing similar
services.
 Wiki vs Social Networks?
 Although Wikipedia, the online encyclopedia, is the most outstanding
example, wikis large and small are used by groups of various
sizes as an effective knowledge management tool for keeping
records, describing best practices or jointly developing ideas.
 The collective ownership of a wiki fosters a sense of community
through the necessary discussions over shared content.
 Similarly, the significance of instant messaging (ICQ) is also not
just instant communication (phone is instantaneous, and email
is almost instantaneous), but the ability to see who is online, a
transparency that induces a sense of social responsibility.
 Although the newly introduced social web sites feature much of
the same content that appears on personal web pages, they
provide a central point of access and bring structure to the
process of personal information sharing and online socialization.
 These systems also make it possible to visualize and browse the
resulting network in order to discover friends in common,
friends thought to be lost, or potential new friendships based on
shared interests.
 These vastly popular systems allow users to maintain large
networks of personal and business contacts. Members soon
discovered, however, that in cyberspace, too, networking is only
a means to an end.
 The idea of network-based exchange rests on the sociological
observation that social interaction creates similarity and vice
versa: friends are likely to have acquired or to develop similar
interests, while people with similar interests are more likely to
interact.
 Explicit user profiles make it possible for these systems to
introduce rating mechanisms whereby either the users or their
contributions are ranked according to usefulness or
trustworthiness.
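A rating mechanism can be sketched as ranking contributions by their mean rating. In this Python sketch the ratings, the 1 to 5 scale and the `min_votes` threshold (which keeps a single enthusiastic vote from dominating) are hypothetical choices; real systems may weight ratings by the rater's own reputation or use other schemes.

```python
from statistics import mean

# Hypothetical 1-5 ratings that users gave to contributions.
RATINGS = {
    "post_a": [5, 4, 5],
    "post_b": [3, 4],
    "post_c": [5],
}


def rank(ratings, min_votes=2):
    """Order contributions by mean rating, ignoring those with too few votes."""
    eligible = {k: mean(v) for k, v in ratings.items() if len(v) >= min_votes}
    return sorted(eligible, key=eligible.get, reverse=True)


print(rank(RATINGS))               # post_c has only one vote and is left out
print(rank(RATINGS, min_votes=1))
```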
 The design and implementation of Web applications have also
evolved in order to make the user experience of interacting with
the Web as smooth as possible.
 In line with user friendliness, what can be also observed is a
preference for formats, languages and protocols that are easy to
use and develop with, in particular script languages, formats
such as JSON, protocols such as REST.
 This supports rapid development and prototyping.
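To illustrate why JSON suits rapid development, here is a small Python sketch using only the standard `json` module: a hypothetical user profile is serialized as a REST-style endpoint might return it and parsed straight back into native data structures by a client script, with no schema declaration or code generation step.

```python
import json

# A hypothetical user profile as a REST-style endpoint might return it.
profile = {"user": "alice", "interests": ["jazz", "photography"], "friends": ["bob"]}

payload = json.dumps(profile)   # serialize for an HTTP response body
decoded = json.loads(payload)   # a client script parses it back into dicts and lists

print(payload)
print(decoded["interests"])
```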
WEB 2.0 + SEMANTIC WEB = WEB
3.0?
 Web 2.0 is often contrasted to the Semantic Web, which is a
more conscious and carefully orchestrated effort on the side of
the W3C to trigger a new stage of developments using semantic
technologies.
 Web 2.0 mostly affects how users interact with the Web.
 The Semantic Web opens new technological opportunities for
web developers in combining data and services from different
sources.
 Users are willing to provide content as well as metadata. This
may take the form of articles and facts organized in tables and
categories in Wikipedia, or photos organized in sets and according
to tags.
 Further, web pages created automatically from a database can
encode metadata in microformats without the user necessarily
being aware of it.
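hCard is one such microformat: class names such as `vcard`, `fn` (formatted name), `org` and `url` mark up contact details inside ordinary HTML. The Python sketch below uses the standard `html.parser` module to pull the `fn` and `org` values out of a hypothetical page; it is a simplification that assumes well-formed markup without void elements such as `<br>`, not a complete hCard parser.

```python
from html.parser import HTMLParser

# HTML as a database-driven page might emit it; the class names follow hCard.
PAGE = """
<div class="vcard">
  <a class="url fn" href="http://example.org/~alice">Alice Example</a>
  <span class="org">Example University</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collect the text of elements whose class attribute includes fn or org."""

    FIELDS = {"fn", "org"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._open = []  # one set of hCard fields per currently open element

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self._open.append(classes & self.FIELDS)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            for field in self._open[-1]:
                self.card[field] = text


parser = HCardParser()
parser.feed(PAGE)
print(parser.card)
```

The page renders as ordinary text for a human reader, while the same markup yields a small structured record for a machine.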
 Due to the extensive collaborations online many applications
have access to significantly more metadata about the users.
 Information about the choices, preferences, tastes and social
networks of users means that the new breed of applications can
build on much richer user profiles.
 Social-semantic systems that can provide recommendations
based on both the social network of users and their personal
profiles are likely to outperform traditional recommender
systems as well as purely network-based trust mechanisms.
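The following Python sketch shows one simple way to combine the two signals; it illustrates the idea and makes no claim about outperforming anything. Each other user's unseen interests are scored by profile similarity (Jaccard overlap of declared interests) plus a hypothetical bonus `alpha` when that user is also a friend. The data, the weighting scheme and the value of `alpha` are all assumptions.

```python
# Hypothetical data: each user's declared interests and friendship links.
INTERESTS = {
    "alice": {"jazz", "photography"},
    "bob": {"jazz", "hiking", "blues"},
    "carol": {"opera"},
    "dave": {"jazz", "blues", "cooking"},
}
FRIENDS = {"alice": {"bob", "carol"}}


def jaccard(a, b):
    """Overlap of two interest sets (profile similarity)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(user, alpha=0.5):
    """Score interests the user lacks by the profile similarity of the users
    who hold them, plus alpha if that user is also a friend (social signal)."""
    scores = {}
    for other, items in INTERESTS.items():
        if other == user:
            continue
        weight = jaccard(INTERESTS[user], items)
        if other in FRIENDS.get(user, set()):
            weight += alpha
        for item in items - INTERESTS[user]:
            scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))


print(recommend("alice"))           # profile and friendship combined
print(recommend("alice", alpha=0))  # profile similarity only
```

Setting `alpha=0` reduces the sketch to a purely profile-based recommender, so the two outputs can be compared side by side.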
 In terms of technology, what the Semantic Web can offer to the
Web 2.0 community is a standard infrastructure for building
creative combinations of data and services.
