Subject Access, Smart Data, and
Digital Humanities
– Finding Unlimited Opportunities through
their Intersections
Marcia Lei Zeng
Kent State University
Keynote @ IFLA Classification & Indexing Satellite Conference
2016 August 11-12, Columbus, OH, USA
http://marciazeng.slis.kent.edu/
Outline
• I. Background
• II. Subject Access -- Finding the Unlimited
Opportunities
• III. The Importance of Knowledge
Organization Systems (KOS) for Effective
Subject Access
M Zeng - IFLA Classification & Indexing
Satellite Conference 2016
8/11/16
I. Background
What do I mean…
1) Subject access
– in the context of today’s environments
2) Smart data
– in the context of Big Data
3) Digital Humanities
– in the context of heritage institutions’ data
What is happening around us?
• The 2nd generation of the Web:
the Semantic Web
– Search engine involvement,
– maturing of the Linked Data technologies,
– non-traditional databases
• “Big Data”
– Government funding opportunities,
– Growth of the ‘data analytics’ profession
• Modern AI (artificial intelligence)
– Machine-learning
– Contextual computing
• Participatory culture
– Social media
– Engaging end-users in the workflow
https://www.w3.org/2002/Talks/www2002-w3ct-swintro-em/slide7-0.html
Source: Nova Spivak, Radar Networks; John Breslin, DERI; & Mills Davis, Project10X,
2007, 2008 Copyright MILLS•DAVIS
Web 1.0: connecting information and getting on the net.
Web 2.0: connecting people — putting the “I” in user interface, and the “we” into Webs of social participation.
Web 3.0: connecting knowledge -- representing meanings, connecting knowledge, and putting these to work in ways that make our experience of the internet more relevant, useful, and enjoyable.
Web 4.0: connecting intelligence -- connecting intelligences in a ubiquitous Web where both people and things reason and communicate together.
Big data
• Volume (data quantity)
• Velocity (data speed)
• Variety (data types &
nature)
• Variability (data
consistency)
• Veracity (data quality)
• Complexity
Source: Kobielus, James. 2016. The Evolution of Big Data to Smart Data. Keynote at Smart Data Online 2016.
Source: Big Data. Wikipedia.
SAS Institute Inc. [2014]. Big Data:
What it is and why it matters.
Smart Data
= Ability to achieve big
insights from such data
at any scale, great or small.
Why Smart Data
• “However, in its raw form, data is just like crude oil; it needs to be refined and
processed in order to generate real value. Data has to be cleaned, transformed, and
analyzed to unlock its hidden potential.” (TiECON East. Data is new oil.)
• Once tamed through organizing and integrating processes, large volumes of
unstructured, semi-structured, and structured data are turned into “smart data” that
reflect the research priorities of a particular discipline or field.
• Smart data inquiries can then be used to provide comprehensive analyses and
generate new products and services.
Sources:
Gardner, D, 2012.
Prithwis Mukerjee, 2014
Schöch, 2013.
TiECON East, 2014.
WHY Big Data to Smart Data: Asthma example
Slide from: Sheth, Amit. 2014. Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web.
What can we do to avoid an asthma episode?
Real-time health signals from the personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and the population level (e.g., pollen level, CO2) arrive continuously in fine-grained samples, potentially with missing information and uneven sampling frequencies.
[Diagram labels: Variety, Volume, Veracity, Velocity, Value, semantics]
• What risk factors influence asthma control?
• What is the contribution of each risk factor?
Understanding relationships between health signals and asthma attacks for providing actionable information
Data (in humanities)
Big data
• unstructured
• messy
• implicit
• relatively large in volume
• varied in form
Smart data
• semi-structured or structured
• clean
• explicit and enriched (raw data + markup, annotations and metadata)
• relatively small in volume
• of limited heterogeneity
• The creation involves human agency & demands time
• The process of modeling the data is essential
Compiled based on Schöch, Christof. 2013. Big? Smart? Clean? Messy? Data in the humanities. Journal for Digital Humanities. 2(3)
What about LAMs (libraries, archives, and museums)?
LAM data examples
Structured / Semi-structured / Unstructured
• National bibliographies
• Catalogs
• Special collection portals
• Registries
• Metadata for datasets
• …
• Text Encoding Initiative (TEI) files
• Finding Aids
• Value added/tagged resources
• Unstructured portion within metadata descriptions
• Digitized materials, textual or non-textual
• Original information-bearing objects
• Documents in all kinds of formats
• …
• Data from Web crawling that need to be cleaned
• … …
“Smart Data” emphasizes the
organizing and integrating processes
from unstructured data to structured
and semi-structured data, to make the
big data smarter.
- Schöch, 2013
Digital Humanities
• the field is still expanding,
• the definitions are being debated, and
• the multifaceted landscape is yet to be fully
understood.
• Most agree that initiatives and activities in digital
humanities are at the intersection between the
humanities and digital information technology.
• The field applies big data mathematical research
techniques to the description and analysis of cultural
objects—including art, literature, and technological
artifacts themselves.
Image source: Katherine Hayles http://dtc-wsuv.org/wp/dtc375-scodi/katherine-hayles/
• Svensson, P. 2010. The Landscape of Digital Humanities. Digital
Humanities Quarterly. 4(1).
• Svensson, P. 2009. Humanities Computing as Digital Humanities.
Digital Humanities Quarterly. 3(3)
Advanced technologies (under the umbrella of Big Data and the Semantic Web) now allow researchers:
• to access and reuse large volumes of diverse data,
• to discover patterns and connections formerly hidden
from view,
• to reconstruct the past,
• to discover impacts in real and virtual environments, and
• to bring the complex intricacies of innovations to light,
all as never before.
Image source: http://goo.gl/a4gZsd
Image source: Schöch, 2013.
Think:
• What kind of data did the project use?
Data sources:
• Freebase (now Wikidata)
• Union List of Artist Names (ULAN®)
• Allgemeines Künstlerlexikon/ Artists of the World
Schich, M. et al. 2014. “A Network Framework of Cultural History.”
Science, 345(6196), 558-562.
Nature Video. (2014, July 31). Charting culture. https://www.youtube.com/watch?v=4gIhRkCcD4U
Data provided by LAMs and cultural heritage institutions are
treasures for all humanities researchers.
Trending:
• Machine-understandable data, beyond machine-readable
• Machine-actionable data, beyond machine-readable
• Accurate (error-free) data in the processes of interlinking, citing, transferring, rights-permission, use and reuse, etc.
• One-to-many uses and high-efficiency data processing
Digital humanities – Librarian Survey Results, December 2015
http://americanlibrariesmagazine.org/2016/01/04/special-report-digital-humanities-libraries/
Source: http://americanlibrariesmagazine.org/wp-content/uploads/2016/01/digital-humanities-faculty.pdf
Digital humanities – Faculty Survey Results, December 2015
Source: http://americanlibrariesmagazine.org/wp-content/uploads/2016/01/digital-humanities-faculty.pdf
II. Subject Access
-- Finding the Unlimited Opportunities
FRBR-Library Reference Model (LRM)
- World-wide review version
Figure. Overview of Relationships (draft)
Source: http://www.ifla.org/files/assets/cataloguing/frbr-lrm/frbr-lrm_20160225.pdf + revision draft
RES: “Any entity in the universe of discourse”
Three Perspectives
-- from the creation of the structured data
based on Rose, Gillian. 2013. Visual Methodologies, 3rd Ed.
1. Production
2. Content
3. Audiences’ receiving interests: liked, cited, researched, tagged, searched, shared, followed, time spent, …
• Index
• Markup
• Ontology
• Knowledge base
• Metadata
– Descriptive
– Administrative
– Structural/technical
Image sources:
http://www.mahalo.com/how-to-understand-perspective-in-drawing/
http://judithlondono.com/
http://www.smrfoundation.org/nodexl/
OCLC WorldCat Works
– 197 Million Nuggets of Linked Data
-- Since 2014
WorldCat Linked Data
https://www.oclc.org/developer/develop/linked-data.en.html
“The bibliographic metadata found in WorldCat contains a rich set of objects that can be represented in linked data.”
Figure 1.1: A bibliographic description as a record and a graph.
Read more: Godby, Wang, Mixter. 2015. Library Linked Data in the Cloud – OCLC’s Experiments with New Models of Resource Description. ISBN 9781627052191.
Linking THINGS, not strings.
Access through the linked things.
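To make the "record and a graph" idea concrete, here is a minimal sketch in Python. It is not OCLC's implementation: the identifiers, property names, and the `record_to_triples` helper are all hypothetical and chosen only for illustration. It shows how a nested record can be flattened into subject-predicate-object statements in which people and subjects are linked THINGS rather than strings.

```python
# Hypothetical record; the example.org identifiers are placeholders,
# not real WorldCat URIs.
record = {
    "id": "http://example.org/work/1",
    "name": "An Example Title",
    "creator": {"id": "http://example.org/person/1", "name": "Example, Author"},
    "about": [
        {"id": "http://example.org/concept/classification", "name": "Classification"},
    ],
}


def record_to_triples(rec):
    """Flatten a nested record into (subject, predicate, object) triples."""
    triples = []
    subject = rec["id"]
    for key, value in rec.items():
        if key == "id":
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # A nested thing: link to its identifier, then describe it.
                triples.append((subject, key, v["id"]))
                triples.extend(record_to_triples(v))
            else:
                # A plain string stays a literal value.
                triples.append((subject, key, v))
    return triples


triples = record_to_triples(record)
for triple in triples:
    print(triple)
```

In the graph form the creator is an identifier that other descriptions can point to, which is what makes access "through the linked things" possible.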
BIBFRAME-based
Read: https://www.denverlibrary.org/blog/rachel-f/dpl-announces-linked-data-launch (2015-06)
Try: http://labs.libhub.org/denverpl/
Schema.org-based
http://www.worldcat.org/oclc/922220005
http://worldcat.org/entity/work/id/2534768
http://worldcat.org/entity/person/id/2631227899
Linking out
Go to http://dbpedia.org/page/Lois_Mai_Chan and
follow knownFor and about
The connected structured data for THINGs from Perspectives #2 and #3.
LAM data examples
There are many hidden access
points that can bring in much
richer information and
knowledge through LAM data.
Big text in humanities
“Big text”
– The text version of “big data”
– Where?
• special collections,
• archives,
• oral histories,
• annual reports,
• provenance indexes,
• inventories,
• … etc.
– How?
• Fact mining, analytics
– What is needed?
Tools
• to ‘mine’ the text,
• to manage extracted
entities as new access
points, and
• to connect with the
outside data.
Source: MarkLogic webinar http://www.marklogic.com/webinars/
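The three needs listed above (mining the text, managing extracted entities as access points, connecting outward) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a small hand-made gazetteer stands in for a real fact-mining tool, and the document identifiers, entity types, and function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: surface form -> (entity type, preferred label).
# A real fact-mining tool would replace this lookup table.
GAZETTEER = {
    "kent state university": ("Organization", "Kent State University"),
    "columbus": ("Place", "Columbus"),
    "ohio": ("Place", "Ohio"),
}


def extract_entities(text):
    """Return the set of (type, label) pairs whose surface form occurs in text."""
    found = set()
    lowered = text.lower()
    for surface, entity in GAZETTEER.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(entity)
    return found


def build_access_points(documents):
    """Map each extracted entity to the documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for entity in extract_entities(text):
            index[entity].add(doc_id)
    return index


# Hypothetical snippets standing in for finding aids or transcripts.
documents = {
    "finding-aid-01": "Correspondence from Kent State University, 1970.",
    "finding-aid-02": "Photographs taken in Columbus and elsewhere in Ohio.",
}
access_points = build_access_points(documents)
for entity, doc_ids in sorted(access_points.items()):
    print(entity, sorted(doc_ids))
```

Each key of `access_points` is a candidate new access point; attaching an external identifier to the preferred label would be the step that connects it with outside data.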
2) Taking archival finding aids as an example
Image source: http://alelemuseum.tripod.com/Archives.html
Image source: https://libraries.usc.edu/article/inside-usc-libraries-grand-avenue-library
• Finding aids
– provide detailed descriptions of a collection's component parts,
– summarize the overall scope of the content,
– convey details about the individuals and organizations involved,
– list box and folder headings.
• The ‘subject’ access is to the whole archive (= Perspective #1).
• Few provide access to the ‘things’ contained in the contents through index terms.
[Images from a finding aid: title page, content page, and index terms page.]
Image source: https://libraries.usc.edu/article/inside-usc-libraries-grand-avenue-library
Finding Aids
semantic analysis
using ontology-based tools
by KSU SLIS LOD-LAM team
http://lod-lam.slis.kent.edu/SemanticAnalysis.html
• 45 archival finding aids
• drawn from 16 repositories
• From OpenCalais: extracted 8,096 entities and 336 suggested social tags
OpenCalais and COGITO are
• semantic analysis/fact-mining tools,
• taxonomy and ontology-supported,
• backed by machine learning and natural language processing.
[Screenshots: text before and after processing, http://www.opencalais.com/]
[Screenshot: structured data produced by Calais (RDF/XML), http://www.opencalais.com/]
[Screenshots: text before and after processing, http://www.intelligenceapi.com]
[Screenshots: text before and after processing, with structured data for THINGs, http://www.intelligenceapi.com]
The same approach can be used for the oral history collections
for enhancing access to the contents in oral history transcript files.
• Currently many are managed at collection level only.
• Only some have deep indexes, of great quality.
• The indexes usually exist in a ‘back-of-the-book’ style and stay within downloadable PDF files.
• The indexes could be used in the collection's subject searching.
• Indexed THINGs could be linked to external resources.
[image of a back-of-the-book style index to an oral history transcript]
[image of a page of the oral history transcript]
The same approach can be used for the library catalogs
Tool used: OpenCalais
Note: The tool only assists extraction; a human cleaning process is still needed.
The same approach can be used for the museum object labels and descriptions
Tools: COGITO, OpenCalais
http://www.metmuseum.org/toah/works-of-art/2010.312/
[Screenshots: text before and after processing with Boson, http://bosonnlp.com/demo, showing entities and keywords]
3) Taking non-textual objects as examples
Online Coins of the Roman Empire (OCRE) - ontology-based knowledge base
http://numismatics.org/ocre/results
Portrait of Marcus Aurelius
• Modeling in an ontology (formed in classes, properties, relationships)
• Following Linked Data principles
• Using RDF triples for entities
• Querying in the SPARQL language
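As a rough illustration of how triples and SPARQL-style querying fit together, the sketch below stores a few triples in a Python list and joins two triple patterns, in the way a SPARQL basic graph pattern does. All identifiers and property names (`coin:1`, `ex:portrait`, and so on) are hypothetical; they are not actual OCRE data or its ontology terms, and a real project would send a SPARQL query to an endpoint instead.

```python
# Hypothetical triples for illustration only.
TRIPLES = [
    ("coin:1", "ex:portrait", "person:marcus_aurelius"),
    ("coin:1", "ex:material", "material:silver"),
    ("coin:2", "ex:portrait", "person:marcus_aurelius"),
    ("coin:2", "ex:material", "material:gold"),
    ("coin:3", "ex:portrait", "person:hadrian"),
    ("coin:3", "ex:material", "material:silver"),
]


def match(pattern, triples):
    """Yield variable bindings for one (s, p, o) pattern; '?x' marks a variable."""
    for triple in triples:
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                if binding.get(want, got) != got:
                    break  # same variable bound to two different values
                binding[want] = got
            elif want != got:
                break  # constant term does not match
        else:
            yield binding


def query(patterns, triples):
    """Join several patterns so that shared variables must agree."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables that are already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, triples):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# "Which coins portray Marcus Aurelius, and what are they made of?"
results = query(
    [("?coin", "ex:portrait", "person:marcus_aurelius"),
     ("?coin", "ex:material", "?material")],
    TRIPLES,
)
for row in results:
    print(row)
```

The shared variable `?coin` is what ties the two patterns together; the same idea underlies multi-pattern `WHERE` clauses in SPARQL.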
Online Coins of the Roman Empire (OCRE)
http://numismatics.org/ocre/
• Using SPARQL queries to find
• Output as CSV files
• Auto-visualizing using Fusion Tables
• Needs just a few seconds
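The query-to-CSV step can be sketched with Python's standard library. The rows below are hypothetical stand-ins for the variable bindings a SPARQL endpoint would return; they are not actual OCRE results, and the `to_csv` helper is an illustrative name.

```python
import csv
import io
from collections import Counter

# Hypothetical result rows shaped like SPARQL variable bindings.
rows = [
    {"coin": "coin:1", "material": "silver"},
    {"coin": "coin:2", "material": "gold"},
    {"coin": "coin:3", "material": "silver"},
]


def to_csv(rows, fieldnames):
    """Serialize result rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows, ["coin", "material"])
print(csv_text)

# A simple tally of the kind a visualization tool would chart.
counts = Counter(row["material"] for row in rows)
print(counts.most_common())
```

Once results are in a flat table like this, any charting or mapping tool that accepts CSV can take over.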
Deep Image Annotation
http://www.synaptica.com/oasis/
Clarke, David. 2015. Deep image annotation and Knowledge Organization. ISKO-UK 2015.
/content/deep-image-annotation-and-knowledge-organization
IIIF Image API
International Image Interoperability Framework
See API specifications at: http://iiif.io/technical-details.html
Sanderson, Rob. 2014. Open Repositories 2014: Crowdsourced Transcription via IIIF, slide 9.
API = application programming interface, a set of routines, protocols, and tools for building software applications.
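For readers who have not seen an Image API request, the sketch below builds one in Python. The path order (identifier, region, size, rotation, quality, format) follows the published IIIF Image API; the server base `https://example.org/iiif`, the identifier, and the helper name `iiif_image_url` are hypothetical, and the defaults assume version 2 or later of the specification.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="600,",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an image request URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Percent-encode the identifier so slashes inside it are not
    # mistaken for path separators.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Request a 400x400 pixel detail starting at (100, 100), scaled to 600 pixels wide.
url = iiif_image_url("https://example.org/iiif", "coins/obverse-01",
                     region="100,100,400,400")
print(url)
```

Because the region is part of the URL, a detail of an image can be cited, annotated, or indexed as its own addressable thing.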
III. The Importance of
Knowledge Organization Systems (KOS)
for Effective Subject Access
Various Types of KOS
Fundamental KOS Approaches
1. Eliminating ambiguity
2. Controlling synonyms or equivalents
3. Making explicit semantic relationships between/among concepts
– Hierarchical relationships
– Hierarchical + other associative relationships
4. Presenting relationships between/among concepts as well as properties of concepts
See full picture at http://nkos.slis.kent.edu/KOS_taxonomy.htm
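Two of these approaches, controlling synonyms and making hierarchical relationships explicit, can be sketched with two small lookup tables. The vocabulary below is a hypothetical mini-thesaurus invented for illustration; it is not drawn from any published KOS, and the function names are illustrative.

```python
# Hypothetical mini-vocabulary.
PREFERRED = {  # entry term (synonym or variant) -> preferred term
    "cars": "automobiles",
    "motorcars": "automobiles",
    "automobiles": "automobiles",
    "lorries": "trucks",
    "trucks": "trucks",
}
BROADER = {  # preferred term -> its broader term
    "automobiles": "motor vehicles",
    "trucks": "motor vehicles",
    "motor vehicles": "vehicles",
}


def normalize(term):
    """Synonym control: map any entry term to its preferred term."""
    key = term.strip().lower()
    return PREFERRED.get(key, key)


def ancestors(term):
    """Hierarchy: walk the broader-term chain upward from a term."""
    chain = []
    current = normalize(term)
    while current in BROADER:
        current = BROADER[current]
        chain.append(current)
    return chain


print(normalize("Motorcars"))
print(ancestors("lorries"))
```

A search for any entry term then lands on one preferred concept, and the broader-term chain gives the paths for browsing upward or widening a query.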
Dealing with The Problem of Semantic Conflicts
(inconsistencies in terminology and meanings)
Figure. FRBR-LRM Overview of Relationships (draft)
Source: http://www.ifla.org/files/assets/cataloguing/frbr-lrm/frbr-lrm_20160225.pdf + revision draft
Dealing with The Problem of Information Overload
Traditional Filters -- “Filter-out”
• Site (physical or digital) organization and navigation support
• Advanced search functions
• “Umbrella” structures of classification and taxonomy from which to
extend content
• Browsing support—hierarchical structures
Beyond traditional filters -- “Filter-forward”
• Browsing and Filtering to the Front -- Using Faceted Structure
• Connecting Things via Semantic Relations
• Enabling Rediscovery
– Data mining, semantic analysis, machine-learning through expert
feedback, machine reasoning
• LOD KOS Datasets become Knowledge Bases
– obtaining special graphs or datasets for very complicated questions, and
– revealing unknown relationships (e.g., http://vocab.getty.edu/queries#Top-level_Subjects)
• From Machine-readable to Machine-understandable/processable
Fact: The Increasing Need for KOS
In the BARTOC registry (http://bartoc.org/)
KOS registered: 1836 (2016.05.27 data)
In the Datahub (https://datahub.io/)
LOD KOS registered: 1251 (2016.03.15 data)
(about half are ontologies)
Conclusion
Initiatives in digital humanities have demonstrated a paradigm shift in how cultural heritage materials can be searched, mined, displayed, taught, and analyzed utilizing digital technologies.
Data provided by LAMs and cultural heritage institutions are treasures for all humanities researchers.
When subject access, smart data, and digital humanities interact, the opportunities for effective and innovative services and contributions can be endless.
Let’s embrace the new and changing concepts and make these happen.
Thank you!
References
• Gardner, D. 2012. An ocean of data [Introduction]. In: Smolan, R., Erwitt, J. (eds.) The human face of big data, pp. 14-17. Sausalito, CA: Against All Odds Productions.
• Joshi, Kunal. 2013. Big data, data science & fast data. http://www.slideshare.net/kunaljoshi111/big-data-data-science-fast-data
• Kobielus, James. 2016. The Evolution of Big Data to Smart Data. Keynote at Smart Data Online 2016.
• Rose, Gillian. 2013. Visual Methodologies, 3rd Edition. SAGE Publications Ltd.
• Sanderson, Rob. 2014. Open Repositories 2014: Crowdsourced Transcription via IIIF, slide 9. http://www.slideshare.net/azaroth42/open-repositories-2014-crowdsourced-transcription-via-iiif
• SAS Institute Inc. [2014]. Big Data: What it is and why it matters. http://www.sas.com/big-data/
• Schöch, Christof. 2013. Big? Smart? Clean? Messy? Data in the humanities. Journal for Digital Humanities. 2(3): 2-13. http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/
• Sheth, Amit. 2014. Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web. Keynote at 30th IEEE International Conference on Data Engineering (ICDE) 2014.
• Svensson, Patrik. 2010. The landscape of digital humanities. Digital Humanities Quarterly. 4(1) http://digitalhumanities.org/dhq/vol/4/1/000080/000080.html
• Svensson, Patrik. 2009. Humanities computing as digital humanities. Digital Humanities Quarterly. 3(3) http://digitalhumanities.org/dhq/vol/3/3/000065/000065.html
• TiECON East. 2014. Data is new oil. http://www.tieconeast.org/2014/big-data-analytics

More Related Content

Zeng marcia ifla-subjectaccesssmartdatadh

  • 1. Subject Access, Smart Data, and Digital Humanities – Finding Unlimited Opportunities through their Intersections Marcia Lei Zeng Kent State University Keynote @ IFLA Classification & Indexing Satellite Conference 2016 August 11-12, Columbus, OH, USA http://marciazeng.slis.kent.edu/
  • 2. Outline • I. Background • II. Subject Access -- Finding the Unlimited Opportunities • III. The Importance of Knowledge Organization Systems (KOS) for Effective Subject Access M Zeng - IFLA Classification & Indexing Satellite Conference 2016 28/11/16
  • 3. I. Background What do I mean… 1) Subject access – in the context of today’s environments 2) Smart data – in the context of Big Data 3) Digital Humanities – in the context of heritage institutions’ data M Zeng - IFLA Classification & Indexing Satellite Conference 2016 38/11/16
  • 4. What is happening around us? • The 2nd generation of the Web: the Semantic Web – Search engines involvement, – mature of the Linked Data technologies, – non-traditional databases • “Big Data” – Government funding opportunities, – Blooming of ‘data analytics’ profession • Modern AI (artificial intelligence) – Machine-learning – Contextual computing • Participatory culture – Social media – Engaging end-users in the workflow M Zeng - IFLA Classification & Indexing Satellite Conference 2016 4 I. Background https://www.w3.org/2002/Talks/www2002-w3ct-swintro-em/slide7-0.html 8/11/16
  • 5. Source: Nova Spivak, Radar Networks; John Breslin, DERI; & Mills Davis, Project10X, 2007, 2008 Copyright MILLS•DAVIS 5 Web 1.0: connecting information and getting on the net. Web 2.0: connecting people — putting the “I” in user interface, and the “we” into Webs of social participation. Web 3.0 Connecting knowledge -- representing meanings, connecting knowledge, and putting these to work in ways that make our experience of internet more relevant, useful, and enjoyable. Web 4.0 Connecting intelligence -- It is about connecting intelligences in a ubiquitous Web where both people and things reason and communicate together. 8/11/16 M Zeng - IFLA Classification & Indexing Satellite Conference 2016
  • 6. M Zeng - IFLA Classification & Indexing Satellite Conference 2016 6 Big data • Volume (data quantity) • Velocity (data speed) • Variety (data types & nature) • Variability (data consistency) • Veracity (data quality) • Complexity Source: Kobielus, James. 2016. The Evolution of Big Data to Smart Data. Keynote at Smart Data Online 2016. Source: Big Data. Wikipedia. SAS Institute Inc. [2014]. Big Data: What it is and why it matters. Smart Data = Ability to achieve big insights from such data at any scale, great or small. I. Background 8/11/16
  • 7. Why Smart Data • “However, in its raw form, data is just like crude oil; it needs to be refined and processed in order to generate real value. Data has to be cleaned, transformed, and analyzed to unlock its hidden potential.” (TiECON East. Data is new oil.) • Once tamed through organizing and integrating processes, large volumes of unstructured, semi-structured, and structured data are turned into “smart data” that reflect the research priorities of a particular discipline or field. • Smart data inquiries can then be used to provide comprehensive analyses and generate new products and services. M Zeng - IFLA Classification & Indexing Satellite Conference 2016 7 Sources: Gardner, D, 2012. Prithwis Mukerjee, 2014 Schöch, 2013. TiECON East, 2014. 8/11/16
  • 8. What can we do to avoid asthma episode? 8 Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies. Variety Volume VeracityVelocity Value What risk factors influence asthma control? What is the contribution of each risk factor? semantics WHY Big Data to Smart Data: Asthma example Slide from: Sheth, Amit. 2014. Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web. Understanding relationships between health signals and asthma attacks for providing actionable information 8/11/16 M Zeng - IFLA Classification & Indexing Satellite Conference 2016
  • 9. M Zeng - IFLA Classification & Indexing Satellite Conference 2016 9 Data (in humanities) Big data unstructured messy implicit relatively large in volume varied in form Smart data semi- structured or structured Clean Explicit and enriched Raw data + markup, annotations and metadata Relatively small in volume The creation involves human agency & demands time The process of modeling the data is essentialOf limited heterogeneity Complied based on Schöch, Christof. 2013. Big? Smart? Clean? Messy? Data in the humanities. Journal for Digital Humanities. 2(3) What about LAMs? 8/11/16
  • 10. Structured Semi-structured Unstructured 10 • National bibliographies • Catalogs • Special collection portals • Registries • Metadata for datasets • … • Text Encoding Initiative (TEI) files • Finding Aids • Value added/tagged resources • Unstructured portion within metadata descriptions • Digitized materials, textual or non-textual • Original information-bearing objects • Documents in all kinds of formats • … • Data from Web crawling that need to be cleaned • … … LAM data examples “Smart Data” emphasizes the organizing and integrating processes from unstructured data to structured and semi-structured data, to make the big data smarter. - Schöch, 2013 8/11/16 M Zeng - IFLA Classification & Indexing Satellite Conference 2016
  • 11. • the field is still expanding, • the definitions are being debated, and • the multifaceted landscape is yet to be fully understood. • Most agree that initiatives and activities in digital humanities are at the intersection between the humanities and digital information technology. • The field applies big data mathematical research techniques to the description and analysis of cultural objects—including art, literature, and technological artifacts themselves. M Zeng - IFLA Classification & Indexing Satellite Conference 2016 11 Image source: Katherine Hayles http://dtc-wsuv.org/wp/dtc375- scodi/katherine-hayles/ • Svensson, P. 2010. The Landscape of Digital Humanities. Digital Humanities Quarterly. 4(1). • Svensson, P. 2009. Humanities Computing as Digital Humanities. Digital Humanities Quarterly. 3(3) I. Background 8/11/16
  • 12. Advanced technologies now allow researchers : (under the umbrella of Big Data and the Semantic Web) • to access and reuse large volumes of diverse data, • to discover patterns and connections formerly hidden from view, • to reconstruct the past, • to discover impacts in real and virtual environments, and • to bring the complex intricacies of innovations to light, all as never before. M Zeng - IFLA Classification & Indexing Satellite Conference 2016 12 Image source: http://goo.gl/a4gZsd Image source: Schöch, 2013. 8/11/16
  • 13. M Zeng - IFLA Classification & Indexing Satellite Conference 2016 13 Think: • What kind of data did the project use? Data sources: • Freebase (now Wikidata) • Union List of Artist Names (ULAN®) • Allgemeines Künstlerlexikon/ Artists of the World Schich, M. et al. 2014. “A Network Framework of Cultural History.” Science, 345(6196), 558-562. Nature Video. (2014, July 31). Charting culture. https://www.youtube.com/w atch?v=4gIhRkCcD4U 8/11/16
  • 14. Advanced technologies now allow researchers : (under the umbrella of Big Data and the Semantic Web) • to access and reuse large volumes of diverse data, • to discover patterns and connections formerly hidden from view, • to reconstruct the past, • to discover impacts in real and virtual environments, and • to bring the complex intricacies of innovations to light, all as never before. Data provided by LAMs and cultural heritage institutions are treasures for all humanities researchers. Trending: • Machine readable understandable data • Machine readable actionable data • Accurate (no error) data in the processes of interlinking, citing, transferring, rights-permission, use and reuse, etc. • One –to -many uses and high efficiency processing data M Zeng - IFLA Classification & Indexing Satellite Conference 2016 14 http://goo.gl/a4gZsd 8/11/16
  • 15. 15 Digital humanities – Librarian Survey Results, December 2015 http://americanlibrariesmagazine.org/2016/01/04/special-report-digital-humanities-libraries/ Source: http://americanlibrariesmagazine.org/wp-content/uploads/2016/01/digital-humanities-faculty.pdf 8/11/16 M Zeng - IFLA Classification & Indexing Satellite Conference 2016
  • 16. Digital humanities – Faculty Survey Results, December 2015 Source: http://americanlibrariesmagazine.org/wp-content/uploads/2016/01/digital-humanities-faculty.pdf
  • 17. II. Subject Access -- Finding the Unlimited Opportunities. LAM data examples: Structured: • National bibliographies • Catalogs • Special collection portals • Registries • Metadata for datasets • … Semi-structured: • Text Encoding Initiative (TEI) files • Finding aids • Value added/tagged resources • Unstructured portion within metadata descriptions Unstructured: • Digitized materials, textual or non-textual • Original information-bearing objects • Documents in all kinds of formats • … • Data from Web crawling that need to be cleaned • …
  • 18. FRBR-Library Reference Model (LRM), world-wide review version. Figure: Overview of Relationships (draft). Res: “Any entity in the universe of discourse”. Source: http://www.ifla.org/files/assets/cataloguing/frbr-lrm/frbr-lrm_20160225.pdf + revision draft
  • 19. Three Perspectives, from the creation of the structured data (based on Rose, Gillian. 2013. Visual Methodologies, 3rd ed.): 1. Production: • Metadata (descriptive, administrative, structural/technical) 2. Content: • Index • Markup • Ontology • Knowledge base 3. Audiences’ receiving interests: liked, cited, researched, tagged, searched, shared, followed, time spent, … Image sources: http://www.mahalo.com/how-to-understand-perspective-in-drawing/ http://judithlondono.com/ http://www.smrfoundation.org/nodexl/
  • 20. OCLC WorldCat Works: 197 Million Nuggets of Linked Data (since 2014). “The bibliographic metadata found in WorldCat contains a rich set of objects that can be represented in linked data.” WorldCat Linked Data: https://www.oclc.org/developer/develop/linked-data.en.html Figure 1.1: A bibliographic description as a record and a graph. Read more: Godby, Wang, Mixter. 2015. Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource Description. ISBN 9781627052191. Linking THINGS, not strings. Access through the linked things.
  • 21. BIBFRAME-based. Read: https://www.denverlibrary.org/blog/rachel-f/dpl-announces-linked-data-launch (2015-06). Try: http://labs.libhub.org/denverpl/ Linking THINGS, not strings. Access through the linked things.
  • 22. Schema.org-based. Linking out: http://www.worldcat.org/oclc/922220005 http://worldcat.org/entity/work/id/2534768 http://worldcat.org/entity/person/id/2631227899 Linking THINGS, not strings. Access through the linked things.
  • 23. Go to http://dbpedia.org/page/Lois_Mai_Chan and follow knownFor and about. Linking THINGS, not strings. Access through the linked things.
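The idea of following links from one thing to the next can be sketched in a few lines. This is only an illustration under stated assumptions: the triples use invented placeholder identifiers (`ex:PersonA`, `ex:WorkB`, `ex:TopicC`) and are not data copied from DBpedia; only the property names knownFor and about come from the slide, and `follow` and `walk` are hypothetical helper names.

```python
# Placeholder triples (subject, predicate, object). These are invented for
# illustration and are NOT taken from DBpedia.
TRIPLES = [
    ("ex:PersonA", "ex:knownFor", "ex:WorkB"),
    ("ex:WorkB", "ex:about", "ex:TopicC"),
    ("ex:PersonA", "ex:name", "Person A"),
]


def follow(subject, predicate, triples=TRIPLES):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def walk(start, path, triples=TRIPLES):
    """Follow a chain of predicates from `start`; return the things reached."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in follow(node, predicate, triples)]
    return frontier
```

Calling `walk("ex:PersonA", ["ex:knownFor", "ex:about"])` moves from a person to a work to a topic, which is the kind of access a string-only catalog field cannot offer.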
  • 24. The connected structured data for THINGs from Perspectives #2 and #3. Linking THINGS, not strings. Access through the linked things.
  • 25. LAM data examples: Structured: • National bibliographies • Catalogs • Special collection portals • Registries • Metadata for datasets • … Semi-structured: • Text Encoding Initiative (TEI) files • Finding aids • Value added/tagged resources • Unstructured portion within metadata descriptions Unstructured: • Digitized materials, textual or non-textual • Original information-bearing objects • Documents in all kinds of formats • … • Data from Web crawling that need to be cleaned • … There are many hidden access points that can bring in much richer information and knowledge through LAM data.
  • 26. Big text in humanities. “Big text”: the text version of “big data”. Where? • special collections, • archives, • oral histories, • annual reports, • provenance indexes, • inventories, • … etc. How? • Fact mining, analytics. What is needed? Tools • to ‘mine’ the text, • to manage extracted entities as new access points, and • to connect with the outside data. Source: MarkLogic webinar http://www.marklogic.com/webinars/
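A rough sketch of the first two tool needs listed above (mining the text and managing extracted entities as new access points) might look like the following. It is an assumption-laden toy: a tiny hand-made gazetteer stands in for a real fact-mining tool, and `GAZETTEER`, `extract_entities`, and `build_access_points` are hypothetical names, not part of any product mentioned in the slides. Connecting to outside data is not covered here.

```python
from collections import defaultdict

# Hypothetical gazetteer standing in for a real fact-mining tool.
GAZETTEER = {
    "kent state university": "Organization",
    "columbus": "City",
}


def extract_entities(text, gazetteer=GAZETTEER):
    """Return the gazetteer entries (lowercased) that occur in `text`."""
    lowered = text.lower()
    return {name for name in gazetteer if name in lowered}


def build_access_points(documents, gazetteer=GAZETTEER):
    """Map each extracted entity to the sorted ids of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for entity in extract_entities(text, gazetteer):
            index[entity].add(doc_id)
    return {entity: sorted(ids) for entity, ids in index.items()}
```

The resulting dictionary works as an inverted index: each entity becomes an access point that leads back to every document in which it was found.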
  • 27. 2) Taking archival finding aids as an example. Three perspectives: 1. Production: • Metadata (descriptive, administrative, structural/technical) 2. Content: • Index • Markup • Ontology • Knowledge base 3. Audiences’ receiving interests: liked, cited, researched, tagged, searched, shared, followed, time spent, … Image sources: http://alelemuseum.tripod.com/Archives.html https://libraries.usc.edu/article/inside-usc-libraries-grand-avenue-library
  • 28. Finding aids: • provide detailed descriptions of a collection's component parts, • summarize the overall scope of the content, • convey details about the individuals and organizations involved, • list box and folder headings. The ‘subject’ access is to the whole archive (= Perspective #1). Few provide access to the ‘things’ contained in the contents through index terms. [Images from a finding aid: title page, content page, and index terms page.] Image source: https://libraries.usc.edu/article/inside-usc-libraries-grand-avenue-library
  • 29. Finding aids semantic analysis using ontology-based tools, by the KSU SLIS LOD-LAM team (http://lod-lam.slis.kent.edu/SemanticAnalysis.html): • 45 archival finding aids, • drawn from 16 repositories. • From OpenCalais: extracted 8,096 entities and 336 suggested social tags. OpenCalais and COGITO are • semantic analysis/fact-mining tools, • taxonomy- and ontology-supported, • backed by machine learning and natural language processing.
  • 30. Before and after. http://www.opencalais.com/
  • 31. Structured data produced by Calais (RDF/XML). http://www.opencalais.com/
  • 32. Before and after. http://www.intelligenceapi.com
  • 33. Structured data for THINGs: before and after (two ‘after’ views). http://www.intelligenceapi.com
  • 34. The same approach can be used for oral history collections, to enhance access to the contents of oral history transcript files. • Currently many are managed at collection level only. • Only some have deep indexes, of great quality. • The indexes usually exist in a ‘back-of-the-book’ style and stay within downloadable PDF files. • The indexes could be used in the collection's subject searching. • Indexed THINGs could be linked to external resources. [Image of a back-of-the-book style index to an oral history transcript] [Image of a page of the oral history transcript]
  • 35. The same approach can be used for library catalogs. Tool used: OpenCalais. Note: the tool only assists extraction; a human cleaning process is still needed.
  • 36. The same approach can be used for museum object labels and descriptions. Tools: COGITO, OpenCalais. http://www.metmuseum.org/toah/works-of-art/2010.312/
  • 37. Before and after with Boson (http://bosonnlp.com/demo): entities, keywords.
  • 38. 3) Taking non-textual objects as examples. Three perspectives: 1. Production: • Metadata (descriptive, administrative, structural/technical) 2. Content: • Index • Markup • Ontology • Knowledge base 3. Audiences’ receiving interests: liked, cited, researched, tagged, searched, shared, followed, time spent, …
  • 39. Online Coins of the Roman Empire (OCRE): ontology-based knowledge base. http://numismatics.org/ocre/results Example: portrait of Marcus Aurelius. • Modeling in an ontology (formed in classes, properties, relationships) • Following Linked Data principles • Using RDF triples for entities • Querying in the SPARQL language
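To make the triple-and-query idea concrete, here is a minimal sketch of triple pattern matching, the basic operation behind a SPARQL query. Everything in it is an assumption for illustration: the `ex:` identifiers and property names are invented and do not reflect OCRE's actual ontology or data, and `match` and `coins_with_portrait` are hypothetical helpers rather than a SPARQL engine.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples fitting the pattern; None is a wildcard, much like a SPARQL variable."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


# Invented sample data; not OCRE records and not OCRE's vocabulary.
COINS = [
    ("ex:coin1", "ex:portrait", "ex:Marcus_Aurelius"),
    ("ex:coin1", "ex:material", "ex:silver"),
    ("ex:coin2", "ex:portrait", "ex:Marcus_Aurelius"),
    ("ex:coin3", "ex:portrait", "ex:Hadrian"),
]


def coins_with_portrait(triples, person):
    """List the subjects whose ex:portrait value is `person`."""
    return sorted(s for s, _, _ in match(triples, p="ex:portrait", o=person))
```

In SPARQL terms, `coins_with_portrait(COINS, "ex:Marcus_Aurelius")` plays the role of a single pattern such as `?coin ex:portrait ex:Marcus_Aurelius`, with `?coin` as the variable to be bound.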
  • 40. Online Coins of the Roman Empire (OCRE) http://numismatics.org/ocre/ • Using SPARQL queries to find • Output as CSV files • Auto-visualizing using Fusion Tables • Takes only a few seconds
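The "output as CSV" step can be sketched with Python's standard `csv` module. The sketch assumes query results have already been collected as a list of dictionaries; the column names and the helper name `rows_to_csv` are hypothetical, and nothing here calls the OCRE endpoint or any visualization service.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of result rows (dicts) to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned text can be written to a file and loaded into whatever charting or mapping tool is at hand.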
  • 41. Online Coins of the Roman Empire (OCRE) http://numismatics.org/ocre/
  • 42. Deep Image Annotation. http://www.synaptica.com/oasis/
  • 43. Clarke, David. 2015. Deep image annotation and Knowledge Organization. ISKO-UK 2015. /content/deep-image-annotation-and-knowledge-organization
  • 44. Clarke, David. 2015. Deep image annotation and Knowledge Organization. ISKO-UK 2015. /content/deep-image-annotation-and-knowledge-organization
  • 45. IIIF Image API. IIIF = International Image Interoperability Framework. See API specifications at: http://iiif.io/technical-details.html Sanderson, Rob. 2014. Open Repositories 2014: Crowdsourced Transcription via IIIF, slide 9. API = application programming interface, a set of routines, protocols, and tools for building software applications.
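As a small illustration of what the IIIF Image API offers, an image request is expressed as a URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The helper below is a sketch under assumptions: the server address in the usage note is a made-up example, the function name `iiif_image_url` is hypothetical, and the default size keyword `full` follows version 2 of the API (version 3 uses `max` instead).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an Image API request URL from its path segments.

    The identifier is percent-encoded so that characters such as "/" do not
    break the path structure.
    """
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"
```

For example, `iiif_image_url("https://example.org/iiif", "abc/123", region="0,0,500,500")` asks a (hypothetical) server for the top-left 500 by 500 pixel region of an image, which is what makes deep zoom and region-level annotation possible across institutions.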
  • 46. III. The Importance of Knowledge Organization Systems (KOS) for Effective Subject Access. Various Types of KOS. Fundamental KOS approaches: 1. Eliminating ambiguity 2. Controlling synonyms or equivalents 3. Making explicit semantic relationships between/among concepts (hierarchical relationships; hierarchical + other associative relationships) 4. Presenting relationships between/among concepts as well as properties of concepts. See full picture at http://nkos.slis.kent.edu/KOS_taxonomy.htm
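Two of the approaches listed above, controlling synonyms and making hierarchical relationships explicit, can be sketched with plain dictionaries. The mini-vocabulary is invented for illustration and is not drawn from any real KOS; `PREFERRED`, `BROADER`, `normalize`, and `ancestors` are hypothetical names.

```python
# Invented mini-vocabulary: entry term -> preferred term (synonym control).
PREFERRED = {
    "automobiles": "cars",
    "motorcars": "cars",
    "cars": "cars",
    "sedans": "sedans",
    "vehicles": "vehicles",
}

# Preferred term -> its broader term (hierarchical relationship).
BROADER = {"sedans": "cars", "cars": "vehicles"}


def normalize(term):
    """Map an entry term to its preferred term, or None if it is unknown."""
    return PREFERRED.get(term.strip().lower())


def ancestors(term):
    """Return the chain of broader terms above `term`, nearest first."""
    chain = []
    current = normalize(term)
    while current in BROADER:
        current = BROADER[current]
        chain.append(current)
    return chain
```

With this in place, a search for "Automobiles" or "motorcars" resolves to the same concept, and a search on "vehicles" could be expanded downward to include items indexed under "cars" or "sedans".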
  • 47. Dealing with the Problem of Semantic Conflicts (inconsistencies in terminology and meanings). Figure: FRBR-LRM Overview of Relationships (draft). Source: http://www.ifla.org/files/assets/cataloguing/frbr-lrm/frbr-lrm_20160225.pdf + revision draft
  • 48. Dealing with the Problem of Information Overload. Traditional filters (“Filter-out”): • Site (physical or digital) organization and navigation support • Advanced search functions • “Umbrella” structures of classification and taxonomy from which to extend content • Browsing support: hierarchical structures. Beyond traditional filters (“Filter-forward”): • Browsing and filtering to the front, using faceted structure • Connecting Things via semantic relations • Enabling rediscovery: data mining, semantic analysis, machine learning through expert feedback, machine reasoning • LOD KOS datasets become knowledge bases: obtaining special graphs or datasets for very complicated questions, and revealing unknown relationships (e.g., http://vocab.getty.edu/queries#Top-level_Subjects) • From machine-readable to machine-understandable/processable
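The "filter-forward" idea of browsing through a faceted structure can be sketched as two small operations: narrowing a result set by chosen facet values, and counting what remains under each value of another facet. The record fields in the usage note and the helper names `filter_items` and `facet_counts` are assumptions made for illustration only.

```python
from collections import Counter


def filter_items(items, **selected):
    """Keep the items whose facet values equal every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of `facet`."""
    return Counter(item[facet] for item in items if facet in item)
```

Given records such as `{"type": "coin", "period": "Roman"}`, a user could first call `filter_items(records, period="Roman")` and then `facet_counts(...)` on the result by `"type"`, so that the remaining choices are brought to the front rather than hidden behind a search box.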
  • 49. Fact: The Increasing Need for KOS. In the BARTOC registry (http://bartoc.org/), KOS registered: 1836 (2016.05.27 data). In the Datahub (https://datahub.io/), LOD KOS registered: 1251, about half of which are ontologies (2016.03.15 data).
  • 50. Conclusion. Initiatives in digital humanities have demonstrated a paradigm shift in how cultural heritage materials can be searched, mined, displayed, taught, and analyzed utilizing digital technologies. Data provided by LAMs and cultural heritage institutions are treasures for all humanities researchers. When subject access, smart data, and digital humanities interact, the opportunities for effective and innovative services and contributions can be endless. Let’s embrace the new and changing concepts and make these happen. Thank you!
  • 51. References • Gardner, D. 2012. An ocean of data [Introduction]. In: Smolan, R., Erwitt, J. (eds.) The human face of big data, pp. 14-17. Sausalito, CA: Against All Odds Productions. • Joshi, Kunal. 2013. Big data, data science & fast data. http://www.slideshare.net/kunaljoshi111/big-data-data-science-fast-data • Kobielus, James. 2016. The Evolution of Big Data to Smart Data. Keynote at Smart Data Online 2016. • Rose, Gillian. 2013. Visual Methodologies, 3rd edition. SAGE Publications Ltd. • Sanderson, Rob. 2014. Open Repositories 2014: Crowdsourced Transcription via IIIF, slide 9. http://www.slideshare.net/azaroth42/open-repositories-2014-crowdsourced-transcription-via-iiif • SAS Institute Inc. [2014]. Big Data: What it is and why it matters. http://www.sas.com/big-data/ • Schöch, Christof. 2013. Big? Smart? Clean? Messy? Data in the humanities. Journal of Digital Humanities. 2(3): 2-13. http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/ • Sheth, Amit. 2014. Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web. Keynote at 30th IEEE International Conference on Data Engineering (ICDE) 2014. • Svensson, Patrik. 2010. The landscape of digital humanities. Digital Humanities Quarterly. 4(1) http://digitalhumanities.org/dhq/vol/4/1/000080/000080.html • Svensson, Patrik. 2009. Humanities computing as digital humanities. Digital Humanities Quarterly. 3(3) http://digitalhumanities.org/dhq/vol/3/3/000065/000065.html • TiECON East. 2014. Data is new oil. http://www.tieconeast.org/2014/big-data-analytics