Inferring XML Schema Definitions From XML Data

The document proposes algorithms to automatically generate XML schemas (XSDs) from XML data. It introduces the problem of inferring schemas that can depend on element context. The solution presented is an iLocal algorithm that learns single occurrence automata from the data and transforms them into single occurrence regular expressions that form the inferred schema. A Reduce algorithm is also proposed to minimize the number of types in the generated schema. Experimental evaluation is discussed to test the accuracy of the inferred schemas.

Uploaded by

Dana Pistol

Inferring XML Schema Definitions from XML Data

Geert Jan Bex

Frank Neven
Stijn Vansummeren
Paper presentation
How can we automatically generate schemas from XML documents?
Overview

●Problem
●Solution
●Related work
●Background
●Contributions
➔ iLocal algorithm
➔ Reduce algorithm
➔ iXSD
●Experimental evaluation
Problem
XML:
<library>
  <borrowed>
    <person>
      <name/><tel/><email/>
    </person>
    <book>
      <id/> <author/> <time/>
    </book>
  </borrowed>
  <stock>
    <book>
      <id/> <author/>
      <nbBooks/> <bookshelf/>
    </book>
  </stock>
</library>
DTD:
<!ELEMENT library (borrowed*,stock+)>
<!ELEMENT borrowed (person,book+)>
<!ELEMENT stock (book)+>
<!ELEMENT person (name,tel+,email?)>
<!ELEMENT book (id,author,nbBooks?,(bookshelf|time)?)>
Solution
XML:
<library>
  <borrowed>
    <person>
      <name/><tel/><email/>
    </person>
    <book>
      <id/> <author/> <time/>
    </book>
  </borrowed>
  <stock>
    <book>
      <id/> <author/> <nbBooks/>
      <bookshelf/>
    </book>
  </stock>
</library>
XSD:
root -> library[library]
library -> borrowed[borrowed]*, stock[stock]
borrowed -> person[person], book[book1]+
stock -> book[book2]+
person -> name[emp], tel[emp]?, email[emp]+
book1 -> id[emp], author[emp], time[emp]
book2 -> id[emp], author[emp], nbBooks[emp], bookshelf[emp]
emp -> #PCDATA
Related Work

● Schema inference (SSD)
○ Restricting algorithms to trees
○ No order considered between the children of a node
-> XSD schemas can't be derived
● DTD inference
● XSD inference
○ Trang
○ Xstruct
-> The expressiveness of the generated schema does not go beyond that of a DTD.
● Learning of tree automata
○ Inferring queries, not XSD
Background
Given an XML document:

● An XML fragment is a sequence of elements <a1>f1</a1> … <an>fn</an>, where the ai are element names and the fi are XML fragments.
f = <library>
      <borrowed>
        <person><name/><tel/><email/></person>
        <book><id/><author/><time/></book>
      </borrowed>
    </library>
● Paths(f) is the set of all labeled paths starting at a root element of the XML fragment f (λ is the empty path).
Paths(f) = { λ, library, library borrowed, library borrowed book, library borrowed person, library borrowed book id, etc. }
● Strings(f, p) is the set of all strings of element names occurring below an occurrence of path p in fragment f.
Strings(f, library borrowed) = { person book }
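
The Paths and Strings definitions above can be made concrete with a small Python sketch. It assumes the fragment has a single root element and uses the standard library's xml.etree.ElementTree; the function name paths_and_strings and the tuple representation of paths are illustrative choices, not part of the presentation. The empty tuple plays the role of λ, and treating the root name as the string below λ is an assumption.

```python
import xml.etree.ElementTree as ET

def paths_and_strings(xml_text):
    """Return (paths, strings) for a fragment with a single root element.

    paths   : set of tuples of element names; () stands for the empty path λ
    strings : dict mapping each path to the set of child-name sequences
              observed directly below an occurrence of that path
    """
    root = ET.fromstring(xml_text)
    paths = {()}
    strings = {(): {(root.tag,)}}  # assumption: below λ we see the root name

    def visit(elem, prefix):
        path = prefix + (elem.tag,)
        paths.add(path)
        children = tuple(child.tag for child in elem)
        strings.setdefault(path, set()).add(children)
        for child in elem:
            visit(child, path)

    visit(root, ())
    return paths, strings

FRAGMENT = """
<library>
  <borrowed>
    <person><name/><tel/><email/></person>
    <book><id/><author/><time/></book>
  </borrowed>
</library>
"""

paths, strings = paths_and_strings(FRAGMENT)
print(sorted(" ".join(p) for p in paths))
print(strings[("library", "borrowed")])
```

For the fragment f from the slide, the entry for the path library borrowed contains the single string person book, matching the example above.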
Background
Definition 1:
An XSD is a triple D = (T, ρ, τ), consisting of a finite set of types T, a mapping ρ from T to regular expressions, and a mapping τ that assigns a type to each pair (t, a) where the element name a occurs in ρ(t).
T = { root, library, borrowed, stock, person, book1, book2, emp }
ρ(root) = library
ρ(library) = borrowed*, stock
ρ(person) = name, tel?, email+
τ(root, library) = library
τ(library, borrowed) = borrowed
τ(library, stock) = stock

● The W3C specification requires regular expressions to be deterministic.

● An XSD is k-local if its content models depend only on labels up to the k-th ancestor.
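
As a rough illustration of Definition 1, the triple (T, ρ, τ) can be written down as plain Python dictionaries. The values below are copied from the XSD on the Solution slide; the names RHO, TAU, element_names and is_well_formed are hypothetical, and representing emp (#PCDATA on the slide) as an empty content model is an assumption made for this sketch.

```python
import re

# rho: type -> content model, written as on the Solution slide.
RHO = {
    "root": "library",
    "library": "borrowed*, stock",
    "borrowed": "person, book+",
    "stock": "book+",
    "person": "name, tel?, email+",
    "book1": "id, author, time",
    "book2": "id, author, nbBooks, bookshelf",
    "emp": "",  # assumption: #PCDATA is modelled as "no child elements"
}

def element_names(expr):
    """Element names occurring in a content model string."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expr)

# tau: (type, element name) -> type of that element in this context.
TAU = {
    ("root", "library"): "library",
    ("library", "borrowed"): "borrowed",
    ("library", "stock"): "stock",
    ("borrowed", "person"): "person",
    ("borrowed", "book"): "book1",  # a book below borrowed
    ("stock", "book"): "book2",     # a book below stock
}
for t in ("person", "book1", "book2"):
    for a in element_names(RHO[t]):
        TAU[(t, a)] = "emp"

def is_well_formed(types, rho, tau):
    """Check that tau gives a known type for every name occurring in rho(t)."""
    for t in types:
        if t not in rho:
            return False
        for a in element_names(rho[t]):
            if tau.get((t, a)) not in types:
                return False
    return True

T = set(RHO)
print(is_well_formed(T, RHO, TAU))
print(TAU[("borrowed", "book")], TAU[("stock", "book")])
```

The last line shows the point of the definition: the same element name book receives two different types depending on its parent, which a DTD cannot express.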
Background
Definition 2:
SORE: A regular expression r is single occurrence if every element name occurs at most once in it. An XSD is single occurrence (a SOXSD) if it contains only SOREs.

Example: "borrowed, stock" is a SORE; "borrowed, stock, stock" is not, because stock occurs twice.

Definition 3:
An SOA (single occurrence automaton) is a graph A = (V, E) where all states in V − {in, out} are element names, and E ⊆ (V − {out}) × (V − {in}) is the edge relation, so no edge enters in and no edge leaves out.

● L(A) is the set of all strings accepted by A.
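
The two definitions can be tried out with a short Python sketch. The single-occurrence check simply counts element names in the expression. The automaton learner uses a simple two-gram construction (an edge for every pair of adjacent names seen in the examples); whether this matches the learner used in the presentation exactly is an assumption. The function names are illustrative, and the strings "in" and "out" are assumed not to be used as element names.

```python
import re

def is_single_occurrence(expr):
    """True if no element name occurs more than once in the expression."""
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expr)
    return len(names) == len(set(names))

def learn_soa(strings):
    """Build the edge set of an SOA from example strings (tuples of names).

    Adds in -> first name, name -> next name for adjacent names,
    and last name -> out.
    """
    edges = set()
    for s in strings:
        prev = "in"
        for name in s:
            edges.add((prev, name))
            prev = name
        edges.add((prev, "out"))
    return edges

def accepts(edges, s):
    """True if s labels a path from in to out, i.e. s is in L(A)."""
    prev = "in"
    for name in s:
        if (prev, name) not in edges:
            return False
        prev = name
    return (prev, "out") in edges

print(is_single_occurrence("borrowed, stock"))
print(is_single_occurrence("borrowed, stock, stock"))

soa = learn_soa([("person", "book"), ("person", "book", "book")])
print(sorted(soa))
print(accepts(soa, ("person", "book", "book", "book")))
print(accepts(soa, ("book",)))
```

Because every state is an element name that appears only once, following a string through the graph is deterministic: the next state is always the next name in the string.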

Contribution
The goal is to infer a k-local single occurrence XSD (D', t') equivalent to a target k-local SOXSD (D, t), given only a finite corpus of XML documents.
● Let C be a corpus consisting of two XML fragments that are valid with respect to the XSD presented before, and let k be a natural number.

iLocal Algorithm:
➔ T ← { p/k : p ∈ Paths(C) }, where p/k denotes the last k element names of path p
➔ ρ ← Ø; τ ← Ø
➔ construct the content model for each type p/k:
◆ learn the SOA for the set k-strings(C, p/k) of all strings occurring in C below a path q that is k-equivalent to p (that is, q/k = p/k)
◆ transform this SOA into a SORE
◆ add the rule p/k → SORE to ρ
➔ for each path pa in Paths(C), add (p/k, a) → (pa)/k to τ
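
The bookkeeping part of the steps above can be sketched in Python. This is only a skeleton under stated assumptions: it collects the types p/k, the k-strings for each type, and the mapping τ, but it leaves out learning the SOA and turning it into a SORE. The names k_suffix and ilocal_skeleton are hypothetical, k is assumed to be at least 1, and the two documents are simplified versions of the corpus used in the presentation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def k_suffix(path, k):
    """p/k: the last k element names of a path (k is assumed to be >= 1)."""
    return path[-k:]

def ilocal_skeleton(documents, k):
    """Collect k-strings per type and the type mapping tau.

    Returns (k_strings, tau) where
      k_strings : type -> set of child-name strings seen below that type
      tau       : (type, element name) -> type of the child
    Types are tuples of at most k element names; () is the empty path.
    """
    k_strings = defaultdict(set)
    tau = {}

    def visit(elem, path):
        t = k_suffix(path, k)
        k_strings[t].add(tuple(child.tag for child in elem))
        for child in elem:
            child_path = path + (child.tag,)
            tau[(t, child.tag)] = k_suffix(child_path, k)
            visit(child, child_path)

    for doc in documents:
        root = ET.fromstring(doc)
        k_strings[()].add((root.tag,))
        tau[((), root.tag)] = k_suffix((root.tag,), k)
        visit(root, (root.tag,))
    return dict(k_strings), tau

DOC1 = ("<library><borrowed><person><name/><tel/><email/></person>"
        "<book><id/><author/><time/></book></borrowed>"
        "<stock><book><id/><author/><nbBooks/><bookshelf/></book></stock></library>")
DOC2 = ("<library><borrowed><person><name/><email/></person>"
        "<book><id/><author/><time/></book></borrowed>"
        "<stock><book><id/><author/><nbBooks/><bookshelf/></book></stock></library>")

ks2, tau2 = ilocal_skeleton([DOC1, DOC2], k=2)
print(ks2[("borrowed", "book")])
print(ks2[("stock", "book")])
print(tau2[(("library", "borrowed"), "book")])

ks1, tau1 = ilocal_skeleton([DOC1, DOC2], k=1)
print(ks1[("book",)])
```

With k = 2 the two kinds of book are kept apart as the types borrowed book and stock book. With k = 1 they collapse into a single type book whose examples mix both shapes, which corresponds to what a DTD can express.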
Let C be the corpus consisting of these two XML documents:

Document 1:
<library>
  <borrowed>
    <person>
      <name/><tel/><email/>
    </person>
    <book>
      <id/> <author/> <time/>
    </book>
  </borrowed>
  <stock>
    <book>
      <id/> <author/> <nbBooks/>
      <bookshelf/>
    </book>
  </stock>
</library>

Document 2:
<library>
  <borrowed>
    <person>
      <name/><email/>
    </person>
    <book>
      <id/> <author/> <time/>
    </book>
  </borrowed>
  <borrowed> ... </borrowed>
  <stock>
    <book>
      <id/> <author/> <nbBooks/>
      <bookshelf/>
    </book>
    <book>
      <id/> <author/> <nbBooks/>
      <bookshelf/>
      <book/>
    </book>
  </stock>
</library>
Running iLocal:
k = 2

p/k = borrowed book

k-strings(C, p/k) = { id name nbBooks author bookshelf, id name book book }

[SOA example]

[SORE example]

(does not contain the timeBorrowed element)

We have now inferred the content models for all types. [example]

Next, determine the type associated with each element name in these content models, for k = 2. [example]

[Final result example]

Next problem: minimisation. The inferred schema has more types than necessary.

Minimise:

There is a problem with minimisation, so the authors created their own algorithm: Reduce.

Reduce of iLocal:
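
The slides do not spell out how Reduce works, so the following Python sketch is not the authors' algorithm. It only illustrates the general idea of cutting down the number of types: two types are merged when they have the same content model and their children, name by name, lead to types that are themselves merged. The function name merge_equivalent_types and the example type names are hypothetical.

```python
def merge_equivalent_types(rho, tau):
    """Group types that are indistinguishable.

    rho : type -> content model (any hashable value, here a string)
    tau : (type, element name) -> child type
    Returns a dict type -> block number; equal numbers mean "can be merged".
    """
    children = {t: [] for t in rho}
    for (t, a), child in tau.items():
        children[t].append((a, child))

    def number(signatures):
        # Give each distinct signature a small integer id.
        ids = {}
        return {t: ids.setdefault(sig, len(ids)) for t, sig in signatures.items()}

    # Start by grouping on the content model alone, then refine until stable.
    block = number({t: rho[t] for t in rho})
    while True:
        refined = number({
            t: (rho[t], tuple(sorted((a, block[c]) for a, c in children[t])))
            for t in rho
        })
        if len(set(refined.values())) == len(set(block.values())):
            return refined
        block = refined

# Hypothetical 2-local types, named "parent child" for readability.
rho = {
    "borrowed book": "id, author, time",
    "stock book": "id, author, nbBooks, bookshelf",
    "book id": "", "book author": "", "book time": "",
    "book nbBooks": "", "book bookshelf": "",
}
tau = {}
for parent, names in (("borrowed book", ["id", "author", "time"]),
                      ("stock book", ["id", "author", "nbBooks", "bookshelf"])):
    for name in names:
        tau[(parent, name)] = "book " + name

blocks = merge_equivalent_types(rho, tau)
print(blocks)
print(len(set(blocks.values())))
```

In this example the five leaf types collapse into one block, much like the single emp type on the Solution slide, while the two book types stay separate because their content models differ.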
Experimental Evaluation
Personal Opinion
Conclusion
❖ Problem: inferring a Document Type Definition (DTD) is not enough, because in a DTD the content model of an element can depend only on the element name and not on the context in which it is used.
❖ Solution: infer an XML Schema Definition (XSD) instead. An XSD allows the content model of an element to depend on the context in which it is used.
❖ Background:
➢ Definition of XSD
➢ SORE
➢ SOA
❖ Contribution
➢ iLocal
➢ Reduce
➢ iXSD = iLocal + Reduce
❖ Experimental Evaluation
