DOI: 10.1145/1031453.1031460

Parsing concurrent XML

Published: 12 November 2004

Abstract

Concurrent markup hierarchies often appear in document-centric XML documents when different XML elements have overlapping scopes. They require a significantly different approach to management and maintenance. Management of XML documents composed of concurrent markup has mostly been studied by the document processing community and has only recently attracted the attention of computer scientists. In this paper we discuss the architecture of an XML parser for concurrent XML. The parser uses a GODDAG data structure in place of the traditional DOM tree to store concurrent markup on top of the document content. It provides a DOM-like API, so developers of tools that work with concurrent XML documents can use it instead of parsing each individual component with a traditional DOM XML parser. The paper describes the architecture of the parser, the data structures and algorithms used, and the DOM-like API.
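
The idea of storing several markup hierarchies on top of one shared document content can be illustrated with a small sketch. The Python below is not the parser's API, which the abstract does not specify. The class names `Leaf` and `Element`, the helper `elements_sharing_leaves`, and the hierarchy names `physical` and `linguistic` are all hypothetical, and the structure is only a simplified GODDAG-like graph in which text leaves are shared by every hierarchy.

```python
class Leaf:
    """A fragment of document content shared by all hierarchies (hypothetical)."""

    def __init__(self, text):
        self.text = text
        self.parents = {}  # hierarchy name -> Element directly above this leaf


class Element:
    """A markup node that belongs to exactly one hierarchy (hypothetical)."""

    def __init__(self, tag, hierarchy):
        self.tag = tag
        self.hierarchy = hierarchy
        self.children = []

    def append(self, child):
        """Attach a child; a leaf also records this element as its parent."""
        self.children.append(child)
        if isinstance(child, Leaf):
            child.parents[self.hierarchy] = self
        return child

    def leaves(self):
        """Yield the leaves below this element in document order."""
        for child in self.children:
            if isinstance(child, Leaf):
                yield child
            else:
                yield from child.leaves()

    def text_content(self):
        """Concatenate the text of all leaves below this element."""
        return "".join(leaf.text for leaf in self.leaves())


def elements_sharing_leaves(element, other_hierarchy):
    """Elements of other_hierarchy directly above any leaf of `element`."""
    found = []
    for leaf in element.leaves():
        parent = leaf.parents.get(other_hierarchy)
        if parent is not None and parent not in found:
            found.append(parent)
    return found


# Shared content, split wherever any hierarchy has a tag boundary.
l0, l1, l2 = Leaf("one two "), Leaf("three "), Leaf("four five")

# Hierarchy 1: physical layout (two lines).
physical = Element("page", "physical")
line1 = physical.append(Element("line", "physical"))
line2 = physical.append(Element("line", "physical"))
line1.append(l0)
line1.append(l1)
line2.append(l2)

# Hierarchy 2: linguistic structure (two sentences); s2 crosses the line break.
linguistic = Element("text", "linguistic")
s1 = linguistic.append(Element("s", "linguistic"))
s2 = linguistic.append(Element("s", "linguistic"))
s1.append(l0)
s2.append(l1)
s2.append(l2)

# The second sentence spans both lines, which a single DOM tree cannot express.
print([e.tag for e in elements_sharing_leaves(s2, "physical")])
```

Because both hierarchies point at the same `Leaf` objects, a tool can move from an element in one hierarchy to the elements of another hierarchy that cover the same text, without parsing each hierarchy as a separate document.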

Published In

WIDM '04: Proceedings of the 6th annual ACM international workshop on Web information and data management
November 2004
168 pages
ISBN: 1581139780
DOI: 10.1145/1031453

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DOM
  2. GODDAG
  3. concurrent XML
  4. overlapping markup

Conference

CIKM04: Conference on Information and Knowledge Management
November 12 - 13, 2004
Washington DC, USA
