
Towards a semantics for XML markup

Published: 08 November 2002

Abstract

Although XML Document Type Definitions provide a mechanism for specifying, in machine-readable form, the syntax of an XML markup language, there is no comparable mechanism for specifying the semantics of an XML vocabulary. That is, there is no way to characterize the meaning of XML markup so that the facts and relationships represented by the occurrence of XML constructs can be explicitly, comprehensively, and mechanically identified. This has serious practical and theoretical consequences. On the positive side, XML constructs can be assigned arbitrary semantics and used in application areas not foreseen by the original designers. On the less positive side, both content developers and application engineers must rely upon prose documentation, or, worse, conjectures about the intention of the markup language designer, a process that is time-consuming, error-prone, incomplete, and unverifiable, even when the language designer properly documents the language. In addition, the lack of a substantial body of research in markup semantics means that digital document processing is undertheorized as an engineering application area. Although there are some related projects underway (XML Schema, RDF, the Semantic Web) which provide relevant results, none of these projects directly and comprehensively addresses the core problems of XML markup semantics. This paper (i) summarizes the history of the concept of markup meaning, (ii) characterizes the specific problems that motivate the need for a formal semantics for XML, and (iii) describes an ongoing research project, the BECHAMEL Markup Semantics Project, that is attempting to develop such a semantics.


Published In

DocEng '02: Proceedings of the 2002 ACM symposium on Document engineering
November 2002
168 pages
ISBN:1581135947
DOI:10.1145/585058
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SGML
  2. XML
  3. knowledge representation
  4. markup
  5. semantics

Qualifiers

  • Article

Conference

DocEng02

Acceptance Rates

DocEng '02 Paper Acceptance Rate 21 of 46 submissions, 46%;
Overall Acceptance Rate 194 of 564 submissions, 34%
