Article (DOI: 10.1145/986537.986630)

SA_MetaMatch: relevant document discovery through document metadata and indexing

Published: 02 April 2004

Abstract

SA_MetaMatch, a component of the Standards Advisor (SA), finds relevant documents by matching indices of metadata and of document content. The elements of the metadata schema are mainly adopted from the Dublin Core (DC), and the implementation of the XML metadata schema and its coding follows the DC recommended guidelines. Metadata is generated manually for unstructured documents or extracted automatically from documents with a well-defined layout, and is then stored in metadata files or in a repository. Indices of the descriptive metadata elements and of the document content are then generated. These indices are searched and compared to find related documents, based on our observation that if the metadata and the high-frequency index words of the document content are related, then the corresponding documents are likely to be related as well. A ranked list of possibly relevant documents is returned as the result. Several matching algorithms have been explored. We selected a sum-of-word-scores approach, which not only gives a relevance score for each matched document but also gives an individual score for each matching word, providing hints that help domain experts grasp the concepts in the documents.
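
The matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the SA_MetaMatch implementation: the abstract does not give the scoring formula, so the per-word score used here (the word's frequency in the candidate document's content index) is an assumption, as are the stop-word list, the `top_k` cutoff, and all function names.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say how common words are filtered.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lower-case the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def content_index(text, top_k=10):
    """Map each of the top_k most frequent content words to its count."""
    return dict(Counter(tokenize(text)).most_common(top_k))


def metadata_index(metadata):
    """Collect the distinct words found in the descriptive metadata values."""
    words = set()
    for value in metadata.values():
        words.update(tokenize(value))
    return words


def score_document(query_words, doc_index):
    """Return (total, per_word): each shared word scores its content frequency
    (an assumed weighting) and the document score is the sum of those."""
    per_word = {w: doc_index[w] for w in query_words if w in doc_index}
    return sum(per_word.values()), per_word


def rank_documents(metadata, documents, top_k=10):
    """Rank documents by summed word score, dropping those with no shared word."""
    query_words = metadata_index(metadata)
    ranked = []
    for name, text in documents.items():
        total, per_word = score_document(query_words, content_index(text, top_k))
        if total > 0:
            ranked.append((name, total, per_word))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Calling `rank_documents` with a dictionary of Dublin Core style fields (for example `title` and `subject`) and a dictionary of document texts returns a list of `(name, total, per_word)` tuples, best match first. The `per_word` dictionary corresponds to the individual word scores that the abstract describes as hints for domain experts.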

      Published In

      ACMSE '04: Proceedings of the 42nd annual ACM Southeast Conference
      April 2004
      485 pages
      ISBN: 1581138709
      DOI: 10.1145/986537
      General Chair: Seong-Moo Yoo
      Program Chair: Letha Hughes Etzkorn
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Dublin Core
      2. document matching
      3. index
      4. metadata

      Qualifiers

      • Article

      Conference

      ACM SE04: ACM Southeast Regional Conference 2004
      April 2-3, 2004
      Huntsville, Alabama

      Acceptance Rates

      Overall Acceptance Rate 502 of 1,023 submissions, 49%

      Article Metrics

      • Downloads (last 12 months): 1
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 25 Feb 2025

      Cited By

      • (2015) Semi-Structured Information: An Architecture Improving Search Results Using Domain Guidance. International Journal of Computers and Applications 32:1 (47-55). DOI: 10.1080/1206212X.2010.11441960. Online publication date: 11-Jul-2015
      • (2007) Web page title extraction and its application. Information Processing and Management: an International Journal 43:5 (1332-1347). DOI: 10.1016/j.ipm.2006.11.007. Online publication date: 1-Sep-2007
      • (2006) NASA's standards advisor pilot. Proceedings of the 44th annual ACM Southeast Conference (446-451). DOI: 10.1145/1185448.1185547. Online publication date: 10-Mar-2006
      • (2005) Delivering knowledge to NASA scientist and engineers. Proceedings of the 43rd annual ACM Southeast Conference - Volume 1 (384-385). DOI: 10.1145/1167350.1167458. Online publication date: 18-Mar-2005
      • (2005) Title extraction from bodies of HTML documents and its application to web page retrieval. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (250-257). DOI: 10.1145/1076034.1076079. Online publication date: 15-Aug-2005
