DOI: 10.1145/319950.319971

Metadata and data structures for the historical newspaper digital library

Published: 01 November 1999

Abstract

We examine metadata and data-structure issues for the Historical Newspaper Digital Library. This project proposes to digitize several years' worth of historical newspapers and then apply OCR and linguistic processing to them. Newspapers are very complex information objects, so developing a rich description of their content is challenging. In addition to frameworks for the logical structure and physical layout, we propose metadata relevant to the image processing and to the historians who will use this collection. Finally, we consider how the metadata infrastructure might be managed as it evolves with improved text-processing capabilities and how an infrastructure might be developed to support a community of users.

Published In

CIKM '99: Proceedings of the eighth international conference on Information and knowledge management
November 1999
564 pages
ISBN: 1581131461
DOI: 10.1145/319950
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OCR
  2. digital libraries
  3. history
  4. metadata
  5. newspapers

Qualifiers

  • Article

Conference

CIKM99: Conference on Information and Knowledge Management
November 2 - 6, 1999
Kansas City, Missouri, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
