DOI: 10.5555/646635.700086

Austrian Online Archive Processing: Analyzing Archives of the World Wide Web

Published: 16 September 2002

Abstract

With the popularity of the World Wide Web and the recognition that it is worth archiving, numerous projects aim to create large-scale repositories containing excerpts and snapshots of Web data. Interfaces are being created that allow users to surf through time, analyze the evolution of Web pages, or retrieve information using search interfaces. Yet, with the timeline and metadata available in such a Web archive, additional analyses that go beyond mere information exploration become possible. In this paper we present the AOLAP project, which builds a Data Warehouse from such a Web archive, allowing its analysis and exploration from different points of view using OLAP technologies. Specifically, technological aspects such as the operating systems and Web servers used, geographic location, and Web technology such as the use of file types, forms, or scripting languages may be used to infer, e.g., technology maturation or impact.
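
To make the idea of analyzing archive metadata "from different points of view" concrete, the following is a minimal sketch of two OLAP-style operations (roll-up and slice) over a tiny fact table. It is not the AOLAP implementation: the record layout, the dimension names (snapshot_year, web_server, file_type), the helper functions, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class PageFact(NamedTuple):
    """One archived page; every field is a hypothetical dimension value."""
    snapshot_year: int
    web_server: str
    file_type: str


# Made-up sample rows for illustration only; not data from the AOLAP project.
FACTS = [
    PageFact(2001, "Apache", "html"),
    PageFact(2001, "IIS", "html"),
    PageFact(2001, "Apache", "pdf"),
    PageFact(2002, "Apache", "html"),
    PageFact(2002, "Apache", "php"),
    PageFact(2002, "IIS", "asp"),
]


def roll_up(facts: Iterable[PageFact], *dimensions: str) -> Counter:
    """Count pages grouped by the named dimensions (an OLAP-style roll-up)."""
    return Counter(tuple(getattr(f, d) for d in dimensions) for f in facts)


def slice_by(facts: Iterable[PageFact], **fixed) -> list:
    """Keep only facts whose dimensions equal the fixed values (an OLAP slice)."""
    return [f for f in facts if all(getattr(f, k) == v for k, v in fixed.items())]


# View 1: page counts per (year, web server).
by_year_server = roll_up(FACTS, "snapshot_year", "web_server")

# View 2: fix the web server dimension, then count by file type.
apache_types = roll_up(slice_by(FACTS, web_server="Apache"), "file_type")
```

Comparing such counts across snapshot years is one simple way the kind of technology-maturation question mentioned in the abstract could be posed; a real data warehouse would use dimension tables and an OLAP engine rather than in-memory counters.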


    Published In

    ECDL '02: Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
    September 2002
    662 pages
    ISBN: 3540441786

    Publisher

    Springer-Verlag, Berlin, Heidelberg


    Author Tags

    1. data warehouse (DWH)
    2. digital cultural heritage
    3. on-line analytical processing (OLAP)
    4. technology evaluation
    5. web archiving

    Qualifiers

    • Article


    Cited By

    • (2010) Automatic knowledge acquisition from historical document archives. Culture and computing, pages 161-172. DOI: 10.5555/1985559.1985575. Online publication date: 1-Jan-2010.
    • (2009) Interacting with (semi-) automatically extracted context of digital objects. Proceedings of the 1st Workshop on Context, Information and Ontologies, pages 1-9. DOI: 10.1145/1552262.1552267. Online publication date: 1-Jun-2009.
    • (2007) Using the web infrastructure to preserve web pages. International Journal on Digital Libraries 6:4, pages 327-349. DOI: 10.5555/2794654.3269944. Online publication date: 1-Jul-2007.
    • (2007) Digital libraries and engines of search. Proceedings of the 2007 Euro American conference on Telematics and information systems, pages 1-9. DOI: 10.1145/1352694.1352703. Online publication date: 14-May-2007.
    • (2007) Detecting age of page content. Proceedings of the 9th annual ACM international workshop on Web information and data management, pages 137-144. DOI: 10.1145/1316902.1316925. Online publication date: 9-Nov-2007.
    • (2007) Towards mining past content of Web pages. The New Review of Hypermedia and Multimedia 13:1, pages 77-86. DOI: 10.1080/13614560701478897. Online publication date: 1-Jan-2007.
