DOI: 10.1145/375663.375723
Article

Monitoring XML data on the Web

Published: 01 May 2001

Abstract

We consider the monitoring of a flow of incoming documents. More precisely, we present the monitoring used in a very large warehouse built from XML documents found on the web. The flow of documents consists of XML pages (which are warehoused) and HTML pages (which are not). Our contributions are the following:
  • A subscription language that specifies the monitoring of pages when they are fetched, the periodic evaluation of continuous queries, and the production of XML reports.
  • A description of the architecture of the system we implemented, which makes it possible to monitor a flow of millions of pages per day with millions of subscriptions on a single PC and scales up by using more machines.
  • A new algorithm for processing alerts that can be used in a wider context.
We support monitoring at the page level (e.g., discovery of a new page within a certain semantic domain) as well as at the element level (e.g., insertion of a new electronic product in a catalog).
This work is part of the Xyleme system. Xyleme is developed on a cluster of PCs under Linux with CORBA communications. The part of the system described in this paper has been implemented. We report on first experiments.
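
To make the idea of page-level and element-level subscriptions concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, whose details the abstract does not give. It assumes, purely for illustration, that each fetched document is reduced to a set of atomic events and that a subscription fires when all of its conditions appear in that set. The class name `Monitor`, its methods, and the event tuples are all hypothetical.

```python
from collections import defaultdict


class Monitor:
    """Toy subscription matcher (illustrative only, not Xyleme's algorithm).

    A subscription fires for a document when every one of its atomic
    conditions appears among the events detected for that document.
    """

    def __init__(self):
        self.required = {}                     # subscription id -> number of conditions
        self.by_condition = defaultdict(set)   # condition -> ids of subscriptions using it

    def subscribe(self, sub_id, conditions):
        conditions = set(conditions)
        if not conditions:
            raise ValueError("a subscription needs at least one condition")
        self.required[sub_id] = len(conditions)
        for cond in conditions:
            self.by_condition[cond].add(sub_id)

    def process(self, doc_events):
        """Return the sorted ids of subscriptions fully satisfied by doc_events."""
        hits = defaultdict(int)
        for event in set(doc_events):          # ignore duplicate events
            for sub_id in self.by_condition.get(event, ()):
                hits[sub_id] += 1
        return sorted(s for s, n in hits.items() if n == self.required[s])


# Hypothetical usage: one page-level and one element-level subscription.
monitor = Monitor()
monitor.subscribe("new-art-page", [("page", "new"), ("domain", "art")])
monitor.subscribe("catalog-insert", [("element", "insert", "product")])
alerts = monitor.process(
    [("page", "new"), ("domain", "art"), ("url", "http://example.org/")]
)
```

Indexing subscriptions by condition means the work per document depends on the events it produces rather than on the total number of subscriptions. That is one plausible reason a design of this kind could cope with many subscriptions, but the sketch makes no claim about how the described system achieves its throughput.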

Published In

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001
630 pages
ISBN: 1581133324
DOI: 10.1145/375663
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGMOD/PODS01
Acceptance Rates

SIGMOD '01 paper acceptance rate: 44 of 293 submissions (15%)
Overall acceptance rate: 785 of 4,003 submissions (20%)

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 24 Dec 2024

Citations

Cited By

  • (2022) Approximate Range Thresholding. Proceedings of the 2022 International Conference on Management of Data, pp. 1108-1121. DOI: 10.1145/3514221.3526123. Online publication date: 10-Jun-2022
  • (2016) Range Thresholding on Streams. Proceedings of the 2016 International Conference on Management of Data, pp. 571-582. DOI: 10.1145/2882903.2915965. Online publication date: 26-Jun-2016
  • (2013) Parse tree based approach for processing XML streams. 2013 IEEE 14th International Conference on Information Reuse & Integration (IRI), pp. 546-553. DOI: 10.1109/IRI.2013.6642517. Online publication date: Aug-2013
  • (2011) Prefix-based node numbering for temporal XML. Proceedings of the 12th international conference on Web information system engineering, pp. 172-184. DOI: 10.5555/2050963.2050977. Online publication date: 13-Oct-2011
  • (2011) Prefix-Based Node Numbering for Temporal XML. Web Information System Engineering – WISE 2011, pp. 172-184. DOI: 10.1007/978-3-642-24434-6_13. Online publication date: 2011
  • (2010) Research on recursive query over XML data stream based on pushdown automation. 2010 International Conference on Intelligent Computing and Integrated Systems, pp. 782-785. DOI: 10.1109/ICISS.2010.5657114. Online publication date: Oct-2010
  • (2009) Multidimensional integrated ontologies. Journal on Data Semantics XIII, pp. 1-36. DOI: 10.5555/2172259.2172260. Online publication date: 1-Jan-2009
  • (2009) CIMDIFF. Proceedings of the 23rd conference on Large installation system administration, pp. 11-11. DOI: 10.5555/1855698.1855709. Online publication date: 1-Nov-2009
  • (2009) Information filtering and query indexing for an information retrieval model. ACM Transactions on Information Systems 27:2, pp. 1-47. DOI: 10.1145/1462198.1462202. Online publication date: 9-Mar-2009
  • (2009) Survey. Computer Science Review 3:3, pp. 151-173. DOI: 10.1016/j.cosrev.2009.03.001. Online publication date: 1-Aug-2009
