Abstract
Local copies of remote data from autonomous sources are often maintained to improve availability and performance. Examples include data warehouses and the repositories managed by web search engines. As the local data grows, resource limitations can make it infeasible to keep all of it fresh (up to date). Previous work on maintaining the freshness of local data measures freshness as the proportion of fresh documents in the repository, which we call average freshness. Under this metric, the measured freshness may remain high even when updates to the more frequently changing data are not captured. In this paper, we argue that, in addition to average freshness, a freshness metric should also account for the proportion of changes captured for each document, which we call object freshness. Object freshness is particularly important when both current and historical versions of information sources are queried or mined. We propose building an access scheduling tree (AST) to precisely schedule accesses to remote sources so as to achieve optimal freshness of the local data under limited resources. Our experiments show that this approach performs significantly better than a linear priority queue.
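The difference between the two metrics can be illustrated with a minimal sketch. It follows only the informal definitions given in the abstract; the `DocumentStats` class, its field names, the sample numbers, and the convention for documents that never changed are assumptions made for illustration, not the paper's formal definitions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DocumentStats:
    """Hypothetical per-document bookkeeping; field names are illustrative."""
    is_fresh: bool         # local copy currently matches the remote source
    changes_total: int     # changes made at the source during the observation period
    changes_captured: int  # how many of those changes the local repository recorded


def average_freshness(docs: Sequence[DocumentStats]) -> float:
    """Proportion of documents whose local copy is currently fresh."""
    if not docs:
        raise ValueError("average freshness is undefined for an empty repository")
    return sum(d.is_fresh for d in docs) / len(docs)


def object_freshness(doc: DocumentStats) -> float:
    """Proportion of one document's changes that were captured locally."""
    if doc.changes_total == 0:
        # Assumption: a document that never changed counts as fully captured.
        return 1.0
    return doc.changes_captured / doc.changes_total


# Made-up sample: three slowly changing documents and one rapidly changing one.
docs = [
    DocumentStats(is_fresh=True, changes_total=1, changes_captured=1),
    DocumentStats(is_fresh=True, changes_total=0, changes_captured=0),
    DocumentStats(is_fresh=True, changes_total=2, changes_captured=2),
    DocumentStats(is_fresh=False, changes_total=20, changes_captured=2),
]

print(average_freshness(docs))                # repository-wide metric
print([object_freshness(d) for d in docs])    # per-document metric
```

In this made-up sample, three of the four documents are fresh, so the repository-wide figure looks healthy, while the rapidly changing document has had only 2 of its 20 changes captured. This is the situation in which average freshness alone hides missed updates.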
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qin, L., Atluri, V. (2003). An Access Scheduling Tree to Achieve Optimal Freshness in Local Repositories. In: Bauknecht, K., Tjoa, A.M., Quirchmayr, G. (eds) E-Commerce and Web Technologies. EC-Web 2003. Lecture Notes in Computer Science, vol 2738. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45229-4_23
DOI: https://doi.org/10.1007/978-3-540-45229-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40808-6
Online ISBN: 978-3-540-45229-4
eBook Packages: Springer Book Archive