Bypass Caching: Making Scientific Databases Good Network Citizens

Published: 05 April 2005
DOI: 10.1109/ICDE.2005.30

Abstract

Scientific database federations are geographically distributed and network-bound. Thus, they could benefit from proxy caching. However, existing caching techniques are not suitable for their workloads, which compare and join large data sets. Existing techniques reduce parallelism by conducting distributed queries in a single cache, and they lose the data-reduction benefits of performing selections at each database. We develop the bypass-yield formulation of caching, which reduces network traffic in wide-area database federations while preserving parallelism and data reduction. Bypass-yield caching is altruistic: caches minimize the overall network traffic generated by the federation rather than focusing on local performance. We present an adaptive, workload-driven algorithm for managing a bypass-yield cache. We also develop on-line algorithms that make no assumptions about the workload: a k-competitive deterministic algorithm and a randomized algorithm with minimal space complexity. We verify the efficacy of bypass-yield caching by running workload traces collected from the Sloan Digital Sky Survey through a prototype implementation.
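The abstract does not spell out the algorithms, so the following is only a minimal sketch of the trade-off a bypass-yield cache faces: each query is either bypassed to the database, which evaluates it and ships only the result (its "yield"), or served from a locally loaded copy whose one-time cost is the object's full size. The sketch assumes each query touches a single object of known size with a known yield, and it applies the classic ski-rental (rent-or-buy) threshold rule. The class `RentOrBuyCache`, its fields, and the object name `"photo"` are hypothetical; this is not the authors' adaptive, k-competitive, or randomized algorithm, and it does not model eviction.

```python
from dataclasses import dataclass, field


@dataclass
class RentOrBuyCache:
    """Hypothetical rent-or-buy proxy cache (illustrative only, no eviction)."""

    capacity: int                                  # cache size in bytes
    used: int = 0                                  # bytes currently occupied
    cached: set = field(default_factory=set)       # objects held locally
    bypassed: dict = field(default_factory=dict)   # object -> bypass bytes so far
    network_bytes: int = 0                         # total WAN traffic charged

    def query(self, obj: str, size: int, yield_bytes: int) -> str:
        """Serve one query against `obj` and report what the cache did."""
        if obj in self.cached:
            return "hit"  # answered locally; no WAN traffic
        seen = self.bypassed.get(obj, 0) + yield_bytes
        if seen >= size and self.used + size <= self.capacity:
            # Cumulative bypass traffic for obj would reach its size, so pay
            # the one-time load cost and answer this query from the cache.
            self.network_bytes += size
            self.used += size
            self.cached.add(obj)
            return "load"
        # Bypass: the database runs the query and ships only the result.
        self.bypassed[obj] = seen
        self.network_bytes += yield_bytes
        return "bypass"


if __name__ == "__main__":
    cache = RentOrBuyCache(capacity=1_000)
    actions = [cache.query("photo", size=500, yield_bytes=200) for _ in range(4)]
    print(actions, cache.network_bytes)  # ['bypass', 'bypass', 'load', 'hit'] 900
```

Bypassing keeps the selection at the database, which preserves the data reduction the abstract emphasizes; loading only pays off once the object's accumulated yield justifies its size. An object that cannot fit in the cache is always bypassed.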


Published In

ICDE '05: Proceedings of the 21st International Conference on Data Engineering
April 2005
8301 pages
ISBN:0769522858

Publisher

IEEE Computer Society

United States