Data mining on an OLTP system (nearly) for free

DOI: 10.1145/342009.335375
Published: 16 May 2000

Abstract

This paper proposes a scheme for scheduling disk requests that takes advantage of the ability of high-level functions to operate directly at individual disk drives. We show that such a scheme makes it possible to support a Data Mining workload on an OLTP system almost for free: there is only a small impact on the throughput and response time of the existing workload. Specifically, we show that an OLTP system has the disk resources to consistently provide one third of its sequential bandwidth to a background Data Mining task with close to zero impact on OLTP throughput and response time at high transaction loads. At low transaction loads, we show much lower impact than observed in previous work. This means that a production OLTP system can be used for Data Mining tasks without the expense of a second dedicated system. Our scheme takes advantage of close interaction with the on-disk scheduler by reading blocks for the Data Mining workload as the disk head “passes over” them while satisfying demand blocks from the OLTP request stream. We show that this scheme provides a consistent level of throughput for the background workload even at very high foreground loads. Such a scheme is of most benefit in combination with an Active Disk environment that allows the background Data Mining application to also take advantage of the processing power and memory available directly on the disk drives.
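The core idea in the abstract, picking up background blocks as they pass under the head while a demand request is being served, can be illustrated with a toy model. The sketch below is not the paper's scheduler: it assumes a single track whose sectors rotate under one head, ignores seeks, multiple tracks, and timing entirely, and uses hypothetical names (`Track`, `serve_demand`, `pending_background`) chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """Toy single-track disk: sectors 0..n_sectors-1 rotate under one head.

    Assumption for illustration only: no seeks, no zones, no timing model.
    """
    n_sectors: int
    head: int = 0  # sector currently under the head
    # Sectors the background (Data Mining) scan still needs to read.
    pending_background: set = field(default_factory=set)

    def serve_demand(self, target: int) -> list:
        """Serve one foreground (OLTP) read of sector `target`.

        While the platter rotates from the current head position to
        `target`, every sector that passes under the head and is still
        wanted by the background scan is read as well. Returns those
        sectors in the order they were picked up.
        """
        picked_up = []
        pos = self.head
        while pos != target:
            if pos in self.pending_background:
                self.pending_background.discard(pos)
                picked_up.append(pos)
            pos = (pos + 1) % self.n_sectors
        # The demand sector is read here; the head ends up just past it.
        self.head = (target + 1) % self.n_sectors
        return picked_up


def run_example():
    """Two demand reads on an 8-sector track with four background sectors."""
    track = Track(n_sectors=8, head=0, pending_background={1, 2, 5, 7})
    first = track.serve_demand(4)   # platter rotates past sectors 0, 1, 2, 3
    second = track.serve_demand(2)  # platter rotates past 5, 6, 7, 0, 1
    return first, second, sorted(track.pending_background)
```

In this toy run the background scan finishes all four of its sectors using only the rotation that the two demand reads required anyway, which is the sense in which the background work is "free". A real drive would also have to account for seek time and for whether a detour still lets the demand request complete on time.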

Published In

SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
May 2000
604 pages
ISBN:1581132174
DOI:10.1145/342009
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGMOD/PODS00
Acceptance Rates

SIGMOD '00 Paper Acceptance Rate 42 of 248 submissions, 17%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2015) Opportunistic storage maintenance. Proceedings of the 25th Symposium on Operating Systems Principles, pp. 457-473. DOI: 10.1145/2815400.2815424. Online publication date: 4-Oct-2015
  • (2012) An active storage framework for object storage devices. 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1-12. DOI: 10.1109/MSST.2012.6232372. Online publication date: Apr-2012
  • (2011) Survey and analysis of disk scheduling methods. ACM SIGARCH Computer Architecture News 39:2, pp. 8-25. DOI: 10.1145/2024716.2024719. Online publication date: 31-Aug-2011
  • (2011) A Distributed Reconfigurable Active SSD Platform for Data Intensive Applications. Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications, pp. 25-34. DOI: 10.1109/HPCC.2011.14. Online publication date: 2-Sep-2011
  • (2010) NCQ vs. I/O scheduler. ACM Transactions on Storage 6:1, pp. 1-37. DOI: 10.1145/1714454.1714456. Online publication date: 5-Apr-2010
  • (2007) The leganet system. Information Systems 32:2, pp. 320-343. DOI: 10.1016/j.is.2005.09.004. Online publication date: 1-Apr-2007
  • (2006) Intelligent storage. ACM Transactions on Storage 2:3, pp. 255-282. DOI: 10.1145/1168910.1168912. Online publication date: 1-Aug-2006
  • (2005) Systems Support for Preemptive Disk Scheduling. IEEE Transactions on Computers 54:10, pp. 1314-1326. DOI: 10.1109/TC.2005.170. Online publication date: 1-Oct-2005
  • (2005) OLTP and OLAP Data Integration: A Review of Feasible Implementation Methods and Architectures for Real Time Data Analysis. Proceedings. IEEE SoutheastCon, 2005, pp. 515-520. DOI: 10.1109/SECON.2005.1423297. Online publication date: 2005
  • (2004) Memory-adaptive association rules mining. Information Systems 29:5, pp. 365-384. DOI: 10.1016/S0306-4379(03)00035-8. Online publication date: 1-Jul-2004
