OLTP through the looking glass, and what we found there

Published: 01 December 2018

Abstract

Online Transaction Processing (OLTP) databases include a suite of features (disk-resident B-trees and heap files, locking-based concurrency control, support for multi-threading) that were optimized for the computer technology of the late 1970s. Advances in modern processors, memories, and networks mean that today's computers are vastly different from those of 30 years ago, such that many OLTP databases will now fit in main memory, and most OLTP transactions can be processed in milliseconds or less. Yet database architecture has changed little.
Based on this observation, we look at some interesting variants of conventional database systems that one might build to exploit recent hardware trends, and speculate on their performance through a detailed instruction-level breakdown of the major components involved in a transaction processing database system (Shore) running a subset of TPC-C. Rather than simply profiling Shore, we progressively modified it so that after every feature removal or optimization, we had a (faster) working system that fully ran our workload. Overall, we identify overheads and optimizations that explain a total difference of about a factor of 20 in raw performance. We also show that there is no single "high pole in the tent" in modern (memory-resident) database systems, but that substantial time is spent in logging, latching, locking, B-tree, and buffer management operations.
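
The progressive-removal methodology described in the abstract can be illustrated with a small bookkeeping sketch. It assumes one cost measurement (for example, instructions per transaction) is taken for the baseline system and again after each feature is removed, and it attributes the drop between consecutive measurements to the feature removed at that step. The function name `attribute_savings`, the removal order, and all numbers below are hypothetical placeholders chosen for illustration; none of them are taken from the chapter.

```python
from typing import List, Tuple


def attribute_savings(
    steps: List[Tuple[str, float]],
) -> List[Tuple[str, float, float]]:
    """Attribute cost savings to each progressively removed component.

    steps[0] is (label, cost) for the unmodified baseline. Every later entry
    is (component removed at this step, cost measured after the removal).
    Returns (component, absolute saving, saving as a fraction of baseline).
    """
    baseline = steps[0][1]
    previous = baseline
    result = []
    for name, cost in steps[1:]:
        saved = previous - cost  # drop caused by removing this component
        result.append((name, saved, saved / baseline))
        previous = cost
    return result


# Hypothetical measurements, NOT results from the chapter.
measurements = [
    ("baseline", 1000),
    ("logging", 850),
    ("locking", 700),
    ("latching", 600),
    ("buffer manager", 300),
]

breakdown = attribute_savings(measurements)
for name, saved, share in breakdown:
    print(f"{name:15s} saved={saved:6.0f}  share of baseline={share:.1%}")

# Ratio between the first and last measurement in this toy series.
speedup = measurements[0][1] / measurements[-1][1]
print(f"overall speedup in this toy series: {speedup:.2f}x")
```

Because each saving is measured relative to the previous working system, the attribution depends on the order in which features are removed; a different order could assign different shares to the same components.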

    Published In

    Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker
    December 2018
    725 pages
    ISBN:9781947487192
    DOI:10.1145/3226595

    Publisher

    Association for Computing Machinery and Morgan & Claypool

    Publication History

    Published: 01 December 2018

    Author Tags

    1. DBMS architecture
    2. OLTP
    3. main memory transaction processing
    4. online transaction processing

    Qualifiers

    • Chapter

    Appears in

    ACM Books

    Cited By

    • (2024) Lauca: A Workload Duplicator for Benchmarking Transactional Database Performance. IEEE Transactions on Knowledge and Data Engineering 36:7 (3180-3194). DOI: 10.1109/TKDE.2024.3360116. Online publication date: Jul-2024.
    • (2024) Designing Cloud Servers for Lower Carbon. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (452-470). DOI: 10.1109/ISCA59077.2024.00041. Online publication date: 29-Jun-2024.
    • (2024) BushStore: Efficient B+Tree Group Indexing for LSM-Tree in Non-Volatile Memory. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (4127-4139). DOI: 10.1109/ICDE60146.2024.00316. Online publication date: 13-May-2024.
    • (2023) Loom: A Closed-Box Disaggregated Database System. Proceedings of the 12th Latin-American Symposium on Dependable and Secure Computing (30-39). DOI: 10.1145/3615366.3615424. Online publication date: 16-Oct-2023.
    • (2023) When Private Blockchain Meets Deterministic Database. Proceedings of the ACM on Management of Data 1:1 (1-28). DOI: 10.1145/3588952. Online publication date: 30-May-2023.
    • (2023) D-Thespis: A Distributed Actor-Based Causally Consistent DBMS. Transactions on Large-Scale Data- and Knowledge-Centered Systems LIII (126-165). DOI: 10.1007/978-3-662-66863-4_6. Online publication date: 9-Feb-2023.
    • (2021) The end of Moore's law and the rise of the data processor. Proceedings of the VLDB Endowment 14:12 (2932-2944). DOI: 10.14778/3476311.3476373. Online publication date: 28-Oct-2021.
    • (2021) Main Memory Database Recovery. ACM Computing Surveys 54:2 (1-36). DOI: 10.1145/3442197. Online publication date: 5-Mar-2021.
    • (2021) An RDBMS-only architecture for web applications. 2021 XLVII Latin American Computing Conference (CLEI) (1-9). DOI: 10.1109/CLEI53233.2021.9640017. Online publication date: 25-Oct-2021.
    • (2020) KVell+. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (425-441). DOI: 10.5555/3488766.3488790. Online publication date: 4-Nov-2020.
