- panel, June 2023
Future of Database System Architectures
- Gustavo Alonso,
- Natassa Ailamaki,
- Sailesh Krishnamurthy,
- Sam Madden,
- Swami Sivasubramanian,
- Raghu Ramakrishnan
SIGMOD '23: Companion of the 2023 International Conference on Management of Data, Pages 261–262
https://doi.org/10.1145/3555041.3589360
Over the past two decades, we have experienced major technology disruptions on multiple fronts, none bigger than the emergence of cloud computing, which has led to fundamental changes in how database software is architected. We are seeing several new ...
- research-article, June 2016
Breaking BAD: a data serving vision for big active data
DEBS '16: Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, Pages 181–186
https://doi.org/10.1145/2933267.2933313
Virtually all of today's Big Data systems are passive in nature. Here we describe a project to shift Big Data platforms from passive to active. We detail a vision for a scalable system that can continuously and reliably capture Big Data to enable timely ...
- article, January 2016
Horizontal partitioning method for test verification in parallel database systems
International Journal of Advanced Intelligence Paradigms (IJAIP), Volume 9, Issue 1, Pages 96–106
https://doi.org/10.1504/IJAIP.2017.081182
In parallel database systems the partitioning methods considered in current researches are static. This research paper presents a partitioning method to divide the database relations into dynamic horizontal partitions. Every partition contains some ...
- research-article, May 2015
Thrifty: Offering Parallel Database as a Service using the Shared-Process Approach
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Pages 1063–1068
https://doi.org/10.1145/2723372.2735352
Recently, Amazon has announced Redshift, a Parallel-Database-as-a Service (PDaaS). Redshift adopts the "virtual cluster" approach to implement multitenancy, which has the merit of hard isolation among tenants (i.e., tenants do not interfere even when ...
- Article, March 2014
Implementing the Palomar Transient Factory Real-Time Detection Pipeline in GLADE: Results and Observations
DNIS 2014: Proceedings of the 9th International Workshop on Databases in Networked Information Systems - Volume 8381, Pages 53–66
https://doi.org/10.1007/978-3-319-05693-7_4
Palomar Transient Factory is a comprehensive detection system for the identification and classification of transient astrophysical objects. The central piece in the identification pipeline is represented by an automated classifier that distinguishes ...
- Article, July 2013
Sampling estimators for parallel online aggregation
BNCOD'13: Proceedings of the 29th British National conference on Big Data, Pages 204–217
https://doi.org/10.1007/978-3-642-39467-6_19
Online aggregation provides estimates to the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution. When coupled with parallel ...
- research-article, June 2013
Parallel analytics as a service
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, Pages 25–36
https://doi.org/10.1145/2463676.2463714
Recently, massively parallel processing relational database systems (MPPDBs) have gained much momentum in the big data analytic market. With the advent of hosted cloud computing, we envision that the offering of MPPDB-as-a-Service (MPPDBaaS) will become ...
- demonstration, June 2013
Iterative parallel data processing with stratosphere: an inside look
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, Pages 1053–1056
https://doi.org/10.1145/2463676.2463693
Iterative algorithms occur in many domains of data analysis, such as machine learning or graph analysis. With increasing interest to run those algorithms on very large data sets, we see a need for new techniques to execute iterations in a massively ...
- research-article, May 2013
SciDB: A Database Management System for Applications with Complex Analytics
Computing in Science and Engineering (IEEECS_CISE-NEW), Volume 15, Issue 3, Pages 54–62
https://doi.org/10.1109/MCSE.2013.19
A description and discussion of the SciDB database management system focuses on lessons learned, application areas, performance comparisons against other solutions, and additional approaches to managing data and complex analytics.
- article, January 2013
Cogset: a high performance MapReduce engine
Concurrency and Computation: Practice & Experience (CCOMP), Volume 25, Issue 1, Pages 2–23
https://doi.org/10.1002/cpe.2827
Cogset is a generic and efficient engine for reliable storage and parallel processing of distributed data sets. It supports a number of high-level programming interfaces, including a MapReduce interface compatible with Hadoop. In this paper, we present ...
- research-article, October 2012
Parallel pipelined filter ordering with precedence constraints
ACM Transactions on Algorithms (TALG), Volume 8, Issue 4, Article No.: 41, Pages 1–38
https://doi.org/10.1145/2344422.2344431
In the parallel pipelined filter ordering problem, we are given a set of n filters that run in parallel. The filters need to be applied to a stream of elements, to determine which elements pass all filters. Each filter has a rate limit r_i on the number ...
- Article, March 2012
High Performance Database Processing
AINA '12: Proceedings of the 2012 IEEE 26th International Conference on Advanced Information Networking and Applications, Pages 5–6
https://doi.org/10.1109/AINA.2012.140
The sizes of databases have seen exponential growth in the past, and such growth is expected to accelerate in the future, with the steady drop in storage cost accompanied by a rapid increase in storage capacity. Many years ago, a terabyte database was ...
- research-article, August 2011
Massively parallel in-database predictions using PMML
PMML '11: Proceedings of the 2011 workshop on Predictive markup language modeling, Pages 22–27
https://doi.org/10.1145/2023598.2023601
Like all open standards, the Predictive Model Markup Language (PMML) enables interoperability and portability in the world of data mining and predictive analytics. This means that models developed in any environment and tool set can be deployed and used ...
- research-article, June 2011
ArrayStore: a storage manager for complex parallel array processing
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, Pages 253–264
https://doi.org/10.1145/1989323.1989351
We present the design, implementation, and evaluation of ArrayStore, a new storage manager for complex, parallel array processing. ArrayStore builds on prior work in the area of multidimensional data storage, but considers the new problem of supporting ...
- Article, May 2011
A Hybrid Shared-Nothing/Shared-Data Storage Scheme for Large-Scale Data Processing
ISPA '11: Proceedings of the 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications, Pages 161–166
https://doi.org/10.1109/ISPA.2011.43
Shared-nothing and shared-disk are the two most common storage architectures of parallel databases in the past two decades. Both two types of systems have their own merits for different applications. However, there are no much efforts in investigating ...
- Article, June 2010
Tradeoffs between parallel database systems, Hadoop, and HadoopDB as platforms for petabyte-scale analysis
As the market demand for analyzing data sets of increasing variety and scale continues to explode, the software options for performing this analysis are beginning to proliferate. No fewer than a dozen companies have launched in the past few years that ...
- research-article, November 2009
Toward visual analysis of ensemble data sets
UltraVis '09: Proceedings of the 2009 Workshop on Ultrascale Visualization, Pages 48–53
https://doi.org/10.1145/1838544.1838551
The rapid and continuing increase in available high-performance computing resources has driven simulation-based science in two directions. First, the simulations themselves are growing more complex, whether in the fidelity of the models, spatiotemporal ...
- research-article, June 2009
Dependency-aware reordering for parallelizing query optimization in multi-core CPUs
SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, Pages 45–58
https://doi.org/10.1145/1559845.1559853
The state of the art commercial query optimizers employ cost-based optimization and exploit dynamic programming (DP) to find the optimal query execution plan (QEP) without evaluating redundant sub-plans. The number of alternative QEPs enumerated by the ...
- Article, December 2008
Efficient, Chunk-Replicated Node Partitioned Data Warehouses
ISPA '08: Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications, Pages 578–583
https://doi.org/10.1109/ISPA.2008.86
Much has been said about processing efficiently data in parallel database servers, and some data warehouse applications must process in the order of tens to hundreds of Gigabytes efficiently. Yet, there is no effective approach targeted at using non-...
- Article, June 2007
Progressive optimization in a shared-nothing parallel database
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, Pages 809–820
https://doi.org/10.1145/1247480.1247569
Commercial enterprise data warehouses are typically implemented on parallel databases due to the inherent scalability and performance limitation of a serial architecture. Queries used in such large data warehouses can contain complex predicates as well ...