Integration of spatial join algorithms for processing multiple inputs
Several techniques that compute the join between two spatial datasets have been proposed during the last decade. Among these methods, some consider existing indices for the joined inputs, while others treat datasets with no index, providing solutions for ...
Selectivity estimation in spatial databases
Selectivity estimation of queries is an important and well-studied problem in relational database systems. In this paper, we examine selectivity estimation in the context of Geographic Information Systems, which manage spatial data such as points, lines, ...
Efficient concurrency control in multidimensional access methods
The importance of multidimensional index structures to numerous emerging database applications is well established. However, before these index structures can be supported as access methods (AMs) in a “commercial-strength” database management system (...
Snakes and sandwiches: optimal clustering strategies for a data warehouse
Physical layout of data is a crucial determinant of performance in a data warehouse. The optimal clustering of data on disk, for minimizing expected I/O, depends on the query workload. In practice, we often have a reasonable sense of the likelihood of ...
OPTICS: ordering points to identify the clustering structure
Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms ...
Fast algorithms for projected clustering
The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces ...
Logical logging to extend recovery to new domains
Recovery can be extended to new domains at reduced logging cost by exploiting “logical” log operations. During recovery, a logical log operation may read data values from any recoverable object, not solely from values on the log or from the updated ...
Efficient concurrency control for broadcast environments
A crucial consideration in environments where data is broadcast to clients is the low bandwidth available for clients to communicate with servers. Advanced applications in such environments do need to read data that is mutually consistent as well as ...
Update propagation protocols for replicated databases
Replication is often used in many distributed systems to provide a higher level of performance, reliability and availability. Lazy replica update protocols, which propagate updates to replicas through independent transactions after the original ...
Belief reasoning in MLS deductive databases
It is envisaged that the application of the multilevel security (MLS) scheme will enhance flexibility and effectiveness of authorization policies in shared enterprise databases and will replace cumbersome authorization enforcement practices through ...
A multimedia presentation algebra
Over the last few years, there has been a tremendous increase in the number of interactive multimedia presentations prepared by different individuals and organizations. In this paper, we present an algebra for querying multimedia presentation databases. ...
Querying network directories
Hierarchically structured directories have recently proliferated with the growth of the Internet, and are being used to store not only address books and contact information for people, but also personal profiles, network resource information, and network ...
Online association rule mining
We present a novel algorithm to compute large itemsets online. The user is free to change the support threshold any time during the first scan of the transaction sequence. The algorithm maintains a superset of all large itemsets and for each itemset a ...
Optimization of constrained frequent set queries with 2-variable constraints
Currently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting constraint-based, human-centered, exploratory mining of ...
BOAT—optimistic decision tree construction
Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A ...
Self-tuning histograms: building histograms without looking at data
In this paper, we introduce self-tuning histograms. Although similar in structure to traditional histograms, these histograms infer data distributions not by examining the data or a sample thereof, but by using feedback from the query execution engine ...
Approximate computation of multidimensional aggregates of sparse data using wavelets
Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse ...
Multi-dimensional selectivity estimation using compressed histogram information
The database query optimizer requires the estimation of the query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when ...
An efficient bitmap encoding scheme for selection queries
Bitmap indexes are useful in processing complex queries in decision support systems, and they have been implemented in several commercial database systems. A key design parameter for bitmap indexes is the encoding scheme, which determines the bits that ...
Query optimization for selections using bitmaps
Bitmaps are popular indexes for data warehouse (DW) applications and most database management systems offer them today. This paper proposes query optimization strategies for selections using bitmaps. Both continuous and discrete selection criteria are ...
A comparison of selectivity estimators for range queries on metric attributes
In this paper, we present a comparison of nonparametric estimation methods for computing approximations of the selectivities of queries, in particular range queries. In contrast to previous studies, the focus of our comparison is on metric attributes ...
Random sampling techniques for space efficient online computation of order statistics of large datasets
In a recent paper [MRL98], we had described a general framework for single pass approximate quantile finding algorithms. This framework included several known algorithms as special cases. We had identified a new algorithm, within the framework, which had ...
On random sampling over joins
A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree ...
Join synopses for approximate query answering
In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on statistical summaries of the full data. In this paper, we demonstrate the difficulty of providing good approximate ...
Ripple joins for online aggregation
We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support ...
An adaptive query execution system for data integration
Query processing in data integration occurs over network-bound, autonomous data sources. This requires extensions to traditional optimization and execution techniques for three reasons: there is an absence of quality statistics about the data, data ...
Query optimization in the presence of limited access patterns
We consider the problem of query optimization in the presence of limitations on access patterns to the data (i.e., when one must provide values for one of the attributes of a relation in order to obtain tuples). We show that in the presence of limited ...
Query processing techniques for arrays
Arrays are an appropriate data model for images, gridded output from computational models, and other types of data. This paper describes an approach to array query processing. Queries are expressed in AML, a logical algebra that is easily extended with ...
Mind your vocabulary: query mapping across heterogeneous information sources
In this paper we present a mechanism for translating constraint queries, i.e., Boolean expressions of constraints, across heterogeneous information sources. Integrating such systems is difficult in part because they use a wide range of constraints as the ...
Client-site query extensions
We explore the execution of queries with client-site user-defined functions (UDFs). Many UDFs can only be executed at the client site, for reasons of scalability, security, confidentiality, or availability of resources. How should a query with client-...