- ArticleAugust 2004
Discovering and ranking semantic associations over a Large RDF metabase
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 1317–1320
Information retrieval over semantic metadata has recently received a great amount of interest in both industry and academia. In particular, discovering complex and meaningful relationships among this data is becoming an active research topic. Just as ...
- ArticleAugust 2004
Hos-Miner: a system for detecting outlying subspaces of high-dimensional data
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 1265–1268
We identify a new and interesting high-dimensional outlier detection problem in this paper, that is, detecting the subspaces in which given data points are outliers. We call the subspaces in which a data point is an outlier its Outlying Subspaces. In ...
- ArticleAugust 2004
StreamMiner: a classifier ensemble-based engine to mine concept-drifting data streams
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 1257–1260
We demonstrate StreamMiner, a random decision-tree ensemble based engine to mine data streams. A fundamental challenge in data stream mining applications (e.g., credit card transaction authorization, security buy-sell transaction, and phone call records,...
- ArticleAugust 2004
GPX: interactive mining of gene expression data
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 1249–1252
Discovering co-expressed genes and coherent expression patterns in gene expression data is an important data analysis task in bioinformatics research and biomedical applications. Although various clustering methods have been proposed, two tough ...
- ArticleAugust 2004
Automated design of multidimensional clustering tables for relational databases
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 1170–1181
The ability to physically cluster a database table on multiple dimensions is a powerful technique that offers significant performance benefits in many OLAP, warehousing, and decision-support systems. An industrial implementation of this technique for ...
- ArticleAugust 2004
Flexible string matching against large databases in practice
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 1078–1086
Data Cleaning is an important process that has been at the center of research interest in recent years. Poor data quality is the result of a variety of reasons, including data entry errors and multiple conventions for recording database fields, and has ...
- ArticleAugust 2004
BioPatentMiner: an information retrieval system for biomedical patents
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 1066–1077
Before undertaking new biomedical research, identifying concepts that have already been patented is essential. Traditional keyword based search on patent databases may not be sufficient to retrieve all the relevant information, especially for the ...
- ArticleAugust 2004
Efficient indexing methods for probabilistic threshold queries over uncertain data
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 876–887
It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors, and resource limitations. In order to avoid drawing erroneous ...
- ArticleAugust 2004
A framework for projected clustering of high dimensional data streams
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 852–863
The data stream problem has been studied extensively in recent years, because of the great ease in collection of stream data. The nature of stream data makes it essential to use algorithms which require only one pass over the data. Recently, single-scan,...
- ArticleAugust 2004
Indexing large human-motion databases
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 780–791
Data-driven animation has become the industry standard for computer games and many animated movies and special effects. In particular, motion capture data recorded from live actors is the most promising approach offered thus far for animating realistic ...
- ArticleAugust 2004
Gorder: an efficient method for KNN join processing
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 756–767
An important but very expensive primitive operation of high-dimensional databases is the K-Nearest Neighbor (KNN) similarity join. The operation combines each point of one dataset with its KNNs in the other dataset and it provides more meaningful query ...
- ArticleAugust 2004
Steps towards cache-resident transaction processing
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 660–671
Online transaction processing (OLTP) is a multibillion dollar industry with high-end database servers employing state-of-the-art processors to maximize performance. Unfortunately, recent studies show that CPUs are far from realizing their maximum ...
- ArticleAugust 2004
Model-driven data acquisition in sensor networks
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 588–599
Declarative queries are proving to be an attractive paradigm for interacting with networks of wireless sensors. The metaphor that "the sensornet is a database" is problematic, however, because sensors do not exhaustively represent the data in the real ...
- ArticleAugust 2004
Linear road: a stream data management benchmark
- Arvind Arasu,
- Mitch Cherniack,
- Eduardo Galvez,
- David Maier,
- Anurag S. Maskey,
- Esther Ryvkina,
- Michael Stonebraker,
- Richard Tibbetts
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 480–491
This paper specifies the Linear Road Benchmark for Stream Data Management Systems (SDMS). Stream Data Management Systems process streaming data by executing continuous and historical queries while producing query results in real-time. This benchmark ...
- ArticleAugust 2004
Similarity search for web services
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 372–383
Web services are loosely coupled software components, published, located, and invoked across the web. The growing number of web services available within an organization and on the Web raises a new and challenging search problem: locating desired web ...
- ArticleAugust 2004
Stochastic consistency, and scalable pull-based caching for erratic data stream sources
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 192–203
We introduce the notion of stochastic consistency, and propose a novel approach to achieving it for caches of highly erratic data. Erratic data sources, such as stock prices and sensor data, are common and important in practice. However, their erratic ...
- ArticleAugust 2004
Detecting change in data streams
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 180–191
Detecting changes in a data stream is an important area of research with many applications. In this paper, we present a novel method for the detection and estimation of change. In addition to providing statistical guarantees on the reliability of ...
- ArticleAugust 2004
Answering xpath queries over networks by sending minimal views
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, Pages 48–59
When a client submits a set of XPath queries to an XML database on a network, the set of answer sets sent back by the database may include redundancy in two ways: some elements may appear in more than one answer set, and some elements in some answer sets ...