Showing 1 - 6 of 6 Results

Article
April 2005
Robust Identification of Fuzzy Duplicates
ICDE '05: Proceedings of the 21st International Conference on Data Engineering, Pages 865–876, https://doi.org/10.1109/ICDE.2005.125

Detecting and eliminating fuzzy duplicates is a critical data cleaning task that is required by many applications. Fuzzy duplicates are multiple seemingly distinct tuples which represent the same real-world entity. We propose two novel criteria that ...
Total Citations: 54

Article
June 2003
Robust and efficient fuzzy match for online data cleaning
SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data, Pages 313–324, https://doi.org/10.1145/872757.872796

To ensure high data quality, data warehouses must validate and cleanse incoming data tuples from external sources. In many situations, clean tuples must match acceptable tuples in reference tables. For example, product name and description fields in a ...
Total Citations: 341
Total Downloads: 3,612 (last 12 months: 71; last 6 weeks: 11)

Article
April 2001
Overcoming Limitations of Sampling for Aggregation Queries
Proceedings of the 17th International Conference on Data Engineering, Pages 534–542
Total Citations: 58

Article
Free
May 2000
Towards estimation error guarantees for distinct values
PODS '00: Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, Pages 268–279, https://doi.org/10.1145/335168.335230

We consider the problem of estimating the number of distinct values in a column of a table. For large tables without an index on the column, random sampling appears to be the only scalable approach for estimating the number of distinct values. We ...
Total Citations: 140
Total Downloads: 2,093 (last 12 months: 418; last 6 weeks: 65)

Article
Free
June 1999
On random sampling over joins
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data, Pages 263–274, https://doi.org/10.1145/304182.304206

A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree ...
Also published in: ACM SIGMOD Record, Volume 28, Issue 2
Total Citations: 224
Total Downloads: 2,285 (last 12 months: 271; last 6 weeks: 39)

Article
Free
June 1998
Random sampling for histogram construction: how much is enough?
SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data, Pages 436–447, https://doi.org/10.1145/276304.276343

Random sampling is a standard technique for constructing (approximate) histograms for query optimization. However, any real implementation in commercial products requires solving the hard problem of determining “How much sampling is enough?” We address ...
Also published in: ACM SIGMOD Record, Volume 27, Issue 2
Total Citations: 188
Total Downloads: 2,715 (last 12 months: 485; last 6 weeks: 38)
