Article

On producing join results early

Authors:

Jens-Peter Dittrich,

Bernhard Seeger,

David Scot Taylor,

Peter Widmayer

PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Pages 134 - 142

https://doi.org/10.1145/773153.773167

Published: 09 June 2003

Abstract

Support for exploratory interaction with databases in applications such as data mining requires that the first few results of an operation be available as quickly as possible. We study the algorithmic side of what can and what cannot be achieved for processing join operations. We develop strategies that modify the strict two-phase processing of the sort-merge paradigm, intermingling join steps with selected merge phases of the sort. We propose an algorithm that produces early join results for a broad class of join problems, including many not addressed well by hash-based algorithms. Our algorithm has no significant increase in the number of I/O operations needed to complete the join compared to standard sort-merge algorithms.
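The idea of intermingling join steps with the sort can be illustrated with a small in-memory simulation. This is a minimal sketch under stated assumptions, not the algorithm from the paper: it handles only equi-joins on `(key, payload)` tuples, it stands in for disk runs with Python lists, and it replaces a true multiway merge with pairwise joins of runs. The names `chunks`, `merge_join`, `early_join`, and the `memory` parameter are hypothetical and chosen for illustration.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def merge_join(left, right):
    """Equi-join two lists of (key, payload) pairs that are sorted by key."""
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            # Find the end of the group of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair every left tuple of this key with the whole right group.
            while i < len(left) and left[i][0] == key:
                for k in range(j, j_end):
                    yield (key, left[i][1], right[k][1])
                i += 1
            j = j_end


def early_join(r, s, memory=4):
    """Yield all equi-join results of r and s, the first ones during run generation."""
    half = max(1, memory // 2)  # assumption: memory is split evenly between inputs
    runs_r, runs_s = [], []
    r_chunks, s_chunks = chunks(r, half), chunks(s, half)
    while True:
        run_r = sorted(next(r_chunks, []), key=lambda t: t[0])
        run_s = sorted(next(s_chunks, []), key=lambda t: t[0])
        if not run_r and not run_s:
            break
        # Sort phase: join the two runs that are in memory together, so the
        # first results appear before either input has been read completely.
        yield from merge_join(run_r, run_s)
        runs_r.append(run_r)  # stands in for writing the run to disk
        runs_s.append(run_s)
    # Merge phase (simplified): join every pair of runs not yet joined.
    # Pairs with equal index were handled above, so no result is duplicated.
    for a, run_r in enumerate(runs_r):
        for b, run_s in enumerate(runs_s):
            if a != b:
                yield from merge_join(run_r, run_s)
```

Because `early_join` is a generator, a caller can stop after the first few results without forcing the rest of the join, for example `next(early_join(r, s))`. The pairwise merge phase is chosen here only for brevity; it does not model the I/O behavior that the abstract claims for the actual algorithm.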

Index Terms

On producing join results early
1. Information systems
  1. Information retrieval
2. Theory of computation
  1. Computational complexity and cryptography
    1. Complexity classes

Published In

PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

June 2003

291 pages

ISBN:1581136706

DOI:10.1145/773153

Conference Chair: Frank Neven (Limburgs Universitair Centrum)

General Chair: Catriel Beeri (Hebrew University of Jerusalem)

Program Chair: Tova Milo (Tel Aviv University & INRIA)

Copyright © 2003 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS03: International Conference on Management of Data and Symposium on Principles of Database Systems

June 9 - 11, 2003

San Diego, California

Acceptance Rates

PODS '03 Paper Acceptance Rate 27 of 136 submissions, 20%;

Overall Acceptance Rate 642 of 2,707 submissions, 24%
