DOI: 10.1145/773153.773167

On producing join results early

Published: 09 June 2003

    Abstract

    Support for exploratory interaction with databases in applications such as data mining requires that the first few results of an operation be available as quickly as possible. We study the algorithmic side of what can and what cannot be achieved for processing join operations. We develop strategies that modify the strict two-phase processing of the sort-merge paradigm, intermingling join steps with selected merge phases of the sort. We propose an algorithm that produces early join results for a broad class of join problems, including many not addressed well by hash-based algorithms. Our algorithm has no significant increase in the number of I/O operations needed to complete the join compared to standard sort-merge algorithms.
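The idea of interleaving join steps with the sort can be illustrated with a small in-memory sketch. This is not the algorithm from the paper; it is a simplified equi-join illustration under stated assumptions: inputs are lists of `(key, payload)` pairs, `run_size` stands in for the memory budget, and the names `merge_join` and `early_sort_merge_join` are hypothetical. Each pair of sorted runs is joined as soon as it is produced, so the first results appear before the sort has finished; the final merge then reports only the pairs that span different runs.

```python
import heapq
from itertools import groupby
from operator import itemgetter

KEY = itemgetter(0)


def merge_join(left, right, skip_same_run):
    """Equi-join two key-ordered streams of (key, payload, run_id) tuples.

    With skip_same_run=True, pairs whose tuples carry the same run id are
    suppressed, because phase 1 has already reported them.
    """
    lgroups, rgroups = groupby(left, key=KEY), groupby(right, key=KEY)
    end = (None, None)
    lkey, lgroup = next(lgroups, end)
    rkey, rgroup = next(rgroups, end)
    while lgroup is not None and rgroup is not None:
        if lkey < rkey:
            lkey, lgroup = next(lgroups, end)
        elif lkey > rkey:
            rkey, rgroup = next(rgroups, end)
        else:
            rrows = list(rgroup)  # buffer one key group of the right input
            for _, lval, lrun in lgroup:
                for _, rval, rrun in rrows:
                    if not (skip_same_run and lrun == rrun):
                        yield (lkey, lval, rval)
            lkey, lgroup = next(lgroups, end)
            rkey, rgroup = next(rgroups, end)


def early_sort_merge_join(r, s, run_size):
    """Yield (key, r_payload, s_payload) join results, earliest first."""
    r_runs, s_runs = [], []
    n_runs = -(-max(len(r), len(s)) // run_size)  # ceiling division

    # Phase 1: run generation. Sort one chunk of each input, tag it with
    # its run number, and join the two runs immediately (early results).
    for i in range(n_runs):
        lo, hi = i * run_size, (i + 1) * run_size
        r_run = [(k, v, i) for k, v in sorted(r[lo:hi], key=KEY)]
        s_run = [(k, v, i) for k, v in sorted(s[lo:hi], key=KEY)]
        yield from merge_join(r_run, s_run, skip_same_run=False)
        r_runs.append(r_run)
        s_runs.append(s_run)

    # Phase 2: merge all runs of each input and join the merged streams,
    # skipping pairs from equally numbered runs (already emitted above).
    r_sorted = heapq.merge(*r_runs, key=KEY)
    s_sorted = heapq.merge(*s_runs, key=KEY)
    yield from merge_join(r_sorted, s_sorted, skip_same_run=True)
```

Because `early_sort_merge_join` is a generator, a caller can consume the first few results with `next()` or `itertools.islice` without waiting for the full join. A real external-memory version would write runs to disk and would need to bound the buffered key group; both are omitted here.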

    Published In

    PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
    June 2003
    291 pages
    ISBN: 1581136706
    DOI: 10.1145/773153
    • Conference Chair: Frank Neven
    • General Chair: Catriel Beeri
    • Program Chair: Tova Milo

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data mining
    2. join processing
    3. non-blocking
    4. query processing
    5. spatial data

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS03

    Acceptance Rates

    PODS '03 Paper Acceptance Rate 27 of 136 submissions, 20%;
    Overall Acceptance Rate 642 of 2,707 submissions, 24%

    Cited By

    • (2023) Efficient Sorting, Duplicate Removal, Grouping, and Aggregation. ACM Transactions on Database Systems 47:4 (1-35). DOI: 10.1145/3568027. Online publication date: 6-Jan-2023
    • (2011) On Producing High and Early Result Throughput in Multijoin Query Plans. IEEE Transactions on Knowledge and Data Engineering 23:12 (1888-1902). DOI: 10.1109/TKDE.2010.182. Online publication date: 1-Dec-2011
    • (2008) Scalable approximate query processing with the DBO engine. ACM Transactions on Database Systems 33:4 (1-54). DOI: 10.1145/1412331.1412335. Online publication date: 12-Dec-2008
    • (2008) Compact Similarity Joins. Proceedings of the 2008 IEEE 24th International Conference on Data Engineering (346-355). DOI: 10.1109/ICDE.2008.4497443. Online publication date: 7-Apr-2008
    • (2008) Fine-Grained Progressive Algorithm Based on HMJ. Proceedings of the 2008 International Conference on Computer Science and Software Engineering - Volume 04 (589-592). DOI: 10.1109/CSSE.2008.1008. Online publication date: 12-Dec-2008
    • (2007) RRPJ. Proceedings of the 12th international conference on Database systems for advanced applications (43-54). DOI: 10.5555/1783823.1783832. Online publication date: 9-Apr-2007
    • (2007) Scalable approximate query processing with the DBO engine. Proceedings of the 2007 ACM SIGMOD international conference on Management of data (725-736). DOI: 10.1145/1247480.1247560. Online publication date: 11-Jun-2007
    • (2007) The effect of reading policy on early join result production. Information Sciences: an International Journal 177:19 (3939-3956). DOI: 10.1016/j.ins.2007.02.042. Online publication date: 1-Oct-2007
    • (2007) RRPJ: Result-Rate Based Progressive Relational Join. Advances in Databases: Concepts, Systems and Applications (43-54). DOI: 10.1007/978-3-540-71703-4_6. Online publication date: 2007
    • (2006) The Sort-Merge-Shrink join. ACM Transactions on Database Systems 31:4 (1382-1416). DOI: 10.1145/1189769.1189775. Online publication date: 1-Dec-2006
