Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/1693453.1693511acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
poster

A distributed placement service for graph-structured and tree-structured data

Published: 09 January 2010 Publication History
  • Get Citation Alerts
  • Abstract

    Effective data placement strategies can enhance the performance of data-intensive applications implemented on high end computing clusters. Such strategies can have a significant impact in localizing the computation, in minimizing synchronization (communication) costs, in enhancing reliability (via strategic replication policies), and in ensuring a balanced workload or enhancing the available bandwidth from massive storage devices (e.g. disk arrays).
    Existing work has largely targeted the placement of relatively simple data types or entities (e.g. elements, vectors, sets, and arrays). Here we investigate several hash-based distributed data placement methods targeting tree- and graph- structured data, and develop a locality enhancing placement service for large cluster systems. Target applications include the placement of a single large graph (e.g. Web graph), a single large tree (e.g. large XML file), a forest of graphs or trees (e.g. XML database) and other specialized graph data types - bi-partite (query-click graphs), directed acyclic graphs etc. We empirically evaluate our service by demonstrating its use in improving mining executions for pattern discovery, nearest neighbor searching, graph computations, and applications that combine link and content analysis.

    References

    [1]
    A. Broder et al. Min-wise independent permutations (extended abstract). In phSTOC, pages 327--336, 1998.
    [2]
    G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In phWSDM, pages 95--106, 2008.
    [3]
    G. Buehrer et al. Toward terabyte pattern mining: an architecture-conscious solution. In phPPOPP, pages 2--12, 2007.
    [4]
    P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In phSTOC, pages 604--613, 1998.
    [5]
    S. Parthasarathy et al. Parallel Data Mining for Association Rules on Shared-Memory Systems. In phKAIS, 3 (1): 1--29, 2001.
    [6]
    S. Tatikonda and S. Parthasarathy. Hashing Tree-Structured Data: Methods and Applications. phin ICDE (to appear), 2009.

    Cited By

    View all
    • (2014)Understanding data flow graph for improving big data stream computing environmentsInternational Journal of Computing Science and Mathematics10.1504/IJCSM.2014.0664465:4(394-404)Online publication date: 1-Dec-2014
    • (2017)BOSS: An Efficient Data Distribution Strategy for Object Storage Systems With Hybrid DevicesIEEE Access10.1109/ACCESS.2017.27442595(23979-23993)Online publication date: 2017

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
    January 2010
    372 pages
    ISBN:9781605588773
    DOI:10.1145/1693453
    • cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 45, Issue 5
      PPoPP '10
      May 2010
      346 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/1837853
      Issue’s Table of Contents

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 January 2010

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. data placement
    2. distributed computing
    3. structured data

    Qualifiers

    • Poster

    Conference

    PPoPP '10
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 230 of 1,014 submissions, 23%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)10
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 10 Aug 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2014)Understanding data flow graph for improving big data stream computing environmentsInternational Journal of Computing Science and Mathematics10.1504/IJCSM.2014.0664465:4(394-404)Online publication date: 1-Dec-2014
    • (2017)BOSS: An Efficient Data Distribution Strategy for Object Storage Systems With Hybrid DevicesIEEE Access10.1109/ACCESS.2017.27442595(23979-23993)Online publication date: 2017

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media