DOI: 10.1145/3274895.3274942
research-article

Efficient astronomical query processing using Spark

Published: 06 November 2018

    Abstract

    Sky surveys are a fundamental data source in astronomy. Modern telescopes are now pushing these surveys into the petascale regime. Given this exponential growth of astronomical data, there is a pressing need for efficient astronomical query processing. Our goal is to bridge the gap between existing distributed systems and the high-level languages used by astronomers. In this paper, we present efficient techniques for query processing of astronomical data using ASTROIDE. Our framework helps astronomers take advantage of the richness of astronomical data. The proposed model supports complex astronomical operators expressed in ADQL (Astronomical Data Query Language), an extension of SQL commonly used by astronomers. ASTROIDE provides spatial indexing and partitioning techniques that filter data access more effectively. It also implements a query optimizer that injects spatial-aware optimization rules and strategies. Experimental evaluation on real datasets demonstrates that the framework is scalable and efficient.
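    To make the kind of operator mentioned in the abstract concrete, the following is a minimal sketch of a cone search, one of the spatial predicates that ADQL can express through its CONTAINS, POINT, and CIRCLE functions. It is not ASTROIDE's implementation: it is a brute-force, single-machine filter with no indexing, partitioning, or Spark involved. The names `angular_separation_deg`, `cone_search`, and `catalog`, and the sample coordinates, are hypothetical and chosen only for illustration.

```python
import math

# Roughly the predicate an ADQL cone search expresses (table name is hypothetical):
#   SELECT * FROM catalog
#   WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), CIRCLE('ICRS', 10.0, 20.0, 1.0))


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(rows, ra0, dec0, radius_deg):
    """Return the rows whose (ra, dec) lies within radius_deg of (ra0, dec0)."""
    return [
        row for row in rows
        if angular_separation_deg(row["ra"], row["dec"], ra0, dec0) <= radius_deg
    ]


# Hypothetical three-row catalogue, coordinates in degrees.
catalog = [
    {"id": 1, "ra": 10.0, "dec": 20.0},
    {"id": 2, "ra": 10.5, "dec": 20.0},
    {"id": 3, "ra": 200.0, "dec": -45.0},
]

hits = cone_search(catalog, ra0=10.0, dec0=20.0, radius_deg=1.0)
print([row["id"] for row in hits])
```

    This sketch tests every row against the cone. A spatial index or partitioning scheme of the kind the abstract describes would aim to skip most of those comparisons by discarding sky regions that cannot intersect the search circle.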


    Published In

    SIGSPATIAL '18: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
    November 2018
    655 pages
    ISBN: 9781450358897
    DOI: 10.1145/3274895

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. astronomical survey data management
    2. big data
    3. query processing
    4. spark framework

    Qualifiers

    • Research-article

    Funding Sources

    • European Union

    Conference

    SIGSPATIAL '18

    Acceptance Rates

    SIGSPATIAL '18 Paper Acceptance Rate 30 of 150 submissions, 20%;
    Overall Acceptance Rate 220 of 1,116 submissions, 20%

    Cited By

    • (2021) The Automatic Learning for the Rapid Classification of Events (ALeRCE) Alert Broker. The Astronomical Journal 161:5 (242). DOI: 10.3847/1538-3881/abe9bc. Online publication date: 27-Apr-2021
    • (2021) Scaling pair count to next galaxy surveys. Monthly Notices of the Royal Astronomical Society 510:2 (3085-3097). DOI: 10.1093/mnras/stab3640. Online publication date: 15-Dec-2021
    • (2020) Prospective Data Model and Distributed Query Processing for Mobile Sensing Data Streams. Multiple-Aspect Analysis of Semantic Trajectories (66-82). DOI: 10.1007/978-3-030-38081-6_6. Online publication date: 4-Jan-2020
    • (2019) Analysing billion-objects catalogue interactively: Apache Spark for physicists. Astronomy and Computing (100305). DOI: 10.1016/j.ascom.2019.100305. Online publication date: Jul-2019
