research-article

FusionInsight LibrA: Huawei's Enterprise Cloud Data Analytics Platform

Published: 01 August 2018

Abstract

Huawei FusionInsight LibrA (FI-MPPDB) is a petabyte-scale enterprise analytics platform developed by the Huawei database group. It started as a prototype more than five years ago and is now used by many enterprise customers around the globe, including some of the world's largest financial institutions. Our product direction and enhancements have been driven mainly by customer requirements in the fast-evolving Chinese market.
This paper describes the architecture of FI-MPPDB and some of its major enhancements. In particular, we focus on the top four requirements from our customers related to data analytics on the cloud: system availability, auto tuning, query over heterogeneous data models on the cloud, and the ability to utilize powerful modern hardware for good performance. We present our latest advancements in these areas, including online expansion, auto tuning in the query optimizer, SQL on HDFS, and intelligent JIT-compiled execution. Finally, we present experimental results that demonstrate the effectiveness of these technologies.

Published In

Proceedings of the VLDB Endowment, Volume 11, Issue 12
August 2018
426 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Qualifiers

  • Research-article

Cited By

  • (2025) Aion: Live Migration for In-Memory Databases with Zero Downtime and Reduced Redundant Data Transfer. Data Science and Engineering. DOI: 10.1007/s41019-024-00276-5. Online publication date: 15-Jan-2025
  • (2024) Development of an Intelligent Coal Production and Operation Platform Based on a Real-Time Data Warehouse and AI Model. Energies, 17:20 (5205). DOI: 10.3390/en17205205. Online publication date: 19-Oct-2024
  • (2024) Digital protection analysis of national traditional sports health cultural heritage based on big data in the era of data cloud. Multimedia Tools and Applications, 83:27 (69739-69758). DOI: 10.1007/s11042-024-18371-0. Online publication date: 2-Feb-2024
  • (2023) Krypton: Real-Time Serving and Analytical SQL Engine at ByteDance. Proceedings of the VLDB Endowment, 16:12 (3528-3542). DOI: 10.14778/3611540.3611545. Online publication date: 1-Aug-2023
  • (2022) ByteHTAP. Proceedings of the VLDB Endowment, 15:12 (3411-3424). DOI: 10.14778/3554821.3554832. Online publication date: 1-Aug-2022
  • (2022) Application of Fusion of Fuzzy Mathematical Clustering Analysis in Enterprise Financial Management Cloud Platform. Mathematical Problems in Engineering, 2022 (1-9). DOI: 10.1155/2022/5354573. Online publication date: 10-Sep-2022
  • (2022) Remus: Efficient Live Migration for Distributed Databases with Snapshot Isolation. Proceedings of the 2022 International Conference on Management of Data (2232-2245). DOI: 10.1145/3514221.3526047. Online publication date: 10-Jun-2022
  • (2021) DBSpinner: Making a Case for Iterative Processing in Databases. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (2399-2410). DOI: 10.1109/ICDE51399.2021.00273. Online publication date: Apr-2021
  • (2021) Intelligent Automated Operation and Operation Management of Large Enterprise Cloud Data Center Based on Artificial Intelligence. 2020 International Conference on Data Processing Techniques and Applications for Cyber-Physical Systems (531-538). DOI: 10.1007/978-981-16-1726-3_65. Online publication date: 2-Jun-2021
  • (2020) Industrial-strength OLTP using main memory and many cores. Proceedings of the VLDB Endowment, 13:12 (3099-3111). DOI: 10.14778/3415478.3415537. Online publication date: 14-Sep-2020
