research-article

Enterprise Level Data Warehouse System Based on Hive in Big Data Environment

Published: 01 January 2024

Abstract

Hive stores massive data sets on scalable clusters, far exceeding the expansion and storage capacity of traditional databases, and has become a major tool for building data warehouses in the big data era. Based on Hive, this paper studies an enterprise-level data warehouse architecture composed of a data storage layer, a Hive data warehouse layer, and an application layer. It then examines the data warehouse tooling, which uses HDFS for underlying storage and MapReduce as the computing engine, and compares the Hive data warehouse with a relational database. Next, it studies the ETL process, in which data passes through three stages (extraction, transformation, and loading) and flows from the source to the target. Finally, it covers system functions and testing: subsystems for data processing, data management, data migration, and data analysis are constructed, and the core functions are tested. The test results were consistent with the expected results and met the delivery standards.
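
The three ETL stages named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the sample data, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the Hive warehouse target so the example runs without a cluster.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the paper does not publish its datasets.
SOURCE_CSV = """order_id,customer,amount,order_date
1, alice ,19.90,2024-01-03
2,BOB,5.00,2024-01-03
3,carol,,2024-01-04
"""


def extract(text):
    """Extraction: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: drop incomplete rows and normalise field types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows with a missing amount
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().lower(),
            float(row["amount"]),
            row["order_date"].strip(),
        ))
    return cleaned


def load(rows, conn):
    """Loading: write the cleaned rows into the target table.

    SQLite is an assumption made for this sketch; the paper's target is Hive.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run extract -> transform -> load and return the loaded row count."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Calling `run_etl(SOURCE_CSV, sqlite3.connect(":memory:"))` moves the sample rows from source to target. Keeping each stage as a separate function mirrors the staged flow the abstract describes and lets each stage be tested on its own.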

Published In

Procedia Computer Science, Volume 243, Issue C
2024
1296 pages
ISSN:1877-0509
EISSN:1877-0509

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Big Data Environment
  2. Hive technology
  3. Data Warehouse
  4. ETL
  5. System Function

Qualifiers

  • Research-article
