research-article

Enterprise Level Data Warehouse System Based on Hive in Big Data Environment

Authors: Xiaoyun Fan, Jianfeng Lu

Volume 243, Issue C

Pages 67 - 75

https://doi.org/10.1016/j.procs.2024.09.010

Published: 01 January 2024

Abstract

Hive can store massive amounts of data by scaling out across clusters, far exceeding the scalability and storage capacity of traditional databases, and it has become a major tool for building data warehouses in the big data era. Based on Hive, this paper studies an enterprise-level data warehouse architecture composed of a data storage layer, a Hive data warehouse layer, and an application layer. The paper then examines the data warehouse tooling, using HDFS for underlying storage and MapReduce as the computing engine, and compares the Hive data warehouse with a relational database. Next, it studies the ETL process, in which data passes through three stages (extraction, transformation, and loading) as it flows from the source to the target. Finally, it describes the system's functions and testing: subsystems for data processing, data management, data migration, and data analysis are built, and their core functions are tested. The test results were consistent with the expected results and met the delivery standards.
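The three ETL stages named in the abstract can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation: the source records, the field names (`order_id`, `amount`, `order_date`), the cleaning rules, and the `dt=YYYY-MM-DD` partition key (modeled on Hive's partition directory naming) are hypothetical, and an in-memory dictionary stands in for an HDFS-backed Hive table.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical raw export from an operational source system.
RAW_SOURCE = """order_id,amount,order_date
1001,25.50,2024-01-03
1002,,2024-01-03
1003,40.00,2024-01-04
1003,40.00,2024-01-04
1004,abc,2024-01-05
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert field types, dropping malformed and duplicate rows."""
    seen = set()
    clean = []
    for row in rows:
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
            order_date = date.fromisoformat(row["order_date"])
        except (KeyError, ValueError):
            continue  # skip rows with missing or unparsable fields
        if order_id in seen:
            continue  # skip records whose order_id was already loaded
        seen.add(order_id)
        clean.append({"order_id": order_id, "amount": amount, "order_date": order_date})
    return clean


def load(rows, table):
    """Load: append rows to the target, grouped by a Hive-style 'dt=...' partition key."""
    for row in rows:
        partition = f"dt={row['order_date'].isoformat()}"
        table[partition].append((row["order_id"], row["amount"]))
    return table


if __name__ == "__main__":
    # In-memory stand-in for a partitioned warehouse table.
    warehouse_table = defaultdict(list)
    load(transform(extract(RAW_SOURCE)), warehouse_table)
    for partition, rows in sorted(warehouse_table.items()):
        print(partition, rows)
```

Running the script prints `dt=2024-01-03 [(1001, 25.5)]` and `dt=2024-01-04 [(1003, 40.0)]`: the rows with a missing amount, a non-numeric amount, and a repeated `order_id` are discarded in the transform stage. In an actual Hive deployment the load step would typically write into the partition's HDFS directory or issue a HiveQL `INSERT ... PARTITION (...)` statement, which this sketch does not attempt.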

Index Terms

1. Information systems
  1. Data management systems
  2. Information systems applications
    1. Decision support systems
      1. Data warehouses

Index terms have been assigned to the content through auto-classification.

Recommendations

Medical Big Data Warehouse: Architecture and System Design, a Case Study: Improving Healthcare Resources Distribution

The huge increases in medical devices and clinical applications which generate enormous data have raised a big issue in managing, processing, and mining this massive amount of data. Indeed, traditional data warehousing frameworks can not be effective ...
Hengam a MapReduce-Based Distributed Data Warehouse for Big Data

When working with a high volume of information that follows an exponential pattern, the authors confront big data. This huge amount of information makes big data retrieval and analytics important issues. There have been many attempts to solve data ...

Data Warehouse with Big Data Technology for Higher Education

Nowadays, data warehouse tools and technologies cannot handle the load and analytic process of data into meaningful information for top management. Big data technology should be implemented to extend the existing data warehouse solutions. ...

Information

Published In

Procedia Computer Science, Volume 243, Issue C

2024

1296 pages

ISSN: 1877-0509

EISSN: 1877-0509

The Author(s).

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 01 January 2024

Qualifiers

Research-article
