Author affiliations: College of Information Science and Technology, Bohai University, Jinzhou, Liaoning, China; College of Economy, Bohai University, Jinzhou, Liaoning, China
Publication: Procedia Computer Science
Year/Volume/Issue: 2024, Volume 243
Pages: 67-75
Subjects: Big Data Environment; Hive technology; Data Warehouse; ETL; System Function
Abstract: Hive stores massive data sets by scaling out across clusters, giving it far greater expansion and storage capacity than traditional databases, and it has become a major tool for building data warehouses in the era of big data. Based on Hive technology, this paper first studies an enterprise-level data warehouse architecture composed of a data storage layer, a Hive data warehouse layer and an application layer. It then examines the data warehouse tooling, which uses HDFS for underlying storage and MapReduce as the computing engine, and compares the Hive data warehouse with a relational database. Next, it studies the ETL process, in which data passes through three stages (extraction, transformation and loading) as it flows from the source to the target. Finally, it covers system functions and testing: subsystems for data processing, data management, data migration and data analysis are constructed, and the core functions are tested. The test results were consistent with the expected results and met the delivery standards.
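The three ETL stages named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: an in-memory SQLite table stands in for the Hive target, and the sample data, the `orders` table, and the function names `extract`, `transform`, `load` and `run_etl` are all hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data; row 2 has a missing amount on purpose.
SAMPLE = "order_id,region,amount\n1, north ,10.5\n2,south,\n3,north,4.5\n4,south,20\n"


def extract(source_text):
    """Extraction: read raw rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transformation: drop rows with no amount and normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["region"].strip().upper(), float(row["amount"]))
        )
    return cleaned


def load(records, conn):
    """Loading: write cleaned records into the target table.

    Assumption: SQLite stands in for the Hive warehouse layer here.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(source_text, conn):
    """Run the three stages in order, then return per-region totals from the target."""
    load(transform(extract(source_text)), conn)
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(SAMPLE, sqlite3.connect(":memory:")))
```

Keeping each stage as a separate function mirrors the source-to-target flow the abstract describes, so any one stage can be swapped out (for example, a different loader) without touching the others.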