ISBN: (Print) 9780769551180
Extract, transform, and load (ETL) is a common and important technology for building data warehouses and business intelligence systems. When a very complex SQL query is issued to acquire data from a transaction system into a data warehouse, it involves many operations, including table joins, sorting, and aggregation. Such operations require substantial retrieval work and transfer large volumes of data from the tables. This intensive querying often causes performance problems and commonly has a negative impact on data instance resources. Improving ETL performance is therefore both critical and challenging. This paper presents a parallel processing solution that splits a large, complex SQL query into small pieces that are executed in a distributed computing manner. The proposed method aims to reduce the cost of computation while ensuring data integrity among the joined tables. The idea can be verified through selected performance-tuning test beds.
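The abstract does not describe how the query is split, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a hypothetical two-table schema (`customers`, `orders`), uses SQLite and a thread pool as stand-ins for a distributed setup, and partitions the work by ranges of the join key so that every joined row falls into exactly one piece. All table, column, and function names are illustrative assumptions.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# The "big" query: join two tables, then aggregate (hypothetical schema).
FULL_QUERY = """
SELECT c.region, SUM(o.amount)
FROM orders o JOIN customers c ON c.id = o.customer_id
GROUP BY c.region
"""

# The same query restricted to one range of the join key.
PIECE_QUERY = """
SELECT c.region, SUM(o.amount)
FROM orders o JOIN customers c ON c.id = o.customer_id
WHERE c.id BETWEEN ? AND ?
GROUP BY c.region
"""


def build_demo_db(path, n_customers=100, n_orders=1000):
    """Create a small demo database (assumed data, for illustration only)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, "east" if i % 2 else "west") for i in range(1, n_customers + 1)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % n_customers) + 1, i % 7) for i in range(1, n_orders + 1)],
    )
    conn.commit()
    conn.close()


def run_piece(path, lo, hi):
    """Run one piece of the query on its own connection."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(PIECE_QUERY, (lo, hi)).fetchall()
    finally:
        conn.close()


def run_split(path, n_pieces=4):
    """Split the query by join-key range, run pieces concurrently, merge sums."""
    conn = sqlite3.connect(path)
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM customers").fetchone()
    conn.close()

    # Ceiling division so the ranges cover [lo, hi] without gaps or overlap.
    step = (hi - lo + n_pieces) // n_pieces
    ranges = [(s, min(s + step - 1, hi)) for s in range(lo, hi + 1, step)]

    totals = {}
    with ThreadPoolExecutor(max_workers=n_pieces) as pool:
        for rows in pool.map(lambda r: run_piece(path, *r), ranges):
            # SUM is decomposable: partial sums per region add up to the total.
            for region, subtotal in rows:
                totals[region] = totals.get(region, 0) + subtotal
    return totals


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    build_demo_db(db_path)
    print(run_split(db_path))
```

Partitioning on the join key is what keeps the joined tables consistent in this sketch: each customer and all of its orders are handled by the same piece, so no joined row is lost or counted twice. Aggregates that are not decomposable in this way (for example, a median) would need a different merge step.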