Large-scale parallel corpora are an important resource for machine translation and cross-language information retrieval, and collecting them has received wide attention. With the development of the Internet, researchers begin t...
To generate a large number of reports within a limited time window, four techniques were proposed: ROLAP&SQL, Shared Scanning, a Hadoop-based solution, and MOLAP&Cube Sharding. An algorithm that performs in-memory aggregation was designed for the second solution. The experimental results show that all techniques except ROLAP&SQL can meet the time-window constraint, and that the Hadoop-based solution is a promising technique owing to its high scalability. Considering the maturity of the techniques and their performance, we put MOLAP&Cube Sharding into practice while keeping an eye on Hadoop for future adoption.
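The idea behind Shared Scanning with in-memory aggregation can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not describe; it only assumes that each report is a sum of one measure grouped by some dimension columns, and that all reports read the same fact rows. The names `shared_scan`, `rows`, and `reports`, and the sample columns, are hypothetical.

```python
from collections import defaultdict


def shared_scan(rows, reports):
    """Compute every report in a single pass over the fact rows.

    rows:    iterable of dicts, one per fact record (hypothetical schema).
    reports: dict mapping report name -> (group-by columns, measure column).
    Returns: dict mapping report name -> {group key tuple: summed measure}.
    """
    # One in-memory accumulator per report.
    results = {name: defaultdict(float) for name in reports}
    for row in rows:  # the fact data is scanned only once
        for name, (dims, measure) in reports.items():
            key = tuple(row[d] for d in dims)
            results[name][key] += row[measure]
    return {name: dict(agg) for name, agg in results.items()}


# Illustrative data only; not taken from the paper.
rows = [
    {"region": "north", "product": "a", "sales": 10.0},
    {"region": "north", "product": "b", "sales": 5.0},
    {"region": "south", "product": "a", "sales": 7.0},
]
reports = {
    "by_region": (("region",), "sales"),
    "by_product": (("product",), "sales"),
}
totals = shared_scan(rows, reports)
```

Under these assumptions, the cost of reading the data is paid once regardless of how many reports are defined, whereas issuing one SQL query per report would rescan the fact table each time.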