Materials science is undergoing profound changes due to advances in characterization instrumentation that have resulted in an explosion of data in terms of volume, velocity, variety and complexity. Harnessing these da...
ISBN: (print) 9781467390064
Data-driven science, accompanied by the explosion of petabytes of data, has created a need for dedicated analytics computing resources. Dedicated analytics clusters require large capital outlays because of their expensive hardware requirements. Additionally, if such resources are located far from the data they analyze, they also incur substantial data transfer, which has both cost and latency implications. In this paper, we benchmark a variety of high-performance computing (HPC) architectures on classic data science algorithms and conduct a cost analysis of these architectures. We also compare algorithms across analytics frameworks and explore hidden costs in the form of queuing mechanisms. We observe that node architectures with large memory and high memory bandwidth are better suited for big data analytics on HPC hardware. We also conclude that cloud computing is more cost-effective for small or experimental data workloads, whereas HPC is more cost-effective at scale. In addition, we quantify the hidden costs of queuing and how they relate to data science workloads. Finally, we observe that software developed for the cloud, such as Spark, performs significantly worse than pbdR when run in HPC environments.
The pursuit of more advanced electronics and of solutions to energy needs often hinges upon the discovery and optimization of new functional materials. However, the discovery rate of these materials is alarmingl...