ISBN (digital): 9781665451574
ISBN (print): 9781665451574
In this paper, we present MSLIO, a code to mimic the I/O behavior of multiscale simulations. Such an I/O kernel is useful for HPC research, as it can be executed more easily and more efficiently than the full simulations when researchers are interested in the I/O load only. We validate MSLIO by comparing it to the I/O performance of an actual simulation, and we then use it to test some possible improvements to the output routine of the MHM (Multiscale Hybrid Mixed) library.
Atomic Force Microscopy (AFM) is a scanning probe technique widely used to produce nanometer-scale images of virtually any kind of non-conductive or biological surface. Depending on the scanning dimensions an ex...
ISBN (print): 9781538648193
Many software mechanisms for geophysics exploration in the Oil & Gas industry are based on wave propagation simulation. To perform such simulations, state-of-the-art HPC architectures are employed, generating results faster and with more accuracy at each generation. The software must evolve to support the new features of each design to keep performance scaling. Furthermore, it is important to understand the impact of each change applied to the software, in order to improve performance as much as possible. In this paper, we propose several optimization strategies for a wave propagation model for five architectures: Intel Haswell, Intel Knights Corner, Intel Knights Landing, NVIDIA Kepler, and NVIDIA Maxwell. We focus on improving cache memory usage, vectorization, and locality in the memory hierarchy. We analyze the hardware impact of the optimizations, providing insights into how each strategy can improve performance. The results show that NVIDIA Maxwell outperforms Intel Haswell, Intel Knights Corner, Intel Knights Landing, and NVIDIA Kepler by up to 17.9x.
This special issue presents new trends in computer architecture and in parallel and distributed systems. It is based on the best papers of the 24th International Symposium on Computer Architecture and High Performance Computing, which was held in New York, NY, USA on October 24-26, 2012 at Columbia University. The authors were invited to provide extended versions of the papers presented at the conference, taking into account suggestions from the double-blind peer review process and comments gathered during the conference.
Caches are universally used in computing systems to hide long off-chip memory access latencies. Unlike on CPUs, the massive number of threads running simultaneously on GPUs puts tremendous pressure on the memory hierarchy. As a result,...
ISBN (print): 9798350381603
The significant growth in the demand for Neural Network solutions has created an urgent need for efficient implementations across a wide array of environments and platforms. As industries increasingly rely on AI-driven technologies, optimizing the performance and effectiveness of these networks has become crucial. While numerous studies have achieved promising results in this field, the process of fine-tuning and identifying optimal architectures for specific problem domains remains a complex and resource-intensive task. As such, there is a pressing need to explore and evaluate techniques that can improve this optimization process, reducing costs and time-to-deployment while maximizing the overall performance of Neural Networks. This work focuses on evaluating the optimization process of NetAdapt for two neural networks on an Nvidia Jetson device. We observe a performance decay for the larger network when the algorithm tries to meet the latency constraint. Furthermore, we propose potential alternatives to optimize this tool. In particular, we propose an alternative configuration search procedure that enhances the optimization process, achieving speedups of up to approximately 7x.
ISBN (print): 9780769530147
Component-based programming has been applied to address the requirements of applications in high performance computing (HPC). However, the usual service connectors of commercial component models do not fit some requirements of HPC, mainly regarding support for parallelism. This paper looks at extensions to the usual notion of service connector to meet such requirements, using the # component model as a substratum and demonstrating its expressiveness.
ISBN (print): 9798350381603
Cloud computing allows users to access large computing infrastructures quickly. In the high performance computing (HPC) context, public cloud resources emerge as an economical alternative, allowing institutions and research groups to use highly parallel infrastructures in the cloud. However, the parallel runtime systems and software optimizations proposed over the years to improve the performance and scalability of HPC applications targeted traditional on-premise HPC clusters, where developers have direct access to the underlying hardware without any kind of virtualization. In this paper, we analyze the performance and scalability of HPC applications from the NAS Parallel Benchmarks suite when running on a virtualized HPC cluster built on top of Amazon Web Services (AWS), contrasting them with the results obtained with the same applications running on a traditional on-premise HPC cluster from Grid'5000. Our results show that CPU-bound applications achieve similar results on both platforms, whereas communication-bound applications may be impacted by the limited network bandwidth in the cloud. The cloud infrastructure demonstrated better performance under workloads with moderate communication and medium-sized messages.