Search results - Inner Mongolia University Library

34th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

Authors: Boito, Francieli; Gomes, Antonio Tadeu A.; Peyrondet, Louis; Teylo, Luan. Affiliations: Univ Bordeaux, LaBRI, INRIA, Bordeaux INP, CNRS, UMR 5800, F-33400 Talence, France; Lab Nacl Comp Cient (LNCC), Av Getulio Vargas 333, BR-25651075 Petropolis, RJ, Brazil

ISBN: (digital) 9781665451574

ISBN: (print) 9781665451574

In this paper, we present MSLIO, a code to mimic the I/O behavior of multiscale simulations. Such an I/O kernel is useful for HPC research, as it can be executed more easily and more efficiently than the full simulations when researchers are interested in the I/O load only. We validate MSLIO by comparing it to the I/O performance of an actual simulation, and we then use it to test some possible improvements to the output routine of the MHM (Multiscale Hybrid Mixed) library.

Keywords: high-performance computing; parallel I/O; numeric library; I/O kernel; mini-app; multiscale simulation

High-speed restoration of atomic force microscopy images using Tikhonov regularization in GPGPU

26th IEEE International Symposium on Computer Architecture and High Performance Computing Workshops, SBAC-PADW 2014

Authors: Quelhas, Klaus N.; Cidade, Geraldo A. G.; Farias, Ricardo. Affiliations: COPPE/System Engineering, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil; Biophysics Institute Carlos Chagas Filho, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil

ISBN: (print) 9781479970148

Atomic Force Microscopy (AFM) is a scanning probe technique widely used to produce nanometric-scale images of virtually any kind of non-conductive or biological surface. Depending on the scanning dimensions, an expected AFM image structure is subject to modification by large amounts of external noise (low Signal/Noise ratios), electrical or mechanical, and/or blurring due to the geometry of the measuring probe. In order to minimize such effects, image restoration techniques can be employed. The one based on the minimization of Tikhonov's regularization functional is described, taking into account the characteristics of the measuring probe and the S/N ratio. This work proposes optimizations on both serial and parallel restoration algorithms, using the CUDA library on a General Purpose Graphics Processing Unit (GPGPU), in terms of time performance and quality of restoration in the regime of high-speed imaging, one frame/sec or more. The results obtained so far are very promising, reaching speedups of up to 43x over previous implementations. © 2014 IEEE.

Keywords: Atomic force microscopy

Task-specific restricted delegation

16th International Symposium on High Performance Distributed Computing 2007, HPDC'07 and Co-Located Workshops

Authors: Alderman, Ian D.; Livny, Miron. Affiliation: Computer Sciences, University of Wisconsin, Madison

No abstract available

ISBN: (print) 1595936734

Keywords: PKI security distributed batch computing

Strategies to Improve the Performance of a Geophysics Model for Different Manycore Systems

8th International Symposium on Computer Architecture and High Performance Computing (SBAC-PADW)

Authors: Serpa, Matheus S.; Cruz, Eduardo H. M.; Diener, Matthias; Krause, Arthur M.; Navaux, Philippe O. A.; Farres, Albert; Rosas, Claudia; Hanzich, Mauricio; Panetta, Jairo. Affiliations: Fed Univ Rio Grande do Sul (UFRGS), Informat Inst, Porto Alegre, RS, Brazil; BSC, Barcelona, Spain; Aeronaut Inst Technol (ITA), Comp Sci Div, Sao Jose Dos Campos, SP, Brazil

ISBN: (print) 9781538648193

Many software mechanisms for geophysics exploration in the Oil & Gas industry are based on wave propagation simulation. To perform such simulations, state-of-the-art HPC architectures are employed, generating results faster and with more accuracy at each generation. The software must evolve to support the new features of each design to keep performance scaling. Furthermore, it is important to understand the impact of each change applied to the software, in order to improve the performance as much as possible. In this paper, we propose several optimization strategies for a wave propagation model for five architectures: Intel Haswell, Intel Knights Corner, Intel Knights Landing, NVIDIA Kepler and NVIDIA Maxwell. We focus on improving the cache memory usage, vectorization, and locality in the memory hierarchy. We analyze the hardware impact of the optimizations, providing insights into how each strategy can improve the performance. The results show that NVIDIA Maxwell improves over Intel Haswell, Intel Knights Corner, Intel Knights Landing and NVIDIA Kepler performance by up to 17.9x.

Keywords: Optimization; computer architecture; Program processors; Mathematical model; Acoustic waves; Hardware; Cache memory

Best of SBAC-PAD 2012

Parallel Computing, 2014, vol. 40, no. 9, pp. 512-513

Authors: Schnorr, Lucas Mello; Alexandre Navaux, Philippe Olivier. Affiliation: Univ Fed Rio Grande do Sul, Inst Informat, BR-91501970 Porto Alegre, RS, Brazil

This special issue presents new trends in computer architecture and in parallel and distributed systems. It is based on the best papers of the 24th International Symposium on Computer Architecture and High Performance Computing, which was held at Columbia University in New York, NY, USA, on October 24-26, 2012. The authors were invited to provide extended versions of the papers presented at the conference, taking into account suggestions from the double-blind peer review process and comments gathered during the conference.

Keywords: computer architecture; Parallel and distributed systems; high performance computing

RACB: Resource aware cache bypass on GPUs

26th IEEE International Symposium on Computer Architecture and High Performance Computing Workshops, SBAC-PADW 2014

Authors: Dai, Hongwen; Kartsaklis, Christos; Li, Chao; Janjusic, Tomislav; Zhou, Huiyang. Affiliations: Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, United States; Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, United States

ISBN: (print) 9781479970148

Caches are universally used in computing systems to hide long off-chip memory access latencies. Unlike on CPUs, the massive number of threads running simultaneously on GPUs puts tremendous pressure on the memory hierarchy. As a result, the limitation of cache resources becomes a bottleneck for a GPU to exploit thread-level parallelism (TLP) and memory-level parallelism (MLP) and achieve high performance. In this paper, we propose a mechanism to bypass the L1D and L2 caches based on the availability of cache resources. Our proposed mechanism is based on the observation that a huge number of stalls coming from limited cache resources prevent GPUs from providing a higher throughput. We therefore propose Resource Aware Cache Bypass (RACB), which eliminates such stalls with minor hardware changes to improve performance. We examine the effectiveness of this approach when applied to the L1D and L2 caches separately as well as together. Evaluation results with the NVIDIA computing SDK show that RACB generally improves performance the most when applied to both the L1D and L2 caches, by up to 88.05% and by 16.73% on average; additionally, energy is saved by up to 22.35% and by 5.88% on average, with minor hardware overheads. © 2014 IEEE.

Keywords: Graphics processing unit

A computational infrastructure for grid-based asynchronous parallel applications

16th International Symposium on High Performance Distributed Computing 2007, HPDC'07 and Co-Located Workshops

Authors: Li, Zhen; Parashar, Manish. Affiliation: Electrical and Computer Engineering Department, Rutgers University, Piscataway, NJ 08854

No abstract available

ISBN: (print) 1595936734

Keywords: asynchronous parallel shared-space

Assessing the performance of an architecture-aware optimization tool for neural networks

35th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

Authors: Marichal, Raul; Dufrechou, Ernesto; Ezzatti, Pablo. Affiliation: Fac Ingn, Inst Comp, Montevideo, Uruguay

ISBN: (print) 9798350381603

The important growth in the demand for Neural Network solutions has created an urgent need for efficient implementations across a wide array of environments and platforms. As industries increasingly rely on AI-driven technologies, optimizing the performance and effectiveness of these networks has become crucial. While numerous studies have achieved promising results in this field, the process of fine-tuning and identifying optimal architectures for specific problem domains remains a complex and resource-intensive task. As such, there is a pressing need to explore and evaluate techniques that can improve this optimization process, reducing costs and time-to-deployment while maximizing the overall performance of Neural Networks. This work focuses on evaluating the optimization process of NetAdapt for two neural networks on an Nvidia Jetson device. We observe a performance decay for the larger network when the algorithm tries to meet the latency constraint. Furthermore, we propose potential alternatives to optimize this tool. In particular, we propose an alternative configuration search procedure that allows us to enhance the optimization process, achieving speedups of up to ~7x.

Keywords: efficient computing; neural network optimizations; edge devices; heterogeneous computing; NetAdapt

High-level service connectors for component-based high performance computing

19th International Symposium on Computer Architecture and High Performance Computing

Authors: de Carvalho-Junior, Francisco Heron; Correa, Ricardo Cordeiro; Araujo, Gisele Azevedo; Silva, Jefferson Carvalho; Lins, Rafael Duelre. Affiliations: Univ Fed Ceara, Dept Comp, Fortaleza, Ceara, Brazil; Univ Fed Pernambuco, Dept Elect Sistemas, Recife, PE, Brazil

ISBN: (print) 9780769530147

Component-based programming has been applied to address the requirements of applications in high performance computing (HPC). However, the usual service connectors of commercial component models do not fit some requirements of HPC, mainly regarding the support of parallelism. This paper looks at extensions to the usual notion of service connector to meet such requirements, using the # component model as a substratum, evidencing its expressiveness.

Keywords: computer programming

A Performance Comparison of HPC Workloads on Traditional and Cloud-based HPC Clusters

35th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

Authors: Munhoz, Vanderlei; Bonfils, Antoine; Castro, Marcio; Mendizabal, Odorico. Affiliations: Univ Fed Santa Catarina, Florianopolis, SC, Brazil; Polytech Grenoble, Grenoble, France

ISBN: (print) 9798350381603

Cloud computing allows users to access large computing infrastructures quickly. In the high performance computing (HPC) context, public cloud resources emerge as an economical alternative, allowing institutions and research groups to use highly parallel infrastructures in the cloud. However, parallel runtime systems and software optimizations proposed over the years to improve the performance and scalability of HPC applications targeted traditional on-premise HPC clusters, where developers have direct access to the underlying hardware without any kind of virtualization. In this paper, we analyze the performance and scalability of HPC applications from the NAS Parallel Benchmarks suite when running on a virtualized HPC cluster built on top of Amazon Web Services (AWS), contrasting them with the results obtained with the same applications running on a traditional on-premise HPC cluster from Grid'5000. Our results show that CPU-bound applications achieve similar results on both platforms, whereas communication-bound applications may be impacted by the limited network bandwidth in the cloud. Cloud infrastructure demonstrated better performance under workloads with moderate communication and medium-sized messages.

Keywords: high performance computing; Cloud computing; NAS Parallel Benchmarks; Performance Evaluation
