ISBN (print): 9781450313384
There are several projects and missions dedicated exclusively to observing the Sun. These projects usually produce a large amount of information embedded in images. The analysis of such information is valuable for the study and monitoring of solar storms, which can affect telecommunications, for instance. The databases of Sun images are huge: several projects are producing images of the Sun, and a considerable amount of imagery is already stored. By combining image-processing algorithms with parallel programming techniques, we can process this information faster and in greater volume. This paper describes our parallel OpenMP-MPI hybrid solutions for processing Sun images and the results obtained on a hybrid system, i.e. a cluster with several multi-core nodes. Specifically, we present two methods to detect and categorize solar filaments on hybrid systems: Filament Diffusion-Detection, based on graphs, and Morph Detection, based on morphological operators. The results show that Filament Diffusion-Detection detects approximately 80% of the filaments with a 326-fold speed-up. In turn, Morph Detection detects 58% of the objects with a 54-fold speed-up. Overall, these results show that our OpenMP-MPI combination works well on hybrid architectures, but more optimizations are needed to improve accuracy. Copyright 2012 ACM.
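The morphological route mentioned in the abstract can be illustrated with a toy example. The code below is not the paper's Morph Detection pipeline (which runs as an OpenMP-MPI hybrid over full solar images); it is a minimal serial sketch, with invented helper names, of how a binary opening with an elongated structuring element keeps filament-like dark structures and discards isolated noise:

```python
# Hypothetical sketch, not the paper's code.  Dark pixels (1) against
# the solar disk (0); an opening (erosion then dilation) with a
# horizontal structuring element keeps elongated dark structures and
# removes isolated noise pixels.

def erode(img, se_half):
    """Binary erosion with a horizontal 1 x (2*se_half+1) element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(se_half, w - se_half):
            if all(img[y][x + dx] for dx in range(-se_half, se_half + 1)):
                out[y][x] = 1
    return out

def dilate(img, se_half):
    """Binary dilation with the same horizontal element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi = max(0, x - se_half), min(w - 1, x + se_half)
            if any(img[y][xx] for xx in range(lo, hi + 1)):
                out[y][x] = 1
    return out

def opening(img, se_half=1):
    return dilate(erode(img, se_half), se_half)

# A 5x8 toy image: one elongated "filament" and one noise pixel.
img = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],   # elongated filament-like structure
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],   # isolated noise pixel
    [0, 0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(img)           # filament survives, noise is gone
```

In the hybrid setting, each MPI rank would process a different image (or image stripe) and OpenMP threads would parallelize the per-row loops above.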
In this paper we specifically present a parallel solution to finding the one-ring neighboring nodes and elements for each vertex in generic meshes. Finding nodal neighbors is computationally straightforward but expensive for large meshes. To improve efficiency, parallelism is adopted by utilizing the modern Graphics Processing Unit (GPU). The presented parallel solution relies heavily on parallel sorting, scan, and reduction. Our parallel solution is efficient and easy to implement, but requires the allocation of large device memory. It can generate speedups of approximately 55 and 90 over the serial solution when finding the neighboring nodes and elements, respectively. It is easy to implement because it does not need to perform mesh coloring before finding neighbors. There are no complex data structures; only integer arrays are needed, which makes our parallel solution very effective. (C) 2020 The Author(s). Published by Elsevier B.V.
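The sort-based phases the abstract names can be sketched serially. The function name and data layout below are assumptions for illustration, not the paper's GPU code; on the GPU, each phase maps to its parallel counterpart (e.g. a radix sort over (vertex, element) pairs followed by a segmented scan):

```python
# Serial sketch of sort-based one-ring neighbor finding (hypothetical
# names; the paper's implementation runs these phases on the GPU).
from itertools import groupby

def one_ring_elements(triangles):
    """For each vertex id, list the incident (one-ring) element ids."""
    # Phase 1: expand each triangle into (vertex, element) pairs.
    pairs = [(v, ei) for ei, tri in enumerate(triangles) for v in tri]
    # Phase 2: sort by vertex id (a parallel radix sort on the GPU).
    pairs.sort()
    # Phase 3: segment the sorted pairs per vertex (scan/reduction on GPU).
    return {v: [ei for _, ei in grp]
            for v, grp in groupby(pairs, key=lambda p: p[0])}

# Two triangles sharing the edge (1, 2).
tris = [(0, 1, 2), (1, 3, 2)]
ring = one_ring_elements(tris)
# vertices 1 and 2 touch both triangles; 0 and 3 touch one each
```

Note that only flat integer pair arrays are needed, matching the abstract's claim that no complex data structures are required.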
My parallel-programming education began in earnest when I joined Sequent Computer Systems in late 1990. This education was both brief and effective: within a few short years, my co-workers and I were breaking new grou...
The multicore era has led to a renaissance of shared memory parallel programming models. Moreover, the introduction of task-level parallelization raises the level of abstraction compared to thread-centric expression o...
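The task-centric idea this abstract alludes to can be shown in miniature: the programmer expresses units of work, and a runtime maps them onto threads. Python's `ThreadPoolExecutor` stands in here for OpenMP-style tasking; the workload is invented for illustration:

```python
# Illustrative task-parallel sketch (not from the abstract's paper).
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    """A small recursive workload; each top-level call is one task."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor(max_workers=4) as pool:
    # The programmer submits *tasks* (units of work), not threads;
    # the runtime schedules tasks onto a fixed pool of workers.
    futures = [pool.submit(fib, n) for n in (10, 12, 14)]
    results = [f.result() for f in futures]
# results == [55, 144, 377]
```

This is the abstraction gain: the number of tasks is decoupled from the number of threads, which the thread-centric style forces the programmer to manage by hand.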
Multi-core processors offer a growing potential of parallelism but pose a challenge of program development for achieving high performance in applications. This paper presents a comparison of the five parallel program...
Efficient parallel programming has always been very tricky, and only expert programmers are able to make the most of the computing power of modern computers. Such a situation is an obstacle to the development of the hi...
The first Spanish Parallel Programming Contest was organized in September 2011 within the Jornadas de Paralelismo, in La Laguna, Spain. The aim of the contest is to disseminate parallelism among the participants and C...
The memory performance of data mining applications has become crucial due to increasing dataset sizes and multi-level cache hierarchies. Recursive partitioning methods such as decision tree and random forest learning are some of the most important algorithms in this field, and numerous researchers have worked on improving the accuracy of model trees as well as enhancing the overall performance of the learning process. Most modern applications that employ decision tree learning favor creating multiple models for higher accuracy, sacrificing performance. In this work, we exploit the flexibility inherent in recursive partitioning based applications regarding performance and accuracy tradeoffs, and propose a framework to improve performance with negligible accuracy losses. This framework employs a data access skipping module (DASM), which skips costly cache accesses according to the aggressiveness of the user-specified strategy, together with a heuristic that predicts the skipped data accesses to keep accuracy losses at a minimum. Our experimental evaluation shows that the proposed framework offers significant performance improvements (up to 25%) with relatively much smaller losses in accuracy (up to 8%) over the original case. We demonstrate that our framework is scalable under various accuracy requirements by exploring accuracy changes over time and replacement policies. In addition, we explore NoC/SNUCA systems for similar opportunities of memory performance improvement. (C) 2018 Elsevier Ltd. All rights reserved.
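The performance/accuracy trade-off behind access skipping can be illustrated with a toy split-scoring routine. The names and the skipping rule below are assumptions, not the paper's DASM (which skips at the cache-access level): with aggressiveness a, only every (a+1)-th sample is actually read when scoring a candidate split:

```python
# Illustrative sketch of data access skipping during decision-tree
# split evaluation (hypothetical; the paper's DASM operates on cache
# accesses, not Python lists).

def split_score(values, labels, threshold, aggressiveness=0):
    """Error rate of the rule 'value < threshold => class 0'.

    aggressiveness=0 reads every sample; aggressiveness=a reads only
    every (a+1)-th sample, trading accuracy for fewer data accesses.
    """
    step = aggressiveness + 1
    errors = touched = 0
    for i in range(0, len(values), step):   # skipped indices never accessed
        touched += 1
        pred = 0 if values[i] < threshold else 1
        errors += (pred != labels[i])
    return errors / touched, touched

vals = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
labs = [0, 0, 0, 1, 1, 1]
exact, n_full = split_score(vals, labs, 0.5)                     # 6 reads
approx, n_skip = split_score(vals, labs, 0.5, aggressiveness=1)  # 3 reads
```

On well-separated data like this toy set, the skipped evaluation reaches the same error estimate with half the accesses, which is the kind of flexibility the framework exploits.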
The purpose of a domain-specific language (DSL) is to enable the application programmer to specify a problem, or an abstract algorithm description, in his/her domain of expertise without being burdened by implementation details. The ideal scenario is that the implementation details are added in an automatic process of program translation and code generation. The approach of domain-specific program generation has lately received increasing attention in the area of computational science and engineering. In this paper, we introduce the new code generation framework Athariac. Its goal is to support the quick implementation of a language processing and program optimization platform for a given DSL based on stepwise term rewriting. We demonstrate the framework's use on our DSL ExaSlang for the specification and optimization of multigrid solvers. With this example, we provide evidence of Athariac's potential for making domain-specific software engineering more productive.
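Stepwise term rewriting, the mechanism the abstract names, can be sketched in a few lines. The term encoding and the two rules below are invented for illustration; ExaSlang's actual transformations are far richer:

```python
# Minimal stepwise term-rewriting sketch (hypothetical encoding, not
# Athariac's).  Terms are nested tuples (op, arg1, arg2) or atoms;
# rules are functions returning a rewritten term or None.

def rewrite(term, rules):
    """Apply rules bottom-up until a fixed point is reached."""
    if isinstance(term, tuple):
        # First rewrite the subterms, then the term itself.
        term = (term[0],) + tuple(rewrite(t, rules) for t in term[1:])
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(term)
            if new is not None and new != term:
                term, changed = new, True
    return term

# Two algebraic simplification rules: x * 1 -> x and x + 0 -> x.
rules = [
    lambda t: t[1] if isinstance(t, tuple) and t[0] == "*" and t[2] == 1 else None,
    lambda t: t[1] if isinstance(t, tuple) and t[0] == "+" and t[2] == 0 else None,
]

expr = ("+", ("*", "u", 1), 0)       # (u * 1) + 0
simplified = rewrite(expr, rules)    # each rule fires once -> "u"
```

A DSL compiler built this way is a pipeline of such rule sets, applied stepwise from domain-level terms down to generated code.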
Nowadays, most computers that are commercially available off-the-shelf (COTS) include hardware features that increase the performance of parallel general-purpose threads (hyper threading, multicore, ccNUMA architectur...