Scientific computing is usually associated with compiled languages for maximum efficiency. However, in a typical application program, only a small part of the code is time-critical and requires the efficiency of a compiled language. It is often advantageous to use interpreted high-level languages for the remaining tasks, adopting a mixed-language approach. This will be demonstrated for Python, an interpreted object-oriented high-level language that is well suited for scientific computing. Particular attention is paid to high-level parallel programming using Python and the BSP model. We explain the basics of BSP and how it differs from other parallel programming tools like MPI. Thereafter we present an application of Python and BSP for solving a partial differential equation from computational science, utilizing high-level design of libraries and mixed-language (Python-C or Python-Fortran) programming. (c) 2004 Published by Elsevier B.V.
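The BSP model discussed in this abstract is language-neutral, so one superstep can be sketched with plain MPI even though the paper itself uses Python: a local computation phase, then a communication phase, then a global barrier that ends the superstep. Everything in the sketch below (the local array, the boundary exchange, the number of supersteps) is hypothetical and only illustrates the superstep discipline.

```cpp
// Minimal BSP-style superstep loop expressed with MPI (illustrative only).
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    std::vector<double> local(1000, rank);   // each process owns one block
    std::vector<double> edges(nprocs);       // one boundary value per process

    const int nsteps = 10;                   // hypothetical number of supersteps
    for (int step = 0; step < nsteps; ++step) {
        // 1) local computation phase (no communication)
        for (double& x : local) x = 0.5 * (x + rank);

        // 2) communication phase: publish one boundary value to every process
        double my_edge = local.back();
        MPI_Allgather(&my_edge, 1, MPI_DOUBLE,
                      edges.data(), 1, MPI_DOUBLE, MPI_COMM_WORLD);

        // 3) bulk synchronization: the superstep ends here for all processes
        MPI_Barrier(MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```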
In the past, the persistent semiconductor problems of operating temperature and power consumption limited performance growth for single-core microprocessors. Microprocessor vendors therefore adopted multicore chip organizations with parallel processing, because the new technology promises higher speed at lower power. This trend quickly reached CPU development first and then other devices such as GPUs. Modern GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex graphical algorithms. However, multicore processor technology also confronted programmers with a disruptive change. Multicore processors offer high performance, but parallel processing brings a challenge as well as an opportunity: efficiency, and the way the programmer or compiler explicitly parallelizes the software, are the keys to better performance on a multicore chip. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming. Two verification experiments are presented. In the first, we verify the availability and correctness of auto-parallelization tools and discuss performance issues on CPUs, GPUs, and embedded systems. In the second, we verify how hybrid programming can improve performance. Copyright (C) 2016 John Wiley & Sons, Ltd.
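A minimal sketch of the hybrid structure this abstract refers to, assuming a simple block partition of the work: MPI distributes iterations across nodes and OpenMP threads process the local slice; the place where a CUDA kernel would be launched is only marked with a comment. The problem size and per-element work are invented for illustration.

```cpp
// Hedged MPI + OpenMP skeleton of a hybrid (MPI/OpenMP/CUDA) computation.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long N = 1 << 20;                        // hypothetical problem size
    const long chunk = (N + nprocs - 1) / nprocs;  // block of iterations per MPI rank
    long begin = rank * chunk;  if (begin > N) begin = N;
    long end   = begin + chunk; if (end > N)   end = N;

    std::vector<double> local(end - begin, 0.0);

    // Within the node, OpenMP threads work on the local slice; in the hybrid
    // scheme described above this loop body would instead copy its sub-range
    // to the GPU and launch a CUDA kernel.
    #pragma omp parallel for
    for (long i = begin; i < end; ++i)
        local[i - begin] = 0.5 * static_cast<double>(i);

    double local_sum = 0.0, global_sum = 0.0;
    for (double v : local) local_sum += v;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("global sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```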
ISBN (Print): 9781424441563
Horde is a general programming framework for writing parallel applications on clusters. A computing task is modeled as a graph in Horde: each sub-task maps to one vertex, and data channels map to edges in the graph. Programming with Horde is very simple: the developer writes sequential code for the vertices and adds edges to link them. Horde tolerates transient faults and provides support for writing code that tolerates permanent faults. Horde is portable and supports various cluster job managers. We evaluate Horde's communication efficiency through micro-benchmarks and demonstrate its ease of use by implementing a MapReduce engine. Tests on a small-scale cluster show that our implementation outperforms Hadoop.
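The following is an illustrative sketch of the programming model this abstract describes, not Horde's actual API: two "vertices" run ordinary sequential code and are linked by one "edge" implemented as a blocking channel. All names and the end-of-stream convention are hypothetical.

```cpp
// Vertex-and-edge dataflow sketch: vertices run sequential code, an edge is a channel.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
class Channel {               // an "edge": a blocking data channel
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void send(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    T receive() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        T v = std::move(q_.front());
        q_.pop();
        return v;
    }
};

int main() {
    Channel<int> edge;        // edge linking the two vertices

    // Vertex A: plain sequential code that produces values.
    std::thread vertexA([&] {
        for (int i = 1; i <= 5; ++i) edge.send(i * i);
        edge.send(-1);        // end-of-stream marker (hypothetical convention)
    });

    // Vertex B: plain sequential code that consumes values.
    std::thread vertexB([&] {
        for (int v = edge.receive(); v != -1; v = edge.receive())
            std::cout << "vertex B received " << v << '\n';
    });

    vertexA.join();
    vertexB.join();
    return 0;
}
```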
Explicit multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with (1) a simple programming style that relies on fine-grained PRAM-style algorithms, and (2) hardware support for low-overhead parallel threads, scalable load balancing, and efficient synchronization. The missing link between the algorithmic-programming level and the architecture level is provided by the first prototype XMT compiler. This paper also takes this new opportunity to evaluate the overall effectiveness of the interaction between the programming model and the hardware, and to enhance its performance where needed by incorporating new optimizations into the XMT compiler. We present a wide range of applications which, written in XMT, obtain significant speedups relative to the best serial programs. We show that XMT is especially useful for more advanced applications with dynamic, irregular access patterns, while for regular computations we demonstrate performance gains that scale up to much higher levels than have been demonstrated before for on-chip systems.
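As a rough standard-C++ analogue of the fine-grained PRAM-style idiom XMT targets (XMTC's spawn and prefix-sum primitives are not shown here), the sketch below compacts an array by letting one task per element claim an output slot through an atomic counter; the counter plays the role a prefix-sum primitive would play in XMT code.

```cpp
// Fine-grained, one-task-per-element array compaction (PRAM-style analogue).
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<int> input = {7, -3, 4, -1, 9, -8, 2, 6};
    std::vector<int> output(input.size());
    std::atomic<int> next_slot{0};            // analogue of a prefix-sum primitive

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < input.size(); ++i) {
        workers.emplace_back([&, i] {          // one fine-grained task per element
            if (input[i] > 0) {
                int slot = next_slot.fetch_add(1);   // claim a unique output slot
                output[slot] = input[i];
            }
        });
    }
    for (auto& w : workers) w.join();

    for (int k = 0; k < next_slot.load(); ++k)
        std::cout << output[k] << ' ';
    std::cout << '\n';
    return 0;
}
```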
Cenju is an experimental multiprocessor system with a distributed shared memory scheme developed mainly for circuit simulation. The system is composed of 64 PEs (Processor Elements) divided into eight clusters. In each cluster, eight PEs are connected by a cluster bus. The cluster buses are in turn connected by a multistage network to form the whole system. Each PE consists of a 32-bit MC68020 microprocessor (20 MHz), 4/8 MB of RAM, and a WTL1167 floating-point processor (20 MHz). The system supports parallel programming in C and FORTRAN, in which parallel primitives are provided as subroutines to be embedded by the programmer. In this system, programmers must adhere to a producer-consumer model in which the producer of the data always writes the data to the consumer's memory. The simulation algorithm used in circuit simulation is hierarchical modular simulation, in which the circuit to be simulated is divided into subcircuits connected by an interconnection network. For the 64-processor system, a speedup of 15.8 compared to the one-processor case was attained for a DRAM circuit. Furthermore, by parallelizing the serial bottleneck, a speedup of 25.8 could be realized. In this article, the authors briefly describe the simulation algorithm and the Cenju architecture, then dwell in some detail on the parallel programming aspects of Cenju.
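A shared-memory analogue of the producer-consumer discipline described above, in which the producer always writes directly into memory owned by the consumer and then signals completion; on Cenju the write would target the consumer PE's memory across the network, whereas here it is simply a consumer-owned buffer. Names and sizes are hypothetical.

```cpp
// Producer writes into the consumer's memory, then signals readiness.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

struct ConsumerMailbox {
    std::vector<double> slot = std::vector<double>(4, 0.0); // consumer-owned memory
    std::atomic<bool> ready{false};
};

int main() {
    ConsumerMailbox box;

    std::thread producer([&] {
        for (std::size_t i = 0; i < box.slot.size(); ++i)
            box.slot[i] = 1.5 * static_cast<double>(i);     // write into consumer's memory
        box.ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        while (!box.ready.load(std::memory_order_acquire))
            std::this_thread::yield();                       // wait for the producer's write
        double sum = 0.0;
        for (double v : box.slot) sum += v;
        std::cout << "consumer read sum = " << sum << '\n';
    });

    producer.join();
    consumer.join();
    return 0;
}
```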
ISBN (Print): 9781665414555
Stream processing applications are spread across different sectors of industry and people's daily lives. The increasing amount of data we produce, such as audio, video, images, and text, demands fast and efficient computation. This can be achieved through stream parallelism, which is still a challenging task and mostly reserved for experts. We introduce a stream processing framework for assessing Parallel Programming Interfaces (PPIs). Our framework targets multi-core architectures and C++ stream processing applications, providing an API that abstracts the details of the stream operators of these applications. Therefore, users can easily identify all the basic operators and implement parallelism through different PPIs. In this paper, we present the proposed framework, implement three applications using its API, and show how it works by using it to parallelize and evaluate the applications with the PPIs Intel TBB, FastFlow, and SPar. The performance results were consistent with the literature.
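As a point of reference for the kind of stream operators such a framework abstracts, the sketch below expresses a three-stage source/transform/sink stream directly with one of the evaluated PPIs, Intel TBB's parallel_pipeline (assuming the oneTBB flavor of that API); it is not the framework's own API, and the stream length and transformation are hypothetical.

```cpp
// Three-stage stream (source -> transform -> sink) with TBB's parallel_pipeline.
#include <tbb/parallel_pipeline.h>
#include <cmath>
#include <iostream>

int main() {
    const int n_items = 100;       // hypothetical stream length
    int produced = 0;
    double accumulated = 0.0;

    tbb::parallel_pipeline(
        /*max_number_of_live_tokens=*/8,
        // Source operator: emits one item per call, serial and in order.
        tbb::make_filter<void, double>(tbb::filter_mode::serial_in_order,
            [&](tbb::flow_control& fc) -> double {
                if (produced >= n_items) { fc.stop(); return 0.0; }
                return static_cast<double>(produced++);
            }) &
        // Middle operator: stateless transformation, safe to run in parallel.
        tbb::make_filter<double, double>(tbb::filter_mode::parallel,
            [](double x) { return std::sqrt(x) * 2.0; }) &
        // Sink operator: serial, accumulates the results.
        tbb::make_filter<double, void>(tbb::filter_mode::serial_in_order,
            [&](double x) { accumulated += x; }));

    std::cout << "accumulated = " << accumulated << '\n';
    return 0;
}
```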
ISBN (Digital): 9781510622043
ISBN (Print): 9781510622043
The paper presents the experiences from the design and development of an industrial measurement system. The architecture of the system is parallel and highly scalable. As studies show, parallel systems are more error-prone than sequential ones. Errors may occur in synchronization or data sharing and can sometimes prevent processing within the time limits acceptable for a measurement system; thus, performance problems may also be dependability problems. In this paper, the problems encountered during the implementation of a measurement system, as well as their solutions, are presented. One of them was the unpredictable behavior of the garbage collector, which decreased system performance. Some deadlock situations have also been identified, which may occur if the measurement device (i.e., the hardware) experiences a specific failure mode. It is shown how a substantial performance increase and effective, scalable code were achieved.
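One common way to keep the kind of hardware-failure-induced deadlock mentioned above from blocking the whole system is a bounded wait on the device lock; the sketch below illustrates this idea with a timed mutex and is not taken from the system described in the paper. All names and the timeout value are hypothetical.

```cpp
// Bounded wait on a device lock: a hardware failure becomes a reported error, not a deadlock.
#include <chrono>
#include <iostream>
#include <mutex>

std::timed_mutex device_mutex;   // guards access to the measurement device

bool read_measurement(double& value) {
    using namespace std::chrono_literals;
    std::unique_lock<std::timed_mutex> lock(device_mutex, std::defer_lock);
    if (!lock.try_lock_for(200ms)) {        // bounded wait instead of blocking forever
        std::cerr << "device lock timeout: treating as a failure mode\n";
        return false;
    }
    value = 42.0;                           // placeholder for the real device read
    return true;
}

int main() {
    double v = 0.0;
    if (read_measurement(v))
        std::cout << "measurement = " << v << '\n';
    return 0;
}
```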
ISBN (Print): 9781728152585
This paper proposes a technology for analyzing large biomedical data sets based on CUDA computation. The technology was used to analyze a large set of fundus images used for automatic diabetic retinopathy diagnostics. A high-performance algorithm has been developed to calculate effective textural characteristics for medical image analysis. During the automatic image diagnostics, the following classes were distinguished: thin vessels, thick vessels, exudates, and healthy areas. The algorithm's efficiency was studied on images of 500x500 to 1000x1000 pixels using a 12x12 window. The relationship between the algorithm's speedup and the data size was demonstrated. The study showed that the algorithm's effectiveness can depend on certain characteristics of the image, such as its clarity, the shape of the exudate zones, the variability of the blood vessels, and the location of the optic disc.
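A hedged CUDA sketch of the kind of windowed computation described above: each thread computes simple textural statistics (local mean and variance, used here only as stand-ins for the paper's actual features) over a 12x12 window. The image size and contents are hypothetical.

```cuda
// One thread per pixel computes mean/variance over a win x win window.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void window_stats(const float* img, float* mean, float* var,
                             int width, int height, int win) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width - win || y >= height - win) return;   // skip the border

    float sum = 0.0f, sum_sq = 0.0f;
    for (int dy = 0; dy < win; ++dy)
        for (int dx = 0; dx < win; ++dx) {
            float v = img[(y + dy) * width + (x + dx)];
            sum += v;
            sum_sq += v * v;
        }
    float n = static_cast<float>(win * win);
    float m = sum / n;
    mean[y * width + x] = m;
    var[y * width + x]  = sum_sq / n - m * m;
}

int main() {
    const int width = 512, height = 512, win = 12;        // hypothetical sizes
    std::vector<float> host_img(width * height, 0.5f);

    float *d_img, *d_mean, *d_var;
    size_t bytes = host_img.size() * sizeof(float);
    cudaMalloc(&d_img, bytes);
    cudaMalloc(&d_mean, bytes);
    cudaMalloc(&d_var, bytes);
    cudaMemcpy(d_img, host_img.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    window_stats<<<grid, block>>>(d_img, d_mean, d_var, width, height, win);
    cudaDeviceSynchronize();
    std::printf("kernel finished: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_img); cudaFree(d_mean); cudaFree(d_var);
    return 0;
}
```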
ISBN (Print): 9781467324229
Remote Sensing (RS) data processing is characterized by massive remote sensing images and an increasing number of algorithms of higher complexity. Parallel programming for data-intensive applications such as massive remote sensing image processing on parallel systems is far from trivial and remains challenging. We propose a generic parallel programming skeleton, enabled by the C++ template mechanism, for these remote sensing applications on high-performance clusters. It provides both programming templates for distributed RS data and generic parallel skeletons for RS algorithms. Through the one-sided communication primitives provided by MPI, the distributed RS data template provides a global view of the large RS data whose sliced data blocks are scattered among the distributed memory of the cluster nodes. Moreover, by data serialization and RMA (Remote Memory Access), the data templates also offer a simple and effective way to distribute and communicate massive remote sensing data with complex data structures. Furthermore, the generic parallel skeletons implement recurring patterns of computation and performance optimization, and accept user-defined sequential functions as template parameters for type genericity. With the implemented skeletons, developers without extensive parallel computing expertise can implement efficient parallel remote sensing programs without being concerned with parallel computing details. Through experiments on remote sensing applications, we confirmed that our templates were productive and efficient.
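The one-sided communication that underlies such data templates can be illustrated with plain MPI RMA calls: each process exposes its slice of the data through an MPI window, and any process can fetch a remote block with MPI_Get, which is what gives the distributed data a global view. The sketch below is a minimal, hypothetical example, not the skeleton library's API.

```cpp
// MPI one-sided (RMA) access: fetch a remote block from another rank's window.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int block = 8;                                   // hypothetical block size
    std::vector<double> local(block, static_cast<double>(rank));

    MPI_Win win;
    MPI_Win_create(local.data(), block * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    std::vector<double> remote(block, 0.0);
    int target = (rank + 1) % nprocs;                      // read the next rank's block

    MPI_Win_fence(0, win);                                 // open the access epoch
    MPI_Get(remote.data(), block, MPI_DOUBLE,
            target, /*target_disp=*/0, block, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                                 // close the epoch: data is valid

    std::printf("rank %d read %.1f from rank %d\n", rank, remote[0], target);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```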
Nowadays, NVIDIA's CUDA is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many-core GPUs and scales transparently to hundreds of cores; scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster consisting of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA, run by the processor cores in the same computational node. (C) 2010 Elsevier B.V. All rights reserved.
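A hedged sketch of the partitioning scheme described above, assuming a contiguous block of loop iterations per MPI process (one per GPU node) that is then handed to a CUDA kernel; the problem size and per-element work are invented for illustration.

```cuda
// Each MPI process takes one contiguous block of iterations and runs it on its GPU.
#include <mpi.h>
#include <cuda_runtime.h>
#include <vector>

__global__ void process_block(double* data, long count, long offset) {
    long i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        data[i] = static_cast<double>(offset + i) * 0.5;   // placeholder work
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long N = 1 << 22;                                // total loop iterations
    const long chunk = (N + nprocs - 1) / nprocs;          // block per MPI process
    const long begin = rank * chunk;
    const long count = (begin < N) ? ((N - begin < chunk) ? (N - begin) : chunk) : 0;

    if (count > 0) {
        double* d_data = nullptr;
        cudaMalloc(&d_data, count * sizeof(double));

        const int threads = 256;
        const int blocks = static_cast<int>((count + threads - 1) / threads);
        process_block<<<blocks, threads>>>(d_data, count, begin);
        cudaDeviceSynchronize();

        std::vector<double> host(count);
        cudaMemcpy(host.data(), d_data, count * sizeof(double), cudaMemcpyDeviceToHost);
        cudaFree(d_data);
    }

    MPI_Finalize();
    return 0;
}
```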