Search Results - Inner Mongolia University Library

5th IEEE International Conference on Software Engineering and Service Science (ICSESS)

Authors: Hei, Xinhong; Zhang, Jinlong; Wang, Bin; Jin, Haiyan; Giacaman, Nasser. Affiliations: Xian Univ Technol Sch Engn & Comp Sci Xian Shaanxi Provinc Peoples R China; Shaanxi Key Lab Network Comp & Secur Technol Xian Shaanxi Provinc Peoples R China; Univ Auckland Dept Elect & Comp Engn Auckland 1 New Zealand

ISBN: (print) 9781479932795

To reduce the complexity of traditional multithreaded parallel programming, this paper explores a task-based approach to parallel programming using the Microsoft .NET Task Parallel Library (TPL). First, the paper proposes a custom data partitioning optimization method to achieve efficient data parallelism and applies it to matrix multiplication; the results of this application support the method. Next, the authors develop a task-parallel application, Image Blender, which illustrates both the efficiency gains and the pitfalls associated with task parallelism. Finally, the paper analyzes the performance of these applications. Experimental results show that TPL, with its task-based parallel programming mechanism, can substantially reduce the burden on the programmer and improve program performance.

Keywords: parallel programming Task-based TPL Data parallelism Task parallelism
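The abstract above does not spell out its custom partitioning method, so the following is only a minimal sketch of the general idea of range-based data partitioning applied to matrix multiplication. It is written in Python rather than .NET, and the names `row_ranges`, `multiply_rows` and `parallel_matmul` are hypothetical. Each task receives one contiguous range of rows instead of a single row, which keeps the number of tasks small.

```python
from concurrent.futures import ThreadPoolExecutor


def row_ranges(n_rows, n_chunks):
    """Split range(n_rows) into at most n_chunks contiguous (start, stop) ranges."""
    n_chunks = max(1, min(n_chunks, n_rows))
    base, extra = divmod(n_rows, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        # The first `extra` ranges get one additional row each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def multiply_rows(a, b, start, stop):
    """Compute rows start..stop-1 of the matrix product of a and b."""
    cols = list(zip(*b))  # columns of b, built once per task
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a[start:stop]]


def parallel_matmul(a, b, n_chunks=4):
    """Multiply a and b using one task per contiguous row range."""
    ranges = row_ranges(len(a), n_chunks)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # pool.map preserves the order of the ranges, so rows come back in order.
        parts = pool.map(lambda r: multiply_rows(a, b, *r), ranges)
    return [row for part in parts for row in part]
```

The sketch shows the partitioning structure only. In CPython, threads running pure-Python arithmetic do not execute in parallel, so no speedup should be expected from this version.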

Cache Aware Dynamics Data Layout for Efficient Shared Memory Parallelisation of EUROPLEXUS

Procedia Computer Science, 2016, vol. 80, pp. 1083-1092

Authors: Marwa Sridi; Bruno Raffin; Vincent Faucher. Affiliations: CEA DEN DANS DM2S SEMT DYN F-91191 Gif sur Yvette France; University Grenoble Alpes INRIA France; CEA DEN Cadarache DTN/Dir F-13108 St Paul lez Durance France

Parallelizing industrial simulation codes such as EUROPLEXUS, software dedicated to the analysis of fast transient phenomena, is challenging. In this paper we focus on efficient parallelization on a multi-core shared memory node. We propose to have each thread gather the data it needs for processing a given iteration range before actually advancing the computation by one time step on this range. This lazy, cache-aware layout construction makes it possible to keep the original data structure and leads to very localised code modifications. We show that this approach can improve the execution time by up to 40% when the task size is set so that the data fit in the L2 cache.

Keywords: EUROPLEXUS Shared Memory Cache-aware Data Layout parallel programming
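The gather-then-compute pattern described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the EUROPLEXUS implementation: the mesh is a hypothetical list of elements, each holding global node ids, and the "time step" in `advance_range` is a made-up update (the mean of an element's node values scaled by `dt`). The point is the structure: copy the scattered values needed by one iteration range into a compact local buffer, then compute only on that buffer.

```python
def gather(node_values, connectivity, start, stop):
    """Copy the node values used by elements start..stop-1 into a compact buffer.

    Returns the buffer and the element connectivity renumbered to index into it.
    """
    local_index = {}  # global node id -> position in the local buffer
    buffer = []
    local_conn = []
    for elem in connectivity[start:stop]:
        ids = []
        for node in elem:
            if node not in local_index:
                local_index[node] = len(buffer)
                buffer.append(node_values[node])
            ids.append(local_index[node])
        local_conn.append(ids)
    return buffer, local_conn


def advance_range(node_values, connectivity, start, stop, dt):
    """Hypothetical update for one iteration range, computed on the local buffer only."""
    buffer, local_conn = gather(node_values, connectivity, start, stop)
    return [dt * sum(buffer[i] for i in ids) / len(ids) for ids in local_conn]
```

In a compiled code the range size would be chosen so that the buffer fits in cache, as the abstract states for the L2 cache; a Python sketch shows the data flow but not the cache effect.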

HPCmatlab: A Framework for Fast Prototyping of Parallel Applications in Matlab

Procedia Computer Science, 2016, vol. 80, pp. 1461-1472

Authors: Xinchen Guo; Mukul Dave; Mohamed Sayeed. Affiliation: ASU Research Computing Arizona State University Tempe Arizona U.S.

The HPCmatlab framework has been developed for Distributed Memory programming in Matlab/Octave using the Message Passing Interface (MPI). The communication routines in the MPI library are implemented using MEX wrappers. Point-to-point, collective as well as one-sided communication is supported. Benchmarking results show better performance than the Mathworks Distributed Computing Server. HPCmatlab has been used to successfully parallelize and speed up Matlab applications developed for scientific computing. The application results show good scalability, while preserving the ease of programmability. HPCmatlab also enables shared memory programming using Pthreads and parallel I/O using the ADIOS package.

Keywords: parallel programming Message Passing Interface Matlab MEX Functions parallel I/O

Implementation of Image Enhancement Algorithms and Recursive Ray Tracing using CUDA

Procedia Computer Science, 2016, vol. 79, pp. 516-524

Authors: Mr. Diptarup Saha; Mr. Karan Darji; Narendra Patel; Darshak Thakore. Affiliation: Birla Vishvakarama Mahavidyalaya Vallabh Vidyanagar Anand Gujarat India

This paper aims to reduce execution time by implementing several time-consuming applications on an NVIDIA Graphics Processing Unit (GPU) using the NVIDIA Compute Unified Device Architecture (CUDA) parallel programming model. CUDA provides a platform for developing high-end parallel processing applications on NVIDIA GPUs. The paper implements various image processing algorithms on both the Central Processing Unit (CPU) and the GPU. The point-to-point image processing algorithms implemented are a brightening filter, a darkening filter, a negative filter and an RGB-to-grayscale filter. Convolution algorithms, which take the values of neighboring pixels into account, are also implemented: a Sobel filter for edge detection, a low-pass filter and a high-pass filter. The performance of the implemented algorithms is analyzed on both CPU and GPU using images with a resolution of 3000 × 3000. Color images are used for the point-to-point algorithms and grayscale images for all convolution algorithms. For the point-to-point algorithms, performance is analyzed while varying the number of threads per block. Recursive ray tracing is also implemented on the GPU and shows a performance gain over the serial algorithm run on the CPU.

Keywords: CUDA Image Processing NVIDIA GPU parallel programming
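The two families of filters named in the abstract above differ in their data dependencies, which a short serial sketch can make concrete. This is a CPU reference in Python, not the paper's CUDA code. The function names, the brightening offset of 40 and the integer luma weights (299, 587, 114, a common convention) are assumptions, since the abstract does not give the constants used. On a GPU, each output pixel computed below would typically map to one thread.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def negative(pixel):
    """Point-to-point filter: invert each 8-bit channel."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)


def brighten(pixel, delta=40):
    """Point-to-point filter: add delta to each channel, clamped to 255."""
    return tuple(min(255, c + delta) for c in pixel)


def to_gray(pixel):
    """Point-to-point filter: weighted RGB sum (assumed weights), integer result."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def apply_pointwise(image, op):
    """Apply op to every pixel independently; no pixel reads its neighbours."""
    return [[op(p) for p in row] for row in image]


def sobel(gray):
    """Convolution filter: each interior output pixel reads its 3x3 neighbourhood."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]  # border pixels are left at 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out
```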

Cloud-based Design and Virtual Prototyping Environment for Embedded Systems

International Journal of Online Engineering, 2016, vol. 12, no. 9, pp. 52-60

Authors: Werner, S.; Lauber, A.; Koedam, M.; Becker, J.; Sax, E.; Goossens, K. Affiliations: Karlsruhe Inst Technol Karlsruhe Germany; Eindhoven Univ Technol TUE Eindhoven Netherlands

The design and test of Multi-Processor Systems-on-Chip (MPSoCs), together with the development of distributed applications and operating systems that run on those hardware platforms, is one of the biggest challenges in today's system design. This applies in particular when short time-to-market constraints severely limit exploration of the design space. The use of virtual platforms can help shorten development and test cycles. In this paper, we present a cloud-based environment that supports the user in designing heterogeneous MPSoCs and developing distributed applications. The design environment automatically generates virtual platforms, allowing fast prototyping cycles, especially in software development, and exports the design to a hardware flow that synthesizes compatible FPGA designs. Extending the peripheral models with debug information supports the developer during test and debug cycles and avoids the need to add special debug code to the application. This improves the readability, portability and maintainability of the software produced. Additionally, this paper presents the benefits of using cloud-based design environments in engineering training and education. To that end, the framework supports testing the system, including complex software stacks, with prerecorded data or testbenches.

Keywords: Rapid Prototyping Virtual Platform parallel programming cloud-based services OVP System Level Design Simulation

Pragmatic performance portability with OpenMP 4.x

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016, vol. 9903 LNCS, pp. 253-267

Authors: Martineau, Matt; Price, James; McIntosh-Smith, Simon; Gaudin, Wayne. Affiliations: Merchant Venturers Building University of Bristol Bristol United Kingdom; UK Atomic Weapons Establishment Aldermaston United Kingdom

ISBN: (print) 9783319455495

In this paper we investigate the current compiler technologies supporting OpenMP 4.x features targeting a range of devices, in particular, the Cray compiler 8.5.0 targeting an Intel Xeon Broadwell and NVIDIA K20x, IBM’s OpenMP 4.5 Clang branch (clang-ykt) targeting an NVIDIA K20x, the Intel compiler 16 targeting an Intel Xeon Phi Knights Landing, and GCC 6.1 targeting an AMD APU. We outline the mechanisms that they use to map the OpenMP model onto their target architectures, and conduct performance testing with a number of representative data parallel kernels. Following this we present a discussion about the current state of play in terms of performance portability and propose some straightforward guidelines for writing performance portable code, derived from our observations. At the time of writing, developers will likely have to rely on the pre-processor for certain kernels to achieve functional portability, but we expect that future homogenisation of required directives between compilers and architectures is feasible. © Springer International Publishing Switzerland 2016.

Keywords: parallel programming

Introducing high performance computing to undergraduate students

Computers in Education Journal, 2016, vol. 16, no. 4, pp. 104-112

Authors: Cui, Suxia; Wang, Yonghui; Li, Lin; Peng, Xiaobo; Yalvac, Bugrahan. Affiliations: Engineering Department Prairie View A and M University United States; Computer Science Department Prairie View A and M University United States; Dept. of Teaching Learning and Culture Texas A and M University College Station United States

Recently, President Obama issued an Executive Order to ensure the United States' leadership in computing. The necessary hardware and software design skills should be introduced into university curricula. Over the past decades, computing has advanced to High Performance Computing (HPC). However, undergraduate students still lack experience of how HPC functions, especially in minority-serving institutions, because current computing curricula do not adequately cover HPC content. To address this problem, a team of faculty members obtained external funding to improve undergraduate computing education through enhanced courses and research opportunities. The goal is to incorporate HPC concepts and training across the computing curricula in multiple disciplines in order to stimulate students' interest in computing and improve their problem-solving skills. This three-year project has already finished its second year of implementation. During the first year, a diverse teaching environment was established, including an HPC cluster and embedded HPC platforms. Both platforms supported students' learning and research in parallel programming, embedded systems design, and data cloud. In the second project year, several courses were revised or developed across three departments: Electrical and Computer Engineering, Computer Science, and Engineering Technology. New course materials integrating parallel and distributed computing concepts were developed and offered to undergraduate students. Project-based learning was introduced into the classroom. More advanced concepts, such as computer vision and machine learning, were explored by undergraduate students. At the same time, the research results were disseminated in junior- and senior-level courses. Faculty members applied effective pedagogy to teach new-generation computing. For all the classes involved in this project, student surveys were collected to guide future project implementation. This article s

Keywords: parallel programming

HiPro-CodeGen: Automatic programming for parallel numerical simulations

9th International Conference on Software Engineering and Applications, ICSOFT-EA 2014

Authors: Li, Liao; Cuiping, Jing; Wei, Wang; Aiqing, Zhang; Zhang, Yang. Affiliation: Institute of Applied Physics and Computational Mathematics No. 2 East Fenghao Road Beijing China

ISBN: (print) 9789897580369

HiPro-CodeGen is a code generation engine designed for numerical simulation development. Its central objective is to produce a parallel software framework with a standard structure for an application developed on JASMIN, a domain-specific computational framework. The parallel part and all interfaces of the application are generated, so the implementation of sequential subroutines is the only code the programmer has to write manually. The paper introduces the design and implementation of the code generation engine, which combines numerical mathematics with component-based programming to create ontological models for parallel simulations. A hybrid programming method is proposed for the working mechanism of the engine; it combines graphical and textual approaches to hide parallel programming and object-oriented programming from developers. A real application is presented to show the effectiveness and efficiency of the engine.

Keywords: parallel programming

Danse-doigts, a fine motor game

Modelling, Measurement and Control C, 2016, vol. 77, no. 2, pp. 182-192

Authors: Susini, Jean-Ferdy; Pons, Olivier; Guedin, Nolwenn; Thevenot, Catherine. Affiliations: CNAM CÉDRIC 292 rue Saint-Martin Paris Cédex 0375141 France; FPSE UNIGE 40 bd du Pont d'Arve Genève 41205 Switzerland; IP UNIL Géopolis Lausanne 1015 Switzerland

This paper describes the design, implementation and testing of "Danse-doigts", an edutainment therapeutic application for hemiplegic children. The objective of this program is twofold: first, to let the children train their fine motor skills on a tablet; second, to study the effect of this training on their numerical performance (counting, calculation...). The target population and the objective of evaluating numerical skills influenced the design. The software was developed using standard web technologies but is based on a new parallel programming library written in JavaScript. Applications and libraries are free of charge and easy to install on most tablets. © 2016, AMSE Press. All rights reserved.

Keywords: parallel programming

Ab initio protein structure prediction using GPU computing

Perspectives in Science, 2016, vol. 8, pp. 645-647

Authors: Sandhya Parasnath Dubey; N. Gopalakrishna Kini; M. Sathish Kumar; S. Balaji. Affiliations: Dept. of CSE Manipal Institute of Technology Manipal University India; Dept. of ECE Manipal Institute of Technology Manipal University India; Dept. of Biotechnology Manipal Institute of Technology Manipal University India

Graphics processing unit (GPU) accelerated computing has opened a new direction of research for various combinatorial optimization problems. One such problem, which requires very large amounts of computation, is protein structure prediction (PSP). PSP is an NP-complete problem. Computational prediction of a protein's native structure from its primary amino acid sequence is termed the ab initio PSP problem. To date, wet lab experiments conducted on PSP indicate that existing methods are time-consuming and expensive. As a consequence, structures are known for only 1% of sequences. This work presents a parallel programming approach with GPU computing for PSP using the 2D triangular hydrophobic-polar (HP) lattice model. The implementation of the proposed approach is tested on a set of HP benchmark sequences with lengths ranging from 25 to 100. The experimental results show that the proposed approach significantly improves prediction performance, with a large drop in computation time.

Keywords: Protein structure prediction Hydrophobic-polar model NP-problem parallel programming Graphics processing unit Evolutionary programming
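The objective function behind the HP lattice model mentioned in the abstract above can be sketched briefly. This is the usual HP energy (minus one for every pair of hydrophobic residues that touch on the lattice without being adjacent in the chain), evaluated on a 2D triangular lattice where every site has six neighbours. The move encoding, the axial coordinate system and the function names are assumptions made for illustration; the abstract does not describe the paper's representation or its GPU search procedure, and neither is reproduced here.

```python
# Six neighbour offsets of a site on a 2D triangular lattice (axial coordinates).
NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def fold(moves):
    """Turn a list of move indices (0..5) into lattice coordinates.

    Returns None if the chain would visit the same site twice.
    """
    pos = (0, 0)
    coords, seen = [pos], {pos}
    for m in moves:
        dx, dy = NEIGHBOURS[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return None  # self-collision: not a valid conformation
        seen.add(pos)
        coords.append(pos)
    return coords


def energy(sequence, coords):
    """Count -1 for each H-H pair that is adjacent on the lattice but not in the chain."""
    index = {p: i for i, p in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in NEIGHBOURS:
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each pair once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                total -= 1
    return total
```

A search method, evolutionary or otherwise, would evaluate `energy` for many candidate move lists; those evaluations are independent of one another, which is what makes the problem a natural fit for parallel hardware.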
