Search Results - Inner Mongolia University Library

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays

Authors: Christoforos Kachris, Georgios Ch. Sirakoulis, Dimitrios Soudris (Democritus University of Thrace, Xanthi, Greece; National Technical University of Athens, Athens, Greece)

ISBN: (print) 9781450326711

MapReduce is a widely used programming framework for implementing cloud computing applications in data centers. This work presents a novel configurable hardware accelerator that speeds up multi-core and cloud computing applications based on the MapReduce programming framework. The proposed accelerator is attached to multi-core processors and performs fast indexing and accumulation of key/value pairs, using an efficient memory architecture based on Cuckoo hashing. It consists of memory buffers that store the key/value pairs and processing units that accumulate the values sent by the processors for each key. In essence, the accelerator relieves the processors of the Reduce tasks: they execute only the Map tasks and emit the intermediate key/value pairs to the hardware acceleration unit, which performs the Reduce operation. The number and size of the keys that can be stored on the accelerator are configurable according to application requirements. The accelerator has been implemented and mapped to a multi-core FPGA with embedded ARM processors (Xilinx Zynq FPGA) and has been integrated with the MapReduce programming framework under Linux. The performance evaluation shows that the proposed accelerator can achieve up to 1.8x system speedup for MapReduce applications and hence significantly reduce the execution time of multi-core and cloud computing applications. (Action: "Supporting Postdoctoral Researchers", "Education and Lifelong Learning" Program (GSRT), co-financed by the ESF and the Greek State.)

Keywords: cloud computing; hardware accelerator; FPGA; MapReduce; multi-core programming; reconfigurable computing
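As a rough illustration of the key/value accumulation described in the abstract above, the following is a minimal software sketch of a Cuckoo-hashed accumulator. It is not the paper's hardware design: the class name `CuckooAccumulator`, the table size, the hash functions, the eviction limit, and the overflow stash are all assumptions made for the example.

```python
# Illustrative software model only; sizes, hash functions, and the
# overflow stash are assumptions, not details taken from the paper.

class CuckooAccumulator:
    def __init__(self, slots_per_table=8, max_kicks=16):
        self.size = slots_per_table
        self.max_kicks = max_kicks
        # Two tables; each slot holds None or a [key, value] pair.
        self.tables = [[None] * self.size, [None] * self.size]
        # Entries that could not be placed after max_kicks evictions.
        self.overflow = {}

    def _index(self, which, key):
        # One hash function per table, derived from Python's built-in hash.
        return hash((which, key)) % self.size

    def _find(self, key):
        # A key can only live in one of its two candidate slots.
        for which in (0, 1):
            slot = self.tables[which][self._index(which, key)]
            if slot is not None and slot[0] == key:
                return slot
        return None

    def emit(self, key, value):
        """Accumulate value under key, as a Map task emitting a pair would."""
        slot = self._find(key)
        if slot is not None:
            slot[1] += value
            return
        if key in self.overflow:
            self.overflow[key] += value
            return
        # New key: place it, moving any resident to its alternate table.
        entry, which = [key, value], 0
        for _ in range(self.max_kicks):
            i = self._index(which, entry[0])
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return
            which = 1 - which
        # Give up on relocation and keep the last evicted entry in the stash.
        self.overflow[entry[0]] = entry[1]

    def get(self, key):
        slot = self._find(key)
        if slot is not None:
            return slot[1]
        return self.overflow.get(key)


# Word count: the "Map" side emits (word, 1); the accumulator does the Reduce.
acc = CuckooAccumulator(slots_per_table=4)
for word in "a b a c b a d e f g h i j k a".split():
    acc.emit(word, 1)
print(acc.get("a"), acc.get("b"))
```

Lookups touch at most two table slots plus the stash, which is the property that makes this style of hashing attractive for fixed-latency indexing.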

A Parallel Programming Pattern based on Directed Acyclic Graph

International Conference on Sensors, Measurement and Intelligent Materials (ICSMIM 2012)

Authors: Meng, Zheng; Lin, Ying; Kang, Yan; Yu, Qian (Yunnan Univ, Sch Software, Kunming 650091, Peoples R China)

ISBN: (print) 9783037856529

With the development of computer technology, multi-core programming has become a hot topic. Based on directed acyclic graphs, this paper defines a number of executable operations and establishes a parallel programming pattern. Using vertices to represent tasks and edges to represent communication between vertices, this pattern lets programmers easily identify the available concurrency and expose it for use in algorithm design. The proposed pattern can be used for large-scale static data batch processing in multi-core environments and can be convenient when dealing with complex problems.

Keywords: Parallel Programming Pattern; Directed Acyclic Graph; multi-core programming
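The idea in the abstract above, vertices as tasks and edges as communication, can be sketched in a few lines. This is only an illustration under assumptions: the function `run_dag`, its argument layout, and the use of a thread pool are hypothetical and do not reproduce the executable operations defined in the paper. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from graphlib import TopologicalSorter


def run_dag(tasks, deps, workers=4):
    """Run a task graph, starting each task once its predecessors finish.

    tasks: name -> function taking a dict of predecessor results
    deps:  name -> set of predecessor names (the incoming edges)
    """
    sorter = TopologicalSorter({name: deps.get(name, ()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    results, pending = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Every vertex returned here has all inputs available,
            # so these tasks can run concurrently.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, ())}
                pending[pool.submit(tasks[name], inputs)] = name
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                name = pending.pop(future)
                results[name] = future.result()
                sorter.done(name)  # unlocks the successors of this vertex
    return results


# A small diamond-shaped graph: "sum" and "max" are independent of each other.
tasks = {
    "load": lambda inputs: [1, 2, 3, 4],
    "sum": lambda inputs: sum(inputs["load"]),
    "max": lambda inputs: max(inputs["load"]),
    "report": lambda inputs: (inputs["sum"], inputs["max"]),
}
deps = {"sum": {"load"}, "max": {"load"}, "report": {"sum", "max"}}
out = run_dag(tasks, deps)
print(out["report"])
```

Because the scheduler only looks at the edges, the available concurrency is visible from the graph itself rather than from the task bodies.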

Case study: stereo vision experiments with multi-core software API on embedded MPSoC environments

Journal of Supercomputing, 2012, vol. 61, no. 1, pp. 103-117

Authors: Li, Jia-Jhe; Chen, Chung-Kai; Wu, Tung-Yu; Lee, Jenq Kuen (Natl Tsing Hua Univ, Dept Comp Sci, Programming Language Lab, Hsinchu 30043, Taiwan; Realtek Semicond Corp, Hsinchu, Taiwan)

Markov random field models provide a robust formulation of low-level vision problems. Among these problems, stereo vision remains the most investigated. The belief propagation (BP) method provides accurate results in stereo vision problems. However, the algorithm remains too slow for practical use. This paper describes a case study on the parallelization of belief propagation for stereo matching using the "multi-core Software APIs" (MSA) on embedded MPSoC environments. MSA is a library-based middleware providing an asynchronous remote procedure call (RPC) mechanism. It supplies a function-offloading programming model that hides the underlying interprocessor communication and configuration details from programmers. Furthermore, MSA provides a set of stream-specific APIs that support a streaming-function remoting mechanism on heterogeneous multi-core architectures. Our experiments show that the BP method for stereo matching can be rapidly adapted from a single-core program to a multi-core one for embedded MPSoC environments.

Keywords: multi-core programming; Remote procedure call; Streaming; Stereo matching

Pore Networks Simulation with Parallel Greedy Algorithms

16th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications

Authors: Roman-Alonso, G.; Boukerche, A.; Matadamas-Hernandez, J.; Castro-Garcia, M. A. (Univ Ottawa, PARADISE Res Lab, SITE, Ottawa, ON K1N 6N5, Canada; Univ Autonoma Metropolitana, Mexico City, DF, Mexico)

ISBN: (print) 9780769548463

Porous media simulation makes an important contribution to the study of many physical phenomena. The NoMISS greedy algorithm stands out among existing sequential algorithms for constructing a pore subnetwork relatively quickly. However, despite the time reduction achieved by NoMISS, the required processing time is still a problem when very large networks need to be studied. In this work, a non-scalable parallel version of the NoMISS algorithm is presented, and a new approach is proposed to alleviate this issue; in both versions, cluster cores work simultaneously on different porous subnetwork spaces. The first approach, named Unbounded-NoMISS, allows the cores to go forward with the initialization of the porous subnetwork space, applying a balancing policy when a core needs more data. At the end, the cores require a sequential synchronization to finish the porous network construction. The second approach, named Bounded-NoMISS, controls the porous subnetwork initialization by considering a site-size boundary, avoiding the final strong synchronization and considerably improving scalability. Results obtained on a 125-core cluster are presented.

Keywords: Parallel and Distributed Simulation; Parallel Scientific Applications; Dynamic Data Distribution; Cubic Pore Networks; Dual Site-Bond Model; multi-core programming

Towards Efficient Shared Memory Communications in MPJ Express

23rd IEEE International Parallel and Distributed Processing Symposium

Authors: Shafi, Aamir; Manzoor, Jawad (Natl Univ Sci & Technol, Sch Elect Engn & Comp Sci, Rawalpindi, Pakistan)

ISBN: (print) 9781424437511

The need to increase performance while conserving energy led to the emergence of multi-core processors. These processors provide a feasible option to improve the performance of software applications by increasing the number of cores instead of relying on the increased clock speed of a single core. The uptake of multi-core processors by hardware vendors presents a variety of challenges to the software community. In this context, it is important that messaging libraries based on the Message Passing Interface (MPI) standard support efficient inter-core communication. Typically, the processing cores of today's commercial multi-core processors share the main memory. As a result, it is vital to develop devices that exploit this. MPJ Express is our implementation of the MPI-like Java bindings. The software has mainly supported communication with two devices; the first is based on Java New I/O (NIO) and the second is based on Myrinet. In this paper, we present two shared memory implementations meant to provide efficient communication on multi-core and SMP clusters. The first implementation is pure Java and uses Java threads to exploit multiple cores. Each Java thread represents an MPI-level OS process, and communication between these threads is achieved using shared data structures. The second implementation is based on the System V (SysV) IPC API. Our goal is to achieve better communication performance on SMP and multi-core platforms than the existing devices based on Transmission Control Protocol (TCP) and Myrinet. Another design goal is that existing parallel applications must not need to be modified for this purpose, thus relieving application developers of the extra effort of porting their applications to such modern clusters. We have benchmarked our implementations and report that the threads-based device performs best on an Intel quad-core Xeon cluster.

Keywords: multi-core programming; Shared Memory Communications; Java HPC; MPJ Express
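The threads-based device described in the abstract above, where each thread stands in for an MPI-level process and messages travel through shared data structures, can be sketched as follows. This is a toy model, not the MPJ Express API or its implementation: `SharedMemoryComm`, `send`, `recv`, and `run_ranks` are hypothetical names, and the example is written in Python rather than Java purely for illustration.

```python
import queue
import threading


class SharedMemoryComm:
    """Toy communicator: one in-memory mailbox per rank (an assumption)."""

    def __init__(self, size):
        self.size = size
        # queue.Queue is thread-safe, so it serves as the shared structure.
        self.mailboxes = [queue.Queue() for _ in range(size)]

    def send(self, src, dest, data):
        # No sockets and no copy: the object reference is handed over.
        self.mailboxes[dest].put((src, data))

    def recv(self, rank):
        # Blocks until some message for this rank arrives.
        return self.mailboxes[rank].get()


def run_ranks(size, body):
    """Start one thread per rank and collect each rank's return value."""
    comm = SharedMemoryComm(size)
    results = [None] * size

    def target(rank):
        results[rank] = body(rank, comm)

    threads = [threading.Thread(target=target, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def body(rank, comm):
    # Ranks 1..n-1 send their squared rank to rank 0, which sums them.
    if rank == 0:
        return sum(comm.recv(0)[1] for _ in range(comm.size - 1))
    comm.send(rank, 0, rank * rank)
    return None


results = run_ranks(4, body)
print(results[0])
```

The application code in `body` only sees ranks, `send`, and `recv`, which mirrors the stated design goal that existing parallel programs should not have to change when the transport underneath does.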

Performance of a Lattice Quantum Chromodynamics kernel on the Cell processor

Computer Physics Communications, 2008, vol. 179, no. 9, pp. 642-646

Authors: Spray, J.; Hill, J.; Trew, A. (Univ Edinburgh, EPCC, Edinburgh EH9 3JY, Midlothian, Scotland)

The implementation of a proof-of-concept Lattice Quantum Chromodynamics kernel on the Cell processor is described in detail, illustrating issues encountered in the porting process. The resulting code performs up to 45 GFlop/s per socket (without inter-node parallel communications), indicating that the Cell processor is likely to be a good platform for future Lattice QCD calculations. (C) 2008 Elsevier B.V. All rights reserved.

Keywords: Cell processor; multi-core programming; Lattice QCD

A High Performance Multifrontal Code for Linear Solution of Structures Using Multi-core Microprocessors

Tsinghua Science and Technology, 2008, vol. 13, no. S1, pp. 34-39

Authors: Efe Guney, Kenneth Will (Computer Aided Structural Engineering Center, School of Civil and Environmental Engineering, Georgia Institute of Technology)

A multifrontal code is introduced for the efficient solution of the linear systems of equations arising from the analysis of structures. The factorization phase is reduced to a series of interleaved element assembly and dense matrix operations, for which the BLAS3 kernels are used. A similar approach is generalized to the forward and back substitution phases for the efficient solution of structures with multiple load conditions. The program performs all assembly and solution steps in parallel. Examples are presented that demonstrate the code's performance on single- and dual-core processor computers.

Keywords: multifrontal method; Cholesky decomposition; high performance computing; finite element method; multi-core programming; BLAS3; parallel computing

The Research and Application of Apla-Java Reusable Components

International Symposium on Computer Science and Computational Technology

Authors: Jie Anquan, Wan Lan, Hua Zhizhang, Xue Jinyun (Jiangxi Normal Univ, Coll Comp Informat & Engn, Nanchang 330022, Jiangxi, Peoples R China)

ISBN: (print) 9780769534985

Software reuse technology can greatly improve the efficiency of program development. A reusable Apla-Java component has been developed as part of research on the PAR (Partition and Recur) method and its tools. We have drawn on reuse-driven software theory and partial implementation theory, which effectively ensure the accuracy of the components. The Apla-Java component is an important part of the "Apla --> Java automatic conversion software". It can support multi-core programming through the implementation of parallel and concurrent mechanisms. Some multi-core program examples have been developed based on the components. Experiments show that multi-core programming based on Apla-Java reusable components can greatly enhance the efficiency of developing multi-core programs.

Keywords: PAR method; Apla-Java reusable components; parallel multi-core programming
