ISBN (Print): 9781424437511
The need to increase performance while conserving energy led to the emergence of multi-core processors. These processors provide a feasible option for improving the performance of software applications by increasing the number of cores, instead of relying on the increased clock speed of a single core. The uptake of multi-core processors by hardware vendors presents a variety of challenges to the software community. In this context, it is important that messaging libraries based on the Message Passing Interface (MPI) standard support efficient inter-core communication. Typically, the processing cores of today's commercial multi-core processors share the main memory. As a result, it is vital to develop communication devices that exploit this. MPJ Express is our implementation of MPI-like Java bindings. The software has mainly supported communication with two devices; the first is based on Java New I/O (NIO) and the second is based on Myrinet. In this paper, we present two shared memory implementations that provide efficient communication on multi-core and SMP clusters. The first implementation is pure Java and uses Java threads to exploit multiple cores. Each Java thread represents an MPI-level OS process, and communication between these threads is achieved using shared data structures. The second implementation is based on the System V (SysV) IPC API. Our goal is to achieve better communication performance than the existing devices based on the Transmission Control Protocol (TCP) and Myrinet on SMP and multi-core platforms. Another design goal is that existing parallel applications must not be modified, thus relieving application developers of the extra effort of porting their applications to such modern clusters. We have benchmarked our implementations and report that the threads-based device performs the best on an Intel quad-core Xeon cluster.
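As a rough illustration of the threads-based device described above, the sketch below runs each MPI-level rank as a Java thread inside one JVM and exchanges messages through shared in-memory queues instead of sockets. The class name SharedMemDemo, the per-rank inbox queues, and the message format are assumptions for the example, not the actual MPJ Express device API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal sketch of a threads-based "shared memory device":
// every MPI-level rank is a Java thread, and point-to-point
// messages travel through shared in-memory queues.
// Names below are illustrative, not the MPJ Express API.
public class SharedMemDemo {
    public static void main(String[] args) throws InterruptedException {
        final int ranks = 4;
        // One inbox per rank plays the role of the shared data structure.
        @SuppressWarnings("unchecked")
        final BlockingQueue<double[]>[] inbox = new BlockingQueue[ranks];
        for (int i = 0; i < ranks; i++) {
            inbox[i] = new ArrayBlockingQueue<>(16);
        }

        Thread[] workers = new Thread[ranks];
        for (int r = 0; r < ranks; r++) {
            final int rank = r;
            workers[r] = new Thread(() -> {
                try {
                    if (rank != 0) {
                        // "Send": enqueue a message into rank 0's inbox.
                        inbox[0].put(new double[] { rank, rank * 10.0 });
                    } else {
                        // "Receive": rank 0 drains one message per peer.
                        for (int i = 1; i < ranks; i++) {
                            double[] msg = inbox[0].take();
                            System.out.printf("rank 0 got %.1f from rank %d%n",
                                              msg[1], (int) msg[0]);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "rank-" + rank);
            workers[r].start();
        }
        for (Thread t : workers) t.join();
    }
}
```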
ISBN (Print): 9783030011741; 9783030011734
The primary purpose of parallel streams in the recent release of Java 8 is to help Java programs make better use of multi-core processors for improved performance. However, in some cases, parallel streams can actually perform considerably worse than ordinary sequential Java code. This paper presents a Map-Reduce parallel programming pattern for Java parallel streams that produces good speedup over sequential code. An important component of the Map-Reduce pattern is a pair of optimizations: grouping and locality. Three parallel application programs are used to illustrate the Map-Reduce pattern and its optimizations: Histogram of an Image, Document Keyword Search, and Solution to a Differential Equation. A proposal is included for a new terminal stream operation for the Java language, called MapReduce(), that applies this pattern and its optimizations automatically.
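To make the grouping and locality optimizations concrete, here is a minimal Java 8 parallel-stream sketch of the Histogram of an Image example: pixels are grouped into chunks, each chunk is reduced to a local histogram, and the partial histograms are merged at the end. The chunking scheme and class name are assumptions for the example, and the proposed MapReduce() terminal operation is deliberately not used, since it is only a proposal.

```java
import java.util.stream.IntStream;

// Illustrative sketch of the "grouping" idea behind the Map-Reduce
// stream pattern: instead of streaming over individual pixels, the
// image is grouped into chunks, each chunk is reduced to a local
// histogram (locality), and the partial histograms are combined.
public class ParallelHistogram {
    public static int[] histogram(int[] pixels, int chunks) {
        int chunkSize = (pixels.length + chunks - 1) / chunks;
        return IntStream.range(0, chunks)
            .parallel()
            // Map: each chunk produces its own local histogram.
            .mapToObj(c -> {
                int[] local = new int[256];
                int from = c * chunkSize;
                int to = Math.min(pixels.length, from + chunkSize);
                for (int i = from; i < to; i++) {
                    local[pixels[i] & 0xFF]++;
                }
                return local;
            })
            // Reduce: merge the partial histograms pairwise.
            .reduce(new int[256], (a, b) -> {
                int[] merged = new int[256];
                for (int i = 0; i < 256; i++) merged[i] = a[i] + b[i];
                return merged;
            });
    }

    public static void main(String[] args) {
        int[] pixels = new java.util.Random(42).ints(1_000_000, 0, 256).toArray();
        int[] hist = histogram(pixels, Runtime.getRuntime().availableProcessors());
        System.out.println("count of value 0: " + hist[0]);
    }
}
```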
ISBN (Print): 9780769534985
Software reuse technology can greatly improve the efficiency of program development. A reusable Apla-Java component has been developed in the course of research on the PAR (Partition and Recur) method and its tools. We have drawn on reuse-driven software theory and the partial implementation theory, which effectively ensure the correctness of the components. The Apla-Java component is an important part of the "Apla --> Java automatic conversion software". It supports multi-core programming through the implementation of parallel and concurrent mechanisms. Some multi-core program examples have been developed based on the components. Experiments show that the approach of multi-core programming based on Apla-Java reusable components can greatly enhance the efficiency of developing multi-core programs.
ISBN (Print): 9781479941162
The study of porous materials is of great importance for a vast number of industrial applications. In order to study specific characteristics of materials, in-silico simulations can be employed. The particular simulation of pore networks described in this work is based on the Dual Site-Bond Model (DSBM). Under this approach, a porous material is thought to be made of sites (cavities, bulges) interconnected to each other through bonds (throats, capillaries); while every site is connected to a number of bonds, each bond is the link between two sites. At present, several computing algorithms have been implemented for the simulation of pore networks; nevertheless, only a few of these methods take into account the geometric restrictions that arise during the interconnection of a set of bonds to every site of the network. Introducing restrictions of this sort into the computing algorithms is likely to lead to the construction of more realistic pore networks. In this work, a sequential algorithm and its parallel computing version are proposed to construct pore networks while allowing geometrical restrictions among hollow entities. Our parallel approach uses OpenMP to create a set of threads (computing tasks) that work simultaneously on independent, random pore network regions. We discuss the obtained results.
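The paper's parallel version uses OpenMP threads on independent regions of the network. Purely to illustrate that decomposition in this document's predominant language, the Java sketch below hands disjoint blocks of a site grid to a fixed thread pool; the per-region construction step is a simplified stand-in for the real site/bond assignment with geometric restrictions, and all names and sizes are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Conceptual sketch of the region-parallel strategy: the pore network
// is split into independent regions and a pool of threads works on
// them simultaneously. The paper uses OpenMP; this Java analogue only
// illustrates the decomposition idea.
public class RegionParallelNetwork {
    static final int SIZE = 512;                          // sites per side (assumed)
    static final double[][] siteRadius = new double[SIZE][SIZE];

    // Stand-in for constructing one region under geometric restrictions.
    static void constructRegion(int rowFrom, int rowTo) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int r = rowFrom; r < rowTo; r++) {
            for (int c = 0; c < SIZE; c++) {
                siteRadius[r][c] = rnd.nextDouble(1.0, 10.0); // site size
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int rowsPerRegion = SIZE / threads;

        List<Future<?>> tasks = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int from = t * rowsPerRegion;
            final int to = (t == threads - 1) ? SIZE : from + rowsPerRegion;
            // Each task works on a disjoint block of rows, so no locking
            // is needed while the regions are being filled.
            tasks.add(pool.submit(() -> constructRegion(from, to)));
        }
        for (Future<?> f : tasks) f.get();   // wait for all regions
        pool.shutdown();
        System.out.println("network regions constructed: " + threads);
    }
}
```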
ISBN (Print): 9780769548463
Porous media simulation is an important contribution to the study of many physical phenomena. The NoMISS greedy algorithm stands out among the existing sequential algorithms for constructing a pore subnetwork in a relatively fast way. However, despite the time reduction achieved by NoMISS, problems remain with the processing time required when very large networks need to be studied. In this work, a non-scalable parallel version of the NoMISS algorithm is presented, and a new approach is proposed to alleviate this issue; in both versions, cluster cores work simultaneously on different porous subnetwork spaces. The first approach, named Unbounded-NoMISS, allows the cores to go forward with the initialization of the porous subnetwork space, applying a balancing policy when a core needs more data. At the end, the cores require a sequential synchronization to finish the porous network construction. The second approach, named Bounded-NoMISS, controls the porous subnetwork initialization by considering a site-size boundary, avoiding the final strong synchronization and considerably improving scalability. Results obtained using a 125-core cluster are presented.
ISBN (Print): 9781479968909
Phoenix MapReduce is a programming framework for multi-core systems that is used to automatically parallelize and schedule programs based on the MapReduce framework. This paper presents a novel reconfigurable MapReduce accelerator that can be attached to multi-core SoCs to speed up the indexing and processing of MapReduce key-value pairs. The proposed architecture is implemented, mapped, and evaluated on an all-programmable SoC with two embedded ARM cores (Zynq FPGA). Depending on the MapReduce application requirements, the user can dynamically reconfigure the FPGA with the appropriate version of the MapReduce accelerator. The performance evaluation shows that the proposed scheme can achieve up to a 2.3x overall performance improvement in MapReduce applications.
Background: Fast and accurate automatic segmentation of skeletal muscle cell images is crucial for the diagnosis of muscle-related diseases and greatly reduces labor-intensive manual annotation. Recently, several methods have been presented for automatic muscle cell segmentation. However, most methods exhibit high model complexity and time cost, and they do not adapt to large-scale images such as whole-slide scanned specimens. Methods: In this paper, we propose a novel distributed computing approach, which adopts both data and model parallelism, for fast muscle cell segmentation. In a master-worker manner, the image data held by the master is distributed onto multiple workers based on the Spark cloud computing platform. On each worker node, we first detect cell contours using a structured random forest (SRF) contour detector with fast parallel prediction and generate region candidates using a superpixel technique. Next, we propose a novel hierarchical tree based region selection algorithm for cell segmentation based on the conditional random field (CRF) algorithm. We divide the region selection algorithm into multiple sub-problems, which can be further parallelized using multi-core programming. Results: We test the performance of the proposed method on a large-scale haematoxylin and eosin (H&E) stained skeletal muscle image dataset. Compared with the standalone implementation, the proposed method achieves more than 10 times speed improvement on very large-scale muscle images containing hundreds to thousands of cells. Meanwhile, our proposed method produces high-quality segmentation results compared with several state-of-the-art methods. Conclusions: This paper presents a parallel muscle image segmentation method with both data and model parallelism on multiple machines. The parallel strategy is highly compatible with our muscle segmentation framework. The proposed method achieves high-throughput, effective cell segmentation on large-scale muscle images.
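As a minimal sketch of the data-parallel (master-worker) part of this approach, the Java/Spark snippet below has the master enumerate image tiles and lets Spark distribute them to workers, where each tile is segmented independently. The tile identifiers and the segmentTile() placeholder are assumptions standing in for the paper's SRF contour detection and CRF-based region selection.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.List;

// Sketch of the data-parallel half: the master splits a whole-slide
// image into tiles and Spark distributes them to worker nodes.
public class DistributedSegmentation {

    // Placeholder for the per-tile SRF contour detection + CRF selection.
    static String segmentTile(String tileId) {
        return tileId + ": segmented";
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("muscle-segmentation")
                .setMaster("local[*]");          // local mode for the sketch
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Master side: enumerate tiles of the large image (assumed IDs).
            List<String> tiles = java.util.stream.IntStream.range(0, 1000)
                    .mapToObj(i -> "tile-" + i)
                    .collect(java.util.stream.Collectors.toList());

            // Worker side: each partition segments its tiles independently.
            JavaRDD<String> results = sc.parallelize(tiles, 100)
                                        .map(DistributedSegmentation::segmentTile);

            System.out.println("segmented tiles: " + results.count());
        }
    }
}
```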
ISBN (Print): 9781450326711
MapReduce is a widely used programming framework for the implementation of cloud computing applications in data centers. This work presents a novel configurable hardware accelerator that is used to speed up the processing of multi-core and cloud computing applications based on the MapReduce programming framework. The proposed MapReduce configurable accelerator is attached to multi-core processors and performs fast indexing and accumulation of the key/value pairs based on an efficient memory architecture using Cuckoo hashing. The MapReduce accelerator consists of memory buffers that store the key/value pairs and processing units that accumulate the values of the keys sent from the processors. In essence, this accelerator relieves the processors from executing the Reduce tasks; the processors execute only the Map tasks and emit the intermediate key/value pairs to the hardware acceleration unit, which performs the Reduce operation. The number and the size of the keys that can be stored on the accelerator are configurable and can be set based on the application requirements. The MapReduce accelerator has been implemented and mapped to a multi-core FPGA with embedded ARM processors (Xilinx Zynq FPGA) and has been integrated with the MapReduce programming framework under Linux. The performance evaluation shows that the proposed accelerator can achieve up to 1.8x system speedup of MapReduce applications and hence significantly reduce the execution time of multi-core and cloud computing applications. (Acknowledgement: Action "Supporting Postdoctoral Researchers" of the "Education and Lifelong Learning" Program (GSRT), co-financed by the ESF and the Greek State.)
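For intuition about the indexing scheme, the following Java sketch models in software what the accelerator does in its memory buffers: key/value pairs are placed with Cuckoo hashing (two tables, two hash functions) and the value of an already-present key is accumulated in place. Table sizes, hash functions, and the eviction limit are assumptions; the real design is a hardware memory architecture, not this code.

```java
// Software model of the accelerator's idea: key/value pairs emitted by
// Map tasks are indexed with Cuckoo hashing and the value of an
// existing key is accumulated in place (the Reduce step).
public class CuckooReduceTable {
    private static final int SLOTS = 1 << 12;
    private static final int MAX_KICKS = 32;
    private final String[] keys1 = new String[SLOTS], keys2 = new String[SLOTS];
    private final long[]   vals1 = new long[SLOTS],   vals2 = new long[SLOTS];

    private int h1(String k) { return (k.hashCode() & 0x7fffffff) % SLOTS; }
    private int h2(String k) { return ((k.hashCode() * 0x9E3779B1) >>> 16) % SLOTS; }

    /** Accumulate value into the slot of key, inserting it if absent. */
    public void reduce(String key, long value) {
        int i1 = h1(key), i2 = h2(key);
        if (key.equals(keys1[i1])) { vals1[i1] += value; return; }
        if (key.equals(keys2[i2])) { vals2[i2] += value; return; }
        // Not present: insert with cuckoo displacement.
        String k = key; long v = value;
        int idx = i1;
        for (int kick = 0; kick < MAX_KICKS; kick++) {
            boolean first = (kick % 2 == 0);           // alternate tables
            String[] keys = first ? keys1 : keys2;
            long[]   vals = first ? vals1 : vals2;
            if (keys[idx] == null) { keys[idx] = k; vals[idx] = v; return; }
            // Evict the occupant and reinsert it into its other table.
            String ek = keys[idx]; long ev = vals[idx];
            keys[idx] = k; vals[idx] = v;
            k = ek; v = ev;
            idx = first ? h2(k) : h1(k);
        }
        throw new IllegalStateException("table full; a real design would rehash");
    }

    /** Return the accumulated value for key, or 0 if absent. */
    public long get(String key) {
        int i1 = h1(key), i2 = h2(key);
        if (key.equals(keys1[i1])) return vals1[i1];
        if (key.equals(keys2[i2])) return vals2[i2];
        return 0;
    }

    public static void main(String[] args) {
        CuckooReduceTable table = new CuckooReduceTable();
        for (String w : "map reduce map shuffle reduce map".split(" ")) {
            table.reduce(w, 1);          // word-count style accumulation
        }
        System.out.println("map=" + table.get("map")
                + " reduce=" + table.get("reduce")
                + " shuffle=" + table.get("shuffle"));
    }
}
```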