In recent years, the proliferation of highly dynamic graph-structured data streams has fueled the demand for real-time data analytics. For instance, detecting recent trends in social networks enables new applications in ar...
ISBN: 9781509043200 (print)
The increasing need for computing power today justifies the continuous search for techniques that decrease the time needed to solve common computational problems. To take advantage of new hybrid parallel architectures composed of multithreading and multiprocessor hardware, our current efforts involve the design and validation of highly parallel algorithms that efficiently exploit the characteristics of such architectures. In this paper, we propose an automatic tuning methodology to easily exploit multicore, multi-GPU and coprocessor systems. We present an optimization of an algorithm for solving triangular systems (TRSM), based on block decomposition and asynchronous task assignment, and discuss some results.
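To make "block decomposition and asynchronous task assignment" concrete, here is a minimal sketch of a blocked lower-triangular solve L X = B, assuming L is square, lower triangular and non-singular. Each diagonal block is solved by forward substitution, and the trailing updates it unlocks are submitted as independent tasks to a thread pool. The names `solve_lower`, `blocked_trsm`, `nb` and `workers` are hypothetical illustrations; this is not the authors' auto-tuned multicore/multi-GPU implementation, and in CPython the pool shows the task structure rather than delivering real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Matrix = List[List[float]]


def solve_lower(L: Matrix, B: Matrix) -> Matrix:
    """Forward substitution: solve L X = B for one lower-triangular diagonal block."""
    n, m = len(L), len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for j in range(m):
        for i in range(n):
            s = B[i][j] - sum(L[i][k] * X[k][j] for k in range(i))
            X[i][j] = s / L[i][i]
    return X


def blocked_trsm(L: Matrix, B: Matrix, nb: int = 2, workers: int = 4) -> Matrix:
    """Solve L X = B block by block (nb is a hypothetical tuning knob)."""
    n, m = len(L), len(B[0])
    X = [row[:] for row in B]  # work on a copy; B is left untouched
    blocks: List[Tuple[int, int]] = [(s, min(s + nb, n)) for s in range(0, n, nb)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for bi, (k0, k1) in enumerate(blocks):
            # Step 1: solve the diagonal block L[k0:k1, k0:k1] X_k = B_k.
            Lkk = [row[k0:k1] for row in L[k0:k1]]
            Xk = solve_lower(Lkk, X[k0:k1])
            X[k0:k1] = Xk

            # Step 2: each trailing block row gets B_i -= L[i, k0:k1] X_k.
            # The updates write disjoint rows, so they are independent tasks.
            def update(rows: Tuple[int, int]) -> None:
                for i in range(*rows):
                    for j in range(m):
                        X[i][j] -= sum(L[i][k0 + t] * Xk[t][j] for t in range(k1 - k0))

            # Wait for every update of this step before the next diagonal solve.
            list(pool.map(update, blocks[bi + 1:]))
    return X
```

For example, with `L = [[2, 0, 0], [1, 3, 0], [4, 5, 6]]` and `B = [[2], [7], [32]]`, `blocked_trsm(L, B, nb=2)` returns `[[1.0], [2.0], [3.0]]`. An auto-tuner in the spirit of the abstract would search over parameters like `nb` and the worker count for a given machine.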
ISBN: 9783319499567; 9783319499550 (print)
The LogP model was used to measure the effects of latency, occupancy and bandwidth on distributed memory multiprocessors. The idea was to characterize distributed memory multiprocessors using these key parameters and to study their impact on performance in simulation environments. This work proposes a new model, based on LogP, that describes the performance impact on applications executing on a heterogeneous cluster. In the near future, this model can be used to help choose the best way to split a parallel application to be executed on this architecture. The model considers a heterogeneous cluster to be composed of distinct types of processors, accelerators and networks.
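As a reference point for the model described above, the classic LogP cost of k back-to-back small messages from one processor is (k - 1) · max(g, o) + o + L + o, where L is latency, o is per-message send/receive overhead and g is the gap between injections. The sketch below encodes that formula and then adds a hypothetical heterogeneous extension: each device gets its own compute rate and its own network parameters, and the estimated time of a split is the slowest device's compute plus communication. The `Share` tuple, `makespan`, and the no-overlap assumption are illustrative only; they are not the model proposed in the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LogPNetwork:
    """Classic LogP parameters for one interconnect (all in the same time unit)."""
    L: float  # latency: time a small message spends in the network
    o: float  # overhead: processor time to send or to receive one message
    g: float  # gap: minimum interval between consecutive message injections

    def send_time(self, k: int = 1) -> float:
        """Time until the last of k back-to-back small messages is received."""
        if k <= 0:
            return 0.0
        # Injections are spaced by max(g, o); the last message then pays o + L + o.
        return (k - 1) * max(self.g, self.o) + self.o + self.L + self.o


# Hypothetical per-device share: (work units, device rate, network, message count).
Share = Tuple[float, float, LogPNetwork, int]


def makespan(shares: Sequence[Share]) -> float:
    """Estimated completion time of a split: the slowest device determines it.

    Assumes computation and communication do not overlap (a simplification).
    """
    return max(work / rate + net.send_time(k) for work, rate, net, k in shares)
```

With `eth = LogPNetwork(L=5.0, o=1.0, g=2.0)` and `fast = LogPNetwork(L=1.0, o=0.5, g=0.5)`, splitting 50 work units as 10 on a rate-1 CPU and 40 on a rate-5 accelerator gives a makespan of 17.0, while moving 5 more units to the accelerator lowers it to 12.0. Comparing candidate splits this way is the kind of decision the abstract says such a model could support.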
Multiple sequence alignment (MSA) is critical in several areas of science, especially in bioinformatics. Significant advances have been made in MSA, and many methods, algorithms and tools have been proposed for it. Since MSA is an NP-hard problem, these efforts have led to the emergence of heuristics to solve it. More recently, heuristics based on progressive alignment have stood out due to their alignment quality and relatively good performance. Despite significant advances, MSA remains a time-consuming task, and parallel solutions have been investigated. We propose a novel algorithm for solving MSA based on progressive alignment using a cluster of GPUs. Our experimental results showed encouraging speedups for instances containing sequences with lengths ranging from 60 to 10k.
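Progressive alignment typically starts by scoring every pair of sequences, builds a guide tree from those scores, and then merges sequences along the tree. The sketch below covers only that first stage, assuming a simple scoring scheme (+1 match, -1 mismatch, -1 per gap) instead of a substitution matrix; the function names are hypothetical, and the abstract does not say which stage the authors' GPU algorithm parallelizes. It shows why the stage parallelizes well: every pair is an independent Needleman-Wunsch computation.

```python
from itertools import combinations
from typing import List, Sequence


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only two DP rows, so memory is O(len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def pairwise_scores(seqs: Sequence[str]) -> List[List[int]]:
    """All-pairs score matrix, the usual first stage of progressive alignment.

    Every (i, j) pair is independent, which is what makes this stage easy to
    distribute across devices; here it simply runs sequentially.
    """
    n = len(seqs)
    S = [[0] * n for _ in range(n)]
    for i in range(n):
        S[i][i] = nw_score(seqs[i], seqs[i])
    for i, j in combinations(range(n), 2):
        S[i][j] = S[j][i] = nw_score(seqs[i], seqs[j])
    return S
```

For instance, `pairwise_scores(["AC", "A"])` returns `[[2, 0], [0, 1]]`: aligning `AC` with `A` costs one match and one gap. With n sequences the stage performs n(n-1)/2 independent alignments, and the quadratic DP per pair is what makes long inputs (here up to 10k) expensive.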
ISBN: 9781509052523 (print)
Computing power through parallelism is now commonly provided by multi-core systems or heterogeneous architectures for High Performance Computing (HPC) and scientific computing. Although many algorithms have been proposed and implemented using sequential computing, parallel alternatives can provide more suitable, higher-performance solutions to the same problems. In this paper, three parallelization strategies are proposed and implemented for a dynamic programming based cloud smoothing application, using both shared memory and non-shared memory approaches. The experiments are performed on the NVIDIA GeForce GT 750M and Tesla K20m, two GPU accelerators of the Kepler architecture. A detailed performance analysis is presented covering partition granularity at the block and thread levels, memory access efficiency and computational complexity. The evaluations show a close approximation of the results and high efficiency in the parallel implementations, and these strategies can be adopted in similar data analysis and processing applications.
ISBN: 9781450363884 (print)
The design of next-generation computer systems should be supported by a simulation infrastructure that achieves several contradictory goals, such as fast execution time, high accuracy, and enough flexibility to allow comparison between large numbers of possible design points. Most existing architecture-level simulators are designed to be flexible and to execute the code in parallel for greater efficiency, but at the cost of sacrificing ***. This paper presents the ScaleSimulator simulation environment, which is based on a new design methodology whose goal is to achieve near cycle accuracy while still being flexible enough to simulate many different future system architectures and efficient enough to run meaningful workloads. We achieve these goals by making parallelism a first-class citizen in our methodology. Thus, this paper focuses mainly on the ScaleSimulator design points that enable better parallel execution while maintaining the scalability and cycle accuracy of a simulated ***. This paper indicates that the newly proposed ScaleSimulator tool can (1) efficiently parallelize the execution of a cycle-accurate architecture simulator, (2) efficiently simulate complex architectures (e.g., out-of-order CPU pipelines, cache coherency protocols, and networks) and massively parallel systems, and (3) use meaningful workloads, such as full simulation of OLTP benchmarks, to examine future architectural choices.
In recent years, k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only ap...
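For reference, standard k-means++ seeding picks the first center uniformly at random and each further center with probability proportional to D(x)², the squared distance from point x to its nearest already-chosen center. The sketch below is a plain sequential version of that rule, assuming points are tuples of floats; the function names, the `seed` parameter and the uniform fallback when all distances are zero are illustrative choices, not taken from the truncated abstract. Its inherently sequential loop (each draw depends on all previous centers) is what makes k-means++ hard to parallelize.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points: Sequence[Point], k: int, seed: int = 0) -> List[Point]:
    """Choose k initial centers by k-means++ D^2 sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            c = rng.choice(points)
        else:
            # Points far from all current centers are proportionally more likely.
            c = rng.choices(points, weights=d2, k=1)[0]
        centers.append(c)
        d2 = [min(d, sq_dist(p, c)) for d, p in zip(d2, points)]
    return centers
```

Already-chosen points have weight zero, so with distinct points and `k = len(points)` every point is selected exactly once. Each round costs one pass over the data, and the k rounds cannot be overlapped, which is the bottleneck that parallel variants try to work around.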
ISBN: 9783319495835; 9783319495828 (print)
Since the emergence of the new High Efficiency Video Coding (HEVC) standard, several strategies have been followed to take advantage of the parallel features available in it. Many of the parallelization approaches in the literature target the decoder side, aiming at real-time decoding. However, the most complex part of the HEVC codec is the encoder side. In this paper, we perform a comparative analysis of two parallelization proposals. One of them is based on tiles and employs shared memory architectures; the other is based on Groups Of Pictures (GOPs) and employs distributed shared memory architectures. The results show that good speed-ups are obtained for the tile-based proposal, especially for high-resolution video sequences, but its scalability decreases for low-resolution video sequences. The GOP-based proposal outperforms the tile-based proposal as the number of processes increases, and this benefit grows when low-resolution video sequences are compressed.
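The two proposals partition work at different granularities, which the sketch below contrasts. `uniform_tile_widths` follows the HEVC rule for tile column widths when `uniform_spacing_flag` is set (the same formula applies to tile rows); `gop_assignment` cuts a sequence into consecutive GOPs and deals them round-robin to processes, assuming each GOP can be encoded independently (for example, because it starts with an intra frame). Both function names and the round-robin policy are hypothetical illustrations, not the paper's encoder implementations.

```python
from typing import Dict, List


def uniform_tile_widths(pic_width_ctbs: int, num_tile_cols: int) -> List[int]:
    """Tile column widths (in CTBs) under HEVC uniform tile spacing."""
    return [
        ((i + 1) * pic_width_ctbs) // num_tile_cols - (i * pic_width_ctbs) // num_tile_cols
        for i in range(num_tile_cols)
    ]


def gop_assignment(num_frames: int, gop_size: int, num_procs: int) -> Dict[int, List[List[int]]]:
    """Cut the sequence into consecutive GOPs and deal them round-robin to processes.

    Assumes each GOP can be encoded independently of the others.
    """
    gops = [list(range(s, min(s + gop_size, num_frames)))
            for s in range(0, num_frames, gop_size)]
    return {p: gops[p::num_procs] for p in range(num_procs)}
```

A 1920-pixel-wide frame with 64×64 CTBs is 30 CTBs wide, and `uniform_tile_widths(30, 4)` gives `[7, 8, 7, 8]`. Tile parallelism is bounded by how many CTBs a single frame contains, whereas GOP parallelism is bounded by the number of GOPs in the sequence, so the two respond differently to frame resolution and process count.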
A parallel, distributed, dynamically scalable, fault-tolerant system for processing large amounts of streaming data is developed in an automated way. The system is based on the framework for distributed computing ...