ISBN (print): 9783642143892
The power efficiency of large-scale computing on multiprocessing systems is an important issue that is tied to both the hardware architecture and the software methodology. Aiming to design power-efficient high-performance programs, we have measured the power consumption of large matrix multiplication on multi-core and GPU platforms. Based on the obtained power characteristics of each computing component, we derive energy estimates by incorporating the physical power constraints of the hardware devices and an analysis of the program's execution behaviour. We optimize the matrix multiplication algorithm to improve its power performance, and the resulting efficiency gain is validated by measuring the program's execution.
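As a rough, hypothetical illustration of the kind of component-wise energy estimate described above (energy ≈ power × active time per component), the sketch below combines made-up average power values for the CPU, the GPU and the rest of the system with measured busy times; the actual characteristic values and estimation model in the paper are platform-specific.

```cuda
#include <cstdio>

// Hypothetical average power draws (watts), obtained from prior measurement;
// the real characteristic values in the paper depend on the platform.
const double P_CPU_BUSY = 95.0;   // CPU package while computing
const double P_GPU_BUSY = 180.0;  // GPU board while a kernel runs
const double P_IDLE     = 60.0;   // rest of the system, always drawing power

// Energy estimate for one matrix-multiplication run:
// each component contributes (its power) * (the time it is active).
double estimate_energy_joules(double cpu_seconds, double gpu_seconds,
                              double wall_seconds)
{
    return P_CPU_BUSY * cpu_seconds
         + P_GPU_BUSY * gpu_seconds
         + P_IDLE     * wall_seconds;
}

int main()
{
    // Example run: CPU busy 0.4 s preparing data, GPU kernel takes 1.1 s,
    // whole run lasts 1.6 s of wall-clock time.
    double e = estimate_energy_joules(0.4, 1.1, 1.6);
    std::printf("estimated energy: %.1f J\n", e);
    return 0;
}
```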
ISBN (print): 9781424475742
This paper presents the design and development of a technique referred to as SIPPA (Secure Information Processing with Privacy Assurance) for biometric data reconstruction. SIPPA enables a client/server model with the following two properties: (1) the client can compare the similarity between his/her sample data and the source data on the server side without either party revealing its data to the other or to a third party; and (2) if the sample data is "sufficiently similar" to the source data, the client can reconstruct the source data using only the sample data and some helper data provided by the server at negligible overhead. The main contributions of this paper are: (1) the algorithmic steps of SIPPA and its relationship to privacy homomorphism, (2) a parallel SIPPA architecture, and (3) the realization of parallel SIPPA as a service component for the BioAPI 2.0 framework using Java RMI technology. To demonstrate its potential application, we apply SIPPA to the reconstruction of biometric data, more specifically biometric face images represented as linearized vectors.
ISBN (print): 9783642143892
State-of-the-art graphics processors provide high processing power, and the high programmability of GPUs offered by frameworks like CUDA increases their usability as high-performance co-processors for general-purpose computing. Sorting is well investigated in computer science in general, but this new field of application for GPUs creates a demand for high-performance parallel sorting algorithms that fit the characteristics of modern GPU architectures. We present a high-performance in-place implementation of Batcher's bitonic sorting networks for CUDA-enabled GPUs. We adapted bitonic sort for arbitrary input length and assigned compare/exchange operations to threads in a way that decreases low-performance global-memory accesses and thereby greatly increases the performance of the implementation.
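For readers unfamiliar with Batcher's network, the following is a minimal global-memory bitonic sort for CUDA, assuming a power-of-two input length and one thread per element. It is not the authors' implementation, which additionally handles arbitrary lengths and reorganizes the compare/exchange-to-thread mapping to reduce global-memory traffic.

```cuda
#include <cstdio>
#include <cstdlib>

// One compare/exchange step of the bitonic network.
// k is the size of the bitonic sequences being merged, j the current stride.
__global__ void bitonicStep(float *d, unsigned int j, unsigned int k)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int partner = i ^ j;             // element to compare against
    if (partner > i) {                        // handle each pair once
        bool ascending = ((i & k) == 0);      // sort direction in this block
        if ((ascending  && d[i] > d[partner]) ||
            (!ascending && d[i] < d[partner])) {
            float t = d[i]; d[i] = d[partner]; d[partner] = t;
        }
    }
}

// Sorts n floats in place on the device; n must be a power of two here.
void bitonicSort(float *d, unsigned int n)
{
    const unsigned int threads = 256;
    const unsigned int blocks  = n / threads;
    for (unsigned int k = 2; k <= n; k <<= 1)       // merge stage size
        for (unsigned int j = k >> 1; j > 0; j >>= 1)  // stride within stage
            bitonicStep<<<blocks, threads>>>(d, j, k);
    cudaDeviceSynchronize();
}

int main()
{
    const unsigned int n = 1 << 20;
    float *h = (float *)malloc(n * sizeof(float));
    for (unsigned int i = 0; i < n; ++i) h[i] = (float)rand() / RAND_MAX;

    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    bitonicSort(d, n);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    std::printf("first: %f  last: %f\n", h[0], h[n - 1]);
    cudaFree(d); free(h);
    return 0;
}
```

Every compare/exchange in this sketch reads and writes global memory; the access pattern of those exchanges is precisely what the paper's thread assignment optimizes.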
ISBN (print): 9783642131356
A real-time emotional architecture (RTEA) for building parallel robotic applications is presented. RTEA allows the application developer to focus on the design and implementation of the agent's processes, because the architecture itself autonomously decides how much attention to pay to each of these processes. From the functional point of view, an RTEA selects and adapts its objectives depending on its physical (actuator) and mental (processing) capabilities. This characteristic makes the architecture a useful solution for applications that have to deal with several simultaneous tasks, that have real-time constraints, and whose objectives are defined in a flexible way. From the viewpoint of application design and development, RTEA defines its different entities as independent modules; this modularity makes it easier for the programmer to develop each part of the project. To control the processing capacity of the agent and to guarantee that the temporal constraints of the processes are met, RTEA has been implemented on a real-time kernel (RT-Linux). Mobile-robot experiments have been carried out to show how the emotional system influences the mental organisation of the robot when it performs navigation tasks under different environmental conditions.
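The abstract does not detail how attention is translated into processing time, so the sketch below is purely illustrative and not the authors' design: each agent process carries a hypothetical attention weight, and a simple controller scales the optional part of each process so that total CPU utilisation stays below a bound compatible with the real-time constraints.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>

// Illustrative only: a process with a period, a worst-case execution time for
// its mandatory part, and an attention weight (here a made-up value standing
// in for whatever the emotional system computes) for its optional part.
struct AgentProcess {
    const char *name;
    double period_s;     // activation period
    double mandatory_s;  // WCET of the part that must always run
    double optional_s;   // WCET of the part that can be scaled down
    double attention;    // 0..1, set by the (omitted) emotional system
};

// Scale the optional parts so total CPU utilisation stays below maxUtil,
// favouring processes with higher attention.
void allocate(std::vector<AgentProcess> &ps, double maxUtil)
{
    double mandatoryUtil = 0.0, weightedOptional = 0.0;
    for (auto &p : ps) {
        mandatoryUtil    += p.mandatory_s / p.period_s;
        weightedOptional += p.attention * p.optional_s / p.period_s;
    }
    double spare = maxUtil - mandatoryUtil;  // budget left for optional work
    double scale = (spare > 0.0 && weightedOptional > 0.0)
                 ? std::min(1.0, spare / weightedOptional) : 0.0;
    for (auto &p : ps) {
        double granted = scale * p.attention * p.optional_s;
        std::printf("%-12s runs %.3f s of %.3f s optional work per period\n",
                    p.name, granted, p.optional_s);
    }
}

int main()
{
    std::vector<AgentProcess> ps = {
        {"navigation", 0.10, 0.010, 0.030, 0.9},
        {"vision",     0.20, 0.020, 0.080, 0.5},
        {"mapping",    0.50, 0.030, 0.200, 0.2},
    };
    allocate(ps, 0.8);  // keep utilisation under 80% so deadlines can be met
    return 0;
}
```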
Developing parallel or distributed applications is a hard task and it requires advanced algorithms, realistic modeling, efficient design tools, high-level programming abstractions, high-performance implementations, an...
ISBN (print): 9783642131189
Co-clustering has been extensively used in varied applications because of its potential to discover latent local patterns that would otherwise remain hidden from usual unsupervised algorithms such as k-means. Recently, a unified view of co-clustering algorithms, called Bregman co-clustering (BCC), has provided a general framework that subsumes several existing co-clustering algorithms, so we expect this framework to be applied to more varied data types. However, the amount of data collected from real-life application domains easily grows too big to fit in the main memory of a single-processor machine. Accordingly, enhancing the scalability of BCC is a critical practical challenge. To address this, and eventually to enhance its potential for rapid deployment to wider applications with larger data, we parallelize all twelve co-clustering algorithms in the BCC framework using the Message Passing Interface (MPI). In addition, we validate their scalability on eleven synthetic datasets as well as one real-life dataset, demonstrating their speedup under varied parameter settings.
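The sketch below shows only the data-parallel pattern such an MPI parallelization suggests, not any of the twelve BCC algorithms themselves: rows of the data matrix are partitioned across ranks, each rank accumulates local co-cluster sums and counts, and an MPI_Allreduce yields the global co-cluster means needed by the next (omitted) row/column re-assignment step. All sizes, data and assignments are placeholders.

```cuda
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Placeholder sizes: each rank owns localRows rows of an
    // (nprocs * localRows) x nCols matrix, with K row and L column clusters.
    const int localRows = 1000, nCols = 64, K = 4, L = 3;

    // Local block of the data matrix and current cluster assignments
    // (filled with trivial values just to make the sketch runnable).
    std::vector<double> X(localRows * nCols, 1.0);
    std::vector<int> rowClust(localRows), colClust(nCols);
    for (int i = 0; i < localRows; ++i) rowClust[i] = i % K;
    for (int j = 0; j < nCols; ++j)     colClust[j] = j % L;

    // Accumulate local co-cluster sums and counts over the owned rows.
    std::vector<double> sum(K * L, 0.0), cnt(K * L, 0.0);
    for (int i = 0; i < localRows; ++i)
        for (int j = 0; j < nCols; ++j) {
            int c = rowClust[i] * L + colClust[j];
            sum[c] += X[i * nCols + j];
            cnt[c] += 1.0;
        }

    // Global reduction: afterwards every rank holds the global statistics.
    MPI_Allreduce(MPI_IN_PLACE, sum.data(), K * L, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(MPI_IN_PLACE, cnt.data(), K * L, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)   // co-cluster means drive the next assignment update
        std::printf("%d ranks, mean of co-cluster (0,0): %f\n",
                    nprocs, sum[0] / cnt[0]);

    MPI_Finalize();
    return 0;
}
```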
ISBN (print): 9783642143892
This paper analyses a PDE solver working on adaptive Cartesian grids. While a rigorous element-wise formulation of this solver offers great flexibility concerning dynamic adaptivity, and while it comes with very low memory requirements, its speed cannot compete with codes working on patches of regular grids, in particular if the latter deploy patches to several cores. Instead of composing the grid from regular patches, we suggest identifying regular patches throughout the recursive, element-wise grid traversal. Our code then unrolls the recursion for these regular grid blocks automatically and deploys their computations to several cores. It hence benefits from multicores on regular subdomains, but preserves its simple, element-wise character and its ability to handle arbitrary dynamic refinement and changes of the domain topology.
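A purely illustrative sketch of the recursion-unrolling idea, not the authors' code: a recursive traversal of an adaptive grid switches to a flat, core-parallel loop whenever it reaches a subtree that has been flagged as a regular patch (how such patches are detected is omitted here).

```cuda
#include <cstdio>
#include <vector>

// Hypothetical node of an adaptive Cartesian grid: either refined into
// children, or flagged as a regular patch holding its cells in a flat array.
struct Node {
    bool regularPatch = false;
    std::vector<double> cells;     // used only when regularPatch is true
    std::vector<Node>   children;  // used only otherwise
};

void touchCell(double &u) { u *= 0.5; }  // stand-in for the element-wise operator

void traverse(Node &n)
{
    if (n.regularPatch) {
        // Recursion is "unrolled" here: the regular block becomes a flat loop
        // whose independent iterations can be deployed to several cores
        // (compile with OpenMP enabled, e.g. -fopenmp).
        #pragma omp parallel for
        for (long i = 0; i < (long)n.cells.size(); ++i)
            touchCell(n.cells[i]);
    } else {
        for (Node &c : n.children)  // irregular region: keep the plain,
            traverse(c);            // element-wise recursion
    }
}

int main()
{
    Node root;
    root.children.resize(2);
    root.children[0].regularPatch = true;
    root.children[0].cells.assign(1 << 20, 1.0);
    root.children[1].children.resize(1);
    root.children[1].children[0].regularPatch = true;
    root.children[1].children[0].cells.assign(1 << 10, 2.0);

    traverse(root);
    std::printf("cell[0] of first patch: %f\n", root.children[0].cells[0]);
    return 0;
}
```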
ISBN (print): 9783642159787
An overview and an experimental comparative study of parallel algorithms for asynchronous cellular automata simulation is presented. The algorithms are tested on a model of the physicochemical process of the surface CO + O2 reaction over supported Pd nanoparticles on different parallel computers. For testing we use shared-memory computers, distributed-memory computers (i.e. clusters), and a graphics processing unit. A characterization of these algorithms with respect to their methods of maintaining parallelism is given.
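As a generic illustration of asynchronous cellular-automaton updating (not the authors' CO + O2 reaction model), the sketch below picks one random cell per elementary step and applies a placeholder local rule immediately; preserving or approximating exactly this update order is what makes the parallel algorithms surveyed in the paper non-trivial.

```cuda
#include <cstdio>
#include <random>
#include <vector>

// Toy states; the actual model has adsorbed CO, O and empty sites on a
// Pd nanoparticle surface.
enum Cell : int { EMPTY = 0, A = 1, B = 2 };

int main()
{
    const int N = 256;                    // N x N lattice
    std::vector<int> grid(N * N, EMPTY);
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick(0, N * N - 1);
    std::uniform_int_distribution<int> species(0, 2);

    // Asynchronous updating: one randomly chosen cell per elementary step,
    // its new state immediately visible to later steps (no synchronous sweep).
    const long steps = 10L * N * N;
    for (long s = 0; s < steps; ++s) {
        int c = pick(rng);
        int x = c % N, y = c / N;
        int right = y * N + (x + 1) % N;  // periodic neighbour to the right

        // Placeholder rule: maybe adsorb a species on an empty site, and let
        // an A-B neighbour pair "react" back to two empty sites.
        if (grid[c] == EMPTY)
            grid[c] = species(rng);       // 0 means nothing adsorbed this step
        else if ((grid[c] == A && grid[right] == B) ||
                 (grid[c] == B && grid[right] == A))
            grid[c] = grid[right] = EMPTY;
    }

    long occupied = 0;
    for (int v : grid) occupied += (v != EMPTY);
    std::printf("coverage: %.3f\n", (double)occupied / (N * N));
    return 0;
}
```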
ISBN (print): 9783642152764
Empirical search is an emerging strategy used in systems like ATLAS, FFTW and SPIRAL to find the parameter values of an implementation that deliver near-optimal performance on a particular machine. However, this approach has so far only proven successful for scientific kernels or serial sorting. Even commercial libraries like Intel MKL or IBM ESSL do not include parallel versions of sorting routines. In this paper we study empirical search in the generation of parallel sorting routines for multi-core systems. Parallel sorting presents new challenges in that the relative performance of the algorithms depends not only on the characteristics of the architecture and the input data, but also on the data-partitioning schemes and thread interactions. We have studied parallel sorting algorithms including quicksort, cache-conscious radix sort, multi-way merge sort, sample sort and quick-radix sort, and have built a sorting library using empirical search and an artificial neural network. Our results show that this sorting library can generate the best parallel sorting algorithm for different input sets on both x86 and SPARC multi-core architectures, with peak speedups of 2.2x and 3.9x, respectively.
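A tiny illustration of the empirical-search idea only, without the neural network or the parallel candidates studied in the paper: time a few candidate sorting routines on a sample drawn from the target input distribution and keep the fastest one.

```cuda
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <functional>
#include <random>
#include <vector>

using Sorter = std::function<void(std::vector<int>&)>;

// Time one candidate on a private copy of the sample.
double timeSorter(const Sorter &s, std::vector<int> data)  // copy on purpose
{
    auto t0 = std::chrono::steady_clock::now();
    s(data);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main()
{
    // Sample drawn from the input distribution we want to tune for.
    std::vector<int> sample(1 << 20);
    std::mt19937 rng(1);
    std::uniform_int_distribution<int> dist(0, 1 << 30);
    for (int &v : sample) v = dist(rng);

    // Candidate routines; a real library would include several parallel
    // sorts with different data-partitioning schemes.
    std::vector<std::pair<const char*, Sorter>> candidates = {
        {"std::sort",        [](std::vector<int> &v){ std::sort(v.begin(), v.end()); }},
        {"std::stable_sort", [](std::vector<int> &v){ std::stable_sort(v.begin(), v.end()); }},
        {"heap sort",        [](std::vector<int> &v){ std::make_heap(v.begin(), v.end());
                                                      std::sort_heap(v.begin(), v.end()); }},
    };

    // Empirical search: run each candidate on the sample, keep the fastest.
    const char *best = nullptr; double bestTime = 1e30;
    for (auto &c : candidates) {
        double t = timeSorter(c.second, sample);
        std::printf("%-16s %.4f s\n", c.first, t);
        if (t < bestTime) { bestTime = t; best = c.first; }
    }
    std::printf("selected: %s\n", best);
    return 0;
}
```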
ISBN (print): 9783642131189
The Smith-Waterman algorithm is a classic dynamic programming algorithm for biological sequence alignment. However, with the rapid growth in the number of DNA and protein sequences, the original sequential algorithm is very time consuming, because the same computing tasks are repeated over large-scale data. Today's GPU (graphics processing unit) consists of hundreds of processors, so it offers more computational power than current multi-core CPUs, and as the programmability of GPUs has improved continuously, using them for general-purpose computing has become very popular. To accelerate sequence alignment, previous researchers exploited the parallelism of the anti-diagonals of the similarity matrix to parallelize the Smith-Waterman algorithm on the GPU. In this paper, we design a new parallel algorithm which exploits the parallelism of the columns of the similarity matrix to parallelize the Smith-Waterman algorithm on a heterogeneous system based on a CPU and a GPU. The experimental results show that our new parallel algorithm is more efficient than previous ones; it takes full advantage of the features of both the CPU and the GPU and obtains an approximately 37x speedup compared with the sequential algorithm OSEARCH implemented on an Intel dual-core E2140 processor.
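The sketch below illustrates the anti-diagonal scheme mentioned above as the previous approach, not the authors' column-parallel design: cells on one anti-diagonal of the similarity matrix are mutually independent, so one kernel launch per diagonal lets each thread fill one cell. Scoring parameters and sequences are placeholders.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstring>

#define MATCH     2   // placeholder scoring scheme
#define MISMATCH -1
#define GAP       1

// Fill every cell (i, d - i) of anti-diagonal d; H is (m+1) x (n+1), row-major.
__global__ void swDiagonal(int *H, const char *a, const char *b,
                           int m, int n, int d)
{
    int iLo = max(1, d - n);
    int i   = iLo + blockIdx.x * blockDim.x + threadIdx.x;
    if (i > min(m, d - 1)) return;           // thread beyond this diagonal
    int j = d - i;

    int diag = H[(i - 1) * (n + 1) + (j - 1)]
             + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
    int up   = H[(i - 1) * (n + 1) + j] - GAP;
    int left = H[i * (n + 1) + (j - 1)] - GAP;
    H[i * (n + 1) + j] = max(0, max(diag, max(up, left)));
}

int main()
{
    const char *a = "ACACACTA", *b = "AGCACACA";   // toy sequences
    int m = (int)strlen(a), n = (int)strlen(b);

    char *da, *db; int *dH;
    cudaMalloc(&da, m); cudaMalloc(&db, n);
    cudaMalloc(&dH, (m + 1) * (n + 1) * sizeof(int));
    cudaMemcpy(da, a, m, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, n, cudaMemcpyHostToDevice);
    cudaMemset(dH, 0, (m + 1) * (n + 1) * sizeof(int));  // first row/column = 0

    // One launch per anti-diagonal; kernels on the default stream execute in
    // order, so diagonal d sees the finished values of diagonals d-1 and d-2.
    for (int d = 2; d <= m + n; ++d) {
        int cells = std::min(m, d - 1) - std::max(1, d - n) + 1;
        if (cells <= 0) continue;
        int threads = 128, blocks = (cells + threads - 1) / threads;
        swDiagonal<<<blocks, threads>>>(dH, da, db, m, n, d);
    }

    int *H = new int[(m + 1) * (n + 1)];
    cudaMemcpy(H, dH, (m + 1) * (n + 1) * sizeof(int), cudaMemcpyDeviceToHost);
    int best = 0;
    for (int k = 0; k < (m + 1) * (n + 1); ++k) best = std::max(best, H[k]);
    std::printf("best local alignment score: %d\n", best);

    delete[] H; cudaFree(da); cudaFree(db); cudaFree(dH);
    return 0;
}
```

The drawback of this scheme, which motivates column-oriented alternatives, is that early and late anti-diagonals contain few cells and therefore leave most threads idle.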