ISBN: 9783642246685 (print)
Proteins are among the most vital macromolecules at the cellular level. To understand the function of a protein, its structure needs to be determined. For this purpose, different computational approaches have been introduced. Genetic algorithms can be used to search the vast space of all possible conformations of a protein in order to find its native structure. A framework for designing such algorithms that is generic, easy to use, and fast on distributed systems may help further development of genetic-algorithm-based approaches. We propose such a framework based on a parallel master-slave model, implemented in C++ and the Message Passing Interface (MPI). We evaluated its performance on distributed systems with different numbers of processors and achieved a linear speedup in proportion to the number of processing units.
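The master-slave pattern mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' framework: it is written in Python rather than C++/MPI, a thread pool stands in for the worker ranks, and the bit-counting `fitness` function, the `evolve` helper, and all parameter values are hypothetical placeholders for a real conformation-scoring setup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(individual):
    # Toy objective (assumption): number of 1-bits. A real framework would
    # score a candidate protein conformation with an energy function here.
    return sum(individual)


def evolve(pop_size=20, genome_len=16, generations=30, workers=4, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []  # best score seen in each evaluated generation
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Master hands individuals to the workers and gathers the scores.
            scores = list(pool.map(fitness, population))
            ranked = [ind for _, ind in sorted(zip(scores, population),
                                               key=lambda pair: pair[0],
                                               reverse=True)]
            history.append(max(scores))
            parents = ranked[:pop_size // 2]
            # Master does selection, crossover and mutation serially.
            children = [ranked[0][:]]  # elitism: keep the current best
            while len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, genome_len)
                child = a[:cut] + b[cut:]
                if rng.random() < 0.1:
                    child[rng.randrange(genome_len)] ^= 1
                children.append(child)
            population = children
    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = evolve()
    print(best, history[-1])
```

Only the fitness evaluation is distributed, which is the step that usually dominates the run time in this pattern. Because the elite individual is copied into each new generation, the per-generation best score in `history` never decreases.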
ISBN: 9783642246494 (print)
In a commercial Relational Database Management System (RDBMS), sort and join are the most demanding operations, so it is beneficial to improve the performance of external sort and external join algorithms that handle large input data sizes. This paper proposes parallel implementations of multithreaded external sort and external hash join algorithms to accelerate IBM DB2, one of the leading RDBMSs, using an IBM Power Edge of Network (IBM PowerEN(TM)) Peripheral Component Interconnect Express (PCIe) card as an accelerator. The preliminary results show that the proposed parallel implementation of the algorithms on the PowerEN(TM) PCIe card can speed up DB2 sort and join performance by about a factor of two.
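For readers unfamiliar with external sorting, the general technique can be sketched as follows. This is a generic illustration, not the DB2 or PowerEN implementation: the function names, the tiny `chunk_size`, and the use of a Python thread pool for run generation are all assumptions made for the example.

```python
import heapq
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def _write_sorted_run(chunk):
    # Sort one memory-sized chunk and spill it to a temporary file (a "run").
    chunk.sort()
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{value}\n" for value in chunk)
    return path


def _read_run(path):
    # Stream a run back one value at a time, so memory use stays small.
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values, chunk_size=4, workers=2):
    # Phase 1: generate sorted runs, several at a time.
    chunks = [list(values[i:i + chunk_size])
              for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        paths = list(pool.map(_write_sorted_run, chunks))
    # Phase 2: k-way merge of all runs.
    try:
        return list(heapq.merge(*(_read_run(p) for p in paths)))
    finally:
        for p in paths:
            os.remove(p)


if __name__ == "__main__":
    print(external_sort([5, 3, 9, 1, 7, 2, 8]))  # [1, 2, 3, 5, 7, 8, 9]
```

The thread pool here only shows where parallelism fits structurally; in CPython it will not speed up the in-memory sort itself, whereas a dedicated accelerator of the kind described in the abstract would take over that work.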
ISBN: 9783642246685 (print)
An emotional agent software architecture for real-time mobile robotic applications has been developed. To allow the agent to tackle more dynamically constrained application problems, processor computation time should be reduced so that the time gained can be used to execute more complex processes. In this paper, the response time of the operating processes in each attention cycle of the agent is decreased by parallelizing the highly parallel processes of the architecture, namely the emotional contribution processes. The implementation of these processes has been evaluated on Field Programmable Gate Array (FPGA) and multicore processors.
ISBN: 9781457714115 (print)
Automation, computational time, and cost are open issues in microarray image processing. The present paper proposes image processing techniques together with their implementations in order to eliminate the shortcomings of existing software platforms for microarray image processing: user intervention, increased computational time, and cost. Thus, for each step of microarray image processing, application-specific hardware architectures are designed, aiming at parallelizing the algorithms for fast processing. Computational time is estimated and compared with state-of-the-art approaches. The proposed hardware architectures, integrated inside microarray scanners, deliver microarray image characteristics in an automated manner, removing the need for an additional software platform. FPGA technology was chosen for implementation due to its parallel computation capabilities and ease of reconfiguration.
The Compute Unified Device Architecture (CUDA) is a new parallel processing platform making use of the unified shader design of the most current Graphics Processing Units (GPUs) from NVIDIA. In this paper, we apply th...
ISBN: 9780769543284 (print)
We focus on agent-based simulations where a large number of agents move in space, obeying some simple rules. Since such simulations are computationally intensive, it is challenging in this context to let the number of agents grow and to increase the quality of the simulation. An attractive way to meet this need is to exploit parallel architectures. In this paper, we present a novel distributed load balancing scheme for a parallel implementation of such simulations. The purpose of the scheme is to achieve high scalability. Our approach to load balancing is designed to be lightweight and fully distributed: the balancing calculations take place at each computational step and influence the next step. To the best of our knowledge, our approach is the first distributed load balancing scheme in this context. We present both the design and the implementation, which allowed us to perform a number of experiments with up to 1,000,000 agents. Tests show that, even though the load balancing algorithm is local, the workload distribution is balanced while the communication overhead is negligible.
ISBN: 9781450308625 (print)
The proceedings contain 4 papers. The topics discussed include: cache size in a cost model for heterogeneous skeletons; an efficient skew-insensitive algorithm for join processing on grid architectures; formally specifying and analyzing a parallel virtual machine for lazy functional languages using Maude; and a type system for safe execution of parallel programs in BSML.
ISBN: 9783642246494 (print)
In this work, we propose an efficient quasi-cyclic LDPC (QC-LDPC) decoder simulator which runs on graphics processing units (GPUs). We optimize the data structures of the messages used in the decoding process such that both the read and write processes can be performed in a highly parallel manner by the GPUs. We also propose a highly efficient algorithm to convert the data structure of the messages from one form to another with very little latency. Finally, by using the large number of cores in the GPU to perform the simple computations simultaneously, our GPU-based LDPC decoder is found to run around 100 times faster than a CPU-based simulator.
ISBN: 9781424441419 (print)
Successful proof-of-concept laboratory experiments on cortically controlled brain-computer interfaces motivate continued development of neural prosthetic microsystems (NPMs). One research direction is to realize real-time spike sorting processors (SSPs) on the NPM. The SSP detects the spikes, extracts the features, and then performs the classification algorithm in real time in order to differentiate the spikes of the different firing neurons. Several architectures have been designed for spike detection and feature extraction. However, the classification hardware is missing. To complete the SSP, a density-based, hardware-oriented classification algorithm is proposed. Traditional classification algorithms require considerable memory space to store all the training features during the processing iterations, which results in considerable power and area for the hardware. The proposed algorithm is based on the density map of the spike features. The density map can be accumulated on-line as the spike features arrive. Therefore, the algorithm saves significant memory space and is well suited to efficient hardware implementation.
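The memory argument in this abstract can be made concrete with a small sketch. The abstract does not give the algorithm's details, so everything here is an assumption for illustration: a hypothetical `DensityMap` class, two-dimensional features scaled to [0, 1], an 8 x 8 grid, and a simple "nearest density peak" classification rule that is not claimed to be the paper's.

```python
class DensityMap:
    """Fixed-size grid of counts, updated one spike feature at a time."""

    def __init__(self, bins=8, lo=0.0, hi=1.0):
        self.bins, self.lo, self.hi = bins, lo, hi
        self.counts = [[0] * bins for _ in range(bins)]

    def _cell(self, value):
        # Quantize a feature value to a grid index, clamped to the grid.
        idx = int((value - self.lo) / (self.hi - self.lo) * self.bins)
        return min(max(idx, 0), self.bins - 1)

    def add(self, feature):
        # On-line accumulation: the feature itself is never stored.
        x, y = feature
        self.counts[self._cell(x)][self._cell(y)] += 1

    def peaks(self, min_count=2):
        # A peak is a cell with enough hits and no denser neighbouring cell.
        found = []
        for i in range(self.bins):
            for j in range(self.bins):
                c = self.counts[i][j]
                if c < min_count:
                    continue
                neighbours = [self.counts[a][b]
                              for a in range(max(i - 1, 0), min(i + 2, self.bins))
                              for b in range(max(j - 1, 0), min(j + 2, self.bins))
                              if (a, b) != (i, j)]
                if all(c >= n for n in neighbours):
                    found.append((i, j))
        return found

    def classify(self, feature, min_count=2):
        # Assumed rule: label = index of the nearest peak, in grid units.
        i, j = self._cell(feature[0]), self._cell(feature[1])
        centres = self.peaks(min_count)
        return min(range(len(centres)),
                   key=lambda k: (centres[k][0] - i) ** 2 + (centres[k][1] - j) ** 2)


if __name__ == "__main__":
    dmap = DensityMap()
    for f in [(0.1, 0.1), (0.11, 0.12), (0.05, 0.08),
              (0.9, 0.9), (0.95, 0.92), (0.88, 0.93)]:
        dmap.add(f)
    print(dmap.peaks(), dmap.classify((0.2, 0.3)))
```

Memory use is fixed by the grid size (`bins * bins` counters) regardless of how many spikes arrive, which is the property the abstract highlights in contrast to storing all training features.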
ISBN: 9783642246685 (print)
An addition chain for a natural number x of n bits is a sequence of numbers a(0), a(1), ..., a(l), such that a(0) = 1, a(l) = x, and a(k) = a(i) + a(j) with 0 <= i, j < k <= l. The addition chain problem asks: what is the minimal number of additions needed to compute x starting from 1? In this paper, we present a new parallel algorithm to generate a short addition chain for x. The algorithm has running time O(log(2) n) using a polynomial number of processors under the EREW PRAM (exclusive read exclusive write parallel random access machine) model. The algorithm is faster than previous algorithms and is based on the binary method.
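As an illustration of the definition, the classical sequential binary method can be sketched as follows. This is not the parallel algorithm of the paper, only the well-known double-and-add construction it builds on; the function names are chosen for this example.

```python
def binary_addition_chain(x):
    """Build an addition chain for x by scanning its bits from the top."""
    if x < 1:
        raise ValueError("x must be a natural number >= 1")
    chain = [1]
    for bit in bin(x)[3:]:  # skip the '0b' prefix and the leading 1-bit
        chain.append(chain[-1] * 2)      # doubling: a(k) = a(k-1) + a(k-1)
        if bit == "1":
            chain.append(chain[-1] + 1)  # increment: a(k) = a(k-1) + a(0)
    return chain


def is_addition_chain(chain, x):
    """Check the definition: a(0) = 1, a(l) = x, a(k) = a(i) + a(j), i, j < k."""
    if not chain or chain[0] != 1 or chain[-1] != x:
        return False
    for k in range(1, len(chain)):
        earlier = chain[:k]
        if not any(chain[k] - a in earlier for a in earlier):
            return False
    return True


if __name__ == "__main__":
    print(binary_addition_chain(23))  # [1, 2, 4, 5, 10, 11, 22, 23]
```

For x = 23 (binary 10111) the chain uses 7 additions: one doubling per bit after the leading one, plus one increment for each 1-bit among them. The binary method gives a short chain but not necessarily a minimal one.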