ISBN:
(print) 9783642246494
The proceedings contain 79 papers. The topics discussed include: secure and energy-efficient data aggregation with malicious aggregator identification in wireless sensor networks; dynamic data race detection for correlated variables; distributed mining of constrained frequent sets from uncertain data; set-to-set disjoint-paths routing in recursive dual-net; Redflag: a framework for analysis of kernel-level concurrency; fault-tolerant routing based on approximate directed routable probabilities for hypercubes; adaptive resource remapping through live migration of virtual machines; anonymous communication over invisible mix rings; lightweight transactional arrays for read-dominated workloads; cascading multi-way bounded wait timer management for moody and autonomous systems; and world-wide distributed multiple replications in parallel for quantitative sequential simulation.
ISBN:
(print) 9783642246685
The proceedings contain 79 papers. The topics discussed include: secure and energy-efficient data aggregation with malicious aggregator identification in wireless sensor networks; dynamic data race detection for correlated variables; distributed mining of constrained frequent sets from uncertain data; set-to-set disjoint-paths routing in recursive dual-net; Redflag: a framework for analysis of kernel-level concurrency; fault-tolerant routing based on approximate directed routable probabilities for hypercubes; adaptive resource remapping through live migration of virtual machines; anonymous communication over invisible mix rings; lightweight transactional arrays for read-dominated workloads; cascading multi-way bounded wait timer management for moody and autonomous systems; and world-wide distributed multiple replications in parallel for quantitative sequential simulation.
Existing multimodal summarization methods primarily focus on multimodal fusion to efficiently utilize the visual information for summarization. However, they fail to exploit the deep interaction between textual and vi...
ISBN:
(print) 9783642246494
This paper introduces a number of modifications that significantly improve parallel LLL reduction. Experiments show that, for SVP-challenge-type lattice bases, these modifications increase the speedup by a factor of more than 1.35 when the new algorithm is compared with the state-of-the-art parallel LLL algorithm.
ISBN:
(print) 9783642246494
The single memory-CPU communication channel of the von Neumann architecture is a bottleneck that is quickly stalling the growth of computer processors. A probable solution to this problem is to fuse processing and memory elements. Simply placing a single low-latency memory and a processor on one chip cannot solve the problem, because the fundamental channel bottleneck remains as long as processor and memory are logically split. This paper argues that a paradigm shift is possible by combining arithmetic logic unit and random access memory (ARAM) elements at the bit level. This modest bit-level ARAM is used to perform word-level ALU instructions with minor modifications, which makes the ARAM cells capable of executing instructions in parallel. The design is also asynchronous and hence reduces power consumption significantly. A CMOS implementation is presented that verifies the practicality of the proposed ARAM.
ISBN:
(print) 9783642246494
In a commercial Relational Database Management System (RDBMS), sort and join are the most demanding operations, so it is quite beneficial to improve the performance of the external sort and external join algorithms that handle large input data sizes. This paper proposes parallel implementations of multithreaded external sort and external hash join algorithms to accelerate IBM DB2, one of the leading RDBMSs, using an IBM Power Edge of Network (IBM PowerEN (TM)) Peripheral Component Interconnect Express (PCIe) card as an accelerator. The preliminary results show that the proposed parallel implementation of the algorithms on the PowerEN (TM) PCIe card can speed up DB2 sort and join performance by about a factor of two.
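As a rough illustration of why a hash join lends itself to parallel execution, the sketch below partitions both inputs by a hash of the join key and then joins each pair of partitions independently. It is a minimal in-memory sketch in Python, not the DB2 or PowerEN implementation described in the abstract; the function names `partition` and `hash_join` and the dictionary row format are assumptions made for this example.

```python
from collections import defaultdict


def partition(rows, key, n_parts):
    """Split rows into n_parts buckets by a hash of the join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def hash_join(left, right, key, n_parts=4):
    """Equi-join two lists of dict rows on `key` (hypothetical row format)."""
    out = []
    left_parts = partition(left, key, n_parts)
    right_parts = partition(right, key, n_parts)
    # Rows with equal keys always land in the same partition index, so each
    # (left, right) partition pair could be handed to a separate thread.
    for lpart, rpart in zip(left_parts, right_parts):
        table = defaultdict(list)
        for row in lpart:          # build phase
            table[row[key]].append(row)
        for row in rpart:          # probe phase
            for match in table.get(row[key], []):
                out.append({**match, **row})
    return out
```

In an external (out-of-core) variant, each partition would be written to disk and read back one pair at a time, so that only one build table has to fit in memory.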
ISBN:
(print) 9783642246494
CUDA is an architecture introduced by NVIDIA Corporation that allows software developers to take advantage of GPU resources in order to increase computational power. This paper presents an approach to accelerating the similarity searching of DNA and protein molecules through parallel alignments of their sequences with the use of GPU and CUDA. In order to optimally align two biopolymer sequences, such as amino acid or nucleotide sequences, we employ the Smith-Waterman algorithm. We present the optimization steps that lead to very good efficiency of our implementation on the GPU, and we compare the results of efficiency tests with other known implementations. The results show that it is possible to search bioinformatics databases accurately within a reasonable time.
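For readers unfamiliar with the Smith-Waterman algorithm mentioned above, the following is a minimal sequential sketch of its scoring recurrence in Python. It assumes a simple linear gap penalty and illustrative score values (`match=2`, `mismatch=-1`, `gap=-1`); the abstract does not state the scoring scheme or the GPU kernel design, so none of this should be read as the authors' implementation.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the DP matrix.
    The default scores are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another, and separate query/database pairs are independent as well. Both properties are commonly used to parallelize the algorithm.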
ISBN:
(print) 9783642246494
Finding optimal phase durations for a controlled intersection is a computationally intensive task requiring O(N^3) operations. In this paper we introduce a cost-optimal parallelization of a dynamic programming algorithm that reduces the complexity to O(N^2). Three implementations that span a wide range of parallel hardware are developed. The first is based on a shared-memory architecture, using the OpenMP programming model. The second is based on message passing, targeting massively parallel machines including high-performance clusters and supercomputers. The third is based on the data-parallel programming model mapped onto Graphics Processing Units (GPUs). Key optimizations include loop reversal, communication pruning, load balancing, and efficient thread-to-processor assignment. Experiments have been conducted on an 8-core server, two IBM BlueGene/L supercomputer node boards with 128 processors, and an NVIDIA GeForce GTX470 GPU with 448 cores. Results indicate practical scalability on all platforms, with the maximum speedup reaching 76x on the GTX470.
ISBN:
(digital) 9789812792037
ISBN:
(print) 9789810244811
ICA3PP 2000 was an important conference that brought together researchers and practitioners from academia, industry and governments to advance the knowledge of parallel and distributed computing. The proceedings constitute a well-defined set of innovative research papers in two broad areas of parallel and distributed computing: (1) architectures, algorithms and networks; (2) systems and applications.
ISBN:
(print) 9783642246685
Proteins are among the most vital macromolecules at the cellular level. In order to understand the function of a protein, its structure needs to be determined. For this purpose, different computational approaches have been introduced. Genetic algorithms can be used to search the vast space of all possible conformations of a protein in order to find its native structure. A framework for designing such algorithms that is generic, easy to use, and fast on distributed systems may help further development of genetic-algorithm-based approaches. We propose such a framework, based on a parallel master-slave model and implemented in C++ with the Message Passing Interface (MPI). We evaluated its performance on distributed systems with different numbers of processors and achieved a linear speedup in proportion to the number of processing units.
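The master-slave model described above can be sketched as follows: a master process handles selection and mutation, while workers only evaluate fitness. This is a toy sketch in Python, not the authors' C++/MPI framework. A thread pool stands in for MPI worker ranks, and the bit-counting `fitness` function is a hypothetical placeholder for a real conformation energy function; all names and parameters are assumptions made for this example.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(bits):
    # Toy objective (number of 1-bits); a stand-in for an energy function.
    return sum(bits)


def evolve(pop_size=20, length=16, generations=30, workers=4, seed=0):
    """Run a tiny elitist GA and return the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Workers: evaluate fitness of all individuals concurrently.
            scores = list(pool.map(fitness, pop))
            history.append(max(scores))
            # Master: rank, keep the better half, mutate copies of it.
            ranked = [ind for _, ind in
                      sorted(zip(scores, pop), key=lambda p: p[0], reverse=True)]
            elite = ranked[: pop_size // 2]
            children = []
            for parent in elite:
                child = parent[:]
                child[rng.randrange(length)] ^= 1  # flip one random bit
                children.append(child)
            pop = elite + children
    return history
```

Because fitness evaluation usually dominates the run time and the evaluations are independent of one another, distributing only that step is what makes near-linear speedup plausible in this model. In CPython, threads will not actually speed up a CPU-bound fitness function; processes or MPI ranks would be needed for that.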