ISBN (print): 9783642246494
The proceedings contain 79 papers. The topics discussed include: secure and energy-efficient data aggregation with malicious aggregator identification in wireless sensor networks; dynamic data race detection for correlated variables; distributed mining of constrained frequent sets from uncertain data; set-to-set disjoint-paths routing in recursive dual-net; RedFlag: a framework for analysis of kernel-level concurrency; fault-tolerant routing based on approximate directed routable probabilities for hypercubes; adaptive resource remapping through live migration of virtual machines; anonymous communication over invisible mix rings; lightweight transactional arrays for read-dominated workloads; cascading multi-way bounded wait timer management for moody and autonomous systems; and world-wide distributed multiple replications in parallel for quantitative sequential simulation.
ISBN (print): 9783642246685
The proceedings contain 79 papers. The topics discussed include: secure and energy-efficient data aggregation with malicious aggregator identification in wireless sensor networks; dynamic data race detection for correlated variables; distributed mining of constrained frequent sets from uncertain data; set-to-set disjoint-paths routing in recursive dual-net; RedFlag: a framework for analysis of kernel-level concurrency; fault-tolerant routing based on approximate directed routable probabilities for hypercubes; adaptive resource remapping through live migration of virtual machines; anonymous communication over invisible mix rings; lightweight transactional arrays for read-dominated workloads; cascading multi-way bounded wait timer management for moody and autonomous systems; and world-wide distributed multiple replications in parallel for quantitative sequential simulation.
Existing multimodal summarization methods primarily focus on multimodal fusion to efficiently utilize the visual information for summarization. However, they fail to exploit the deep interaction between textual and vi...
ISBN (print): 9783642246494
This paper introduces a number of modifications that allow for significant improvements to parallel LLL reduction. Experiments show that, compared with the state-of-the-art parallel LLL algorithm, these modifications increase the speed-up by a factor of more than 1.35 for SVP-challenge-type lattice bases.
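For readers unfamiliar with LLL reduction, the following is a minimal sequential sketch of the textbook Lenstra-Lenstra-Lovász algorithm with the common parameter delta = 3/4, using exact rational arithmetic. It is not the paper's parallel algorithm, whose modifications the abstract does not describe; it recomputes the Gram-Schmidt data after every change for clarity rather than speed, and it assumes the input rows are linearly independent.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(B):
    """Return the Gram-Schmidt vectors B* and coefficients mu[i][j] for the basis rows B."""
    Bs = []
    mu = [[Fraction(0)] * len(B) for _ in B]
    for i, b in enumerate(B):
        v = [Fraction(x) for x in b]
        for j in range(i):
            mu[i][j] = Fraction(dot(b, Bs[j])) / dot(Bs[j], Bs[j])
            v = [vi - mu[i][j] * bj for vi, bj in zip(v, Bs[j])]
        Bs.append(v)
    return Bs, mu


def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL reduction of integer basis rows (assumed linearly independent)."""
    B = [list(row) for row in basis]
    n = len(B)
    Bs, mu = gram_schmidt(B)
    k = 1
    while k < n:
        # Size reduction: make |mu[k][j]| <= 1/2 for every j < k.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])  # nearest integer, returned as int
            if q != 0:
                B[k] = [x - q * y for x, y in zip(B[k], B[j])]
                Bs, mu = gram_schmidt(B)
        # Lovász condition: advance if it holds, otherwise swap and step back.
        if dot(Bs[k], Bs[k]) >= (delta - mu[k][k - 1] ** 2) * dot(Bs[k - 1], Bs[k - 1]):
            k += 1
        else:
            B[k], B[k - 1] = B[k - 1], B[k]
            Bs, mu = gram_schmidt(B)
            k = max(k - 1, 1)
    return B
```

The returned rows span the same lattice as the input but are size-reduced and satisfy the Lovász condition; parallel variants such as the one in the paper reorganize this loop so that independent size-reduction work can proceed concurrently.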
ISBN (print): 9783642246494
The single memory-CPU communication channel of the von Neumann architecture is a bottleneck that is quickly stalling the growth of computer processors. A possible solution to this problem is to fuse processing and memory elements. Simply placing a low-latency memory and a processor on a single chip cannot solve the problem, because the fundamental channel bottleneck remains as long as processor and memory are logically split. This paper shows that a paradigm shift is possible by combining Arithmetic Logic Unit and Random Access Memory (ARAM) elements at the bit level. This modest bit-level ARAM is used to perform word-level ALU instructions with minor modifications, which makes the ARAM cells capable of executing instructions in parallel. The design is also asynchronous and hence reduces power consumption significantly. A CMOS implementation is presented that verifies the practicality of the proposed ARAM.
ISBN (print): 9783642246494
In a commercial Relational Database Management System (RDBMS), sort and join are the most demanding operations, so it is highly beneficial to improve the performance of the external sort and external join algorithms that handle large input data. This paper proposes parallel implementations of multithreaded external sort and external hash join algorithms to accelerate IBM DB2, one of the leading RDBMSs, using an IBM Power Edge of Network (IBM PowerEN (TM)) Peripheral Component Interconnect Express (PCIe) card as an accelerator. Preliminary results show that the proposed parallel implementation of the algorithms on the PowerEN (TM) PCIe card speeds up DB2 sort and join by a factor of about two.
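The abstract does not describe how the external hash join is parallelized. A common textbook design is a Grace-style partitioned hash join: both inputs are split into partitions by a hash of the join key, so matching rows always land in the same partition pair, and each pair can then be joined independently, for example by separate threads. The minimal sketch that follows assumes that design; in-memory lists stand in for on-disk partition files, and the function names are hypothetical, not DB2 or PowerEN APIs.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n_parts):
    """Split rows into n_parts buckets by hash of the join key (stand-ins for spill files)."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def join_partition(build, probe, key):
    """Classic in-memory hash join of one partition pair: build on one side, probe with the other."""
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    return [(b, p) for p in probe for b in table.get(p[key], [])]


def grace_hash_join(left, right, key, n_parts=4, workers=4):
    """Partition both inputs, then join the partition pairs concurrently."""
    left_parts = partition(left, key, n_parts)
    right_parts = partition(right, key, n_parts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(join_partition, left_parts, right_parts, [key] * n_parts))
    return [pair for part in results for pair in part]
```

Because of CPython's global interpreter lock, the thread pool here illustrates the structure (independent partition pairs) rather than delivering real speedup; a native implementation on an accelerator would run the per-partition joins truly in parallel.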
ISBN (print): 9783642246494
CUDA is an architecture introduced by NVIDIA Corporation that allows software developers to take advantage of GPU resources to increase computational power. This paper presents an approach to accelerating the similarity search of DNA and protein molecules through parallel alignment of their sequences on the GPU using CUDA. To optimally align two biopolymer sequences, such as amino acid or nucleotide sequences, we employ the Smith-Waterman algorithm. We present the optimization steps that lead to very good efficiency of our implementation on the GPU, and we compare the results of efficiency tests with other known implementations. The results show that it is possible to search bioinformatics databases accurately within a reasonable time.
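As a minimal sketch of the Smith-Waterman recurrence (assuming a linear gap penalty and illustrative scores of +2 for a match, -1 for a mismatch and -1 per gap, none of which come from the paper), the following Python function computes the best local alignment score. It fills the matrix anti-diagonal by anti-diagonal: every cell on one diagonal depends only on the two previous diagonals, which is one common way GPU implementations expose parallelism. The abstract does not say which parallelization strategy the authors use.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Sweep anti-diagonals d = i + j; the cells of one diagonal are mutually
    # independent, so a GPU could compute them concurrently.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            H[i][j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, H[i][j])
    return best
```

For example, `smith_waterman("GGACGTCC", "TTACGTAA")` finds the shared substring `ACGT` and scores it 4 × 2 = 8.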
ISBN (print): 9783642246494
Finding optimal phase durations for a controlled intersection is a computationally intensive task requiring O(N^3) operations. In this paper we introduce a cost-optimal parallelization of a dynamic programming algorithm that reduces the complexity to O(N^2). Three implementations spanning a wide range of parallel hardware are developed. The first targets shared-memory architectures using the OpenMP programming model. The second is based on message passing and targets massively parallel machines, including high-performance clusters and supercomputers. The third is based on the data-parallel programming model mapped onto Graphics Processing Units (GPUs). Key optimizations include loop reversal, communication pruning, load balancing, and efficient thread-to-processor assignment. Experiments were conducted on an 8-core server, IBM BlueGene/L supercomputer 2-node boards with 128 processors, and an NVIDIA GeForce GTX 470 GPU with 448 cores. Results indicate practical scalability on all platforms, with the maximum speedup reaching 76x on the GTX 470.
ISBN (electronic): 9789812792037
ISBN (print): 9789810244811
ICA3PP 2000 was an important conference that brought together researchers and practitioners from academia, industry and government to advance knowledge of parallel and distributed computing. The proceedings constitute a well-defined set of innovative research papers in two broad areas of parallel and distributed computing: (1) architectures, algorithms and networks; (2) systems and applications.
ISBN (print): 9783642246494
In parallel programs, concurrency bugs are often caused by unsynchronized accesses to shared memory locations, which are called data races. To support programmers in writing correct parallel programs, it is therefore highly desirable to have tools that automatically detect such data races. Today, most of these tools only consider unsynchronized read and write operations on a single memory location. Concurrency bugs that involve multiple accesses to a set of correlated variables may be missed completely, and tools may overwhelm programmers with data races on various memory locations without noticing that the locations are correlated. In this paper, we propose a novel approach to data race detection that automatically infers sets of correlated variables and logical operations by analyzing data and control dependencies. For data race detection itself, we combine a modified version of the lockset algorithm with happens-before analysis, providing the first hybrid, dynamic race detector for correlated variables. We implemented our approach on top of Valgrind, a framework for dynamic binary instrumentation. Our evaluation confirmed that we can catch data races missed by existing detectors and provide additional information for correct bug fixing.
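The approach builds on the lockset algorithm, which was introduced by the Eraser race detector. The following minimal Python sketch shows only the basic lockset refinement: once a second thread touches a shared location, the location keeps a candidate set of locks that is intersected with the locks held by the accessing thread on every access, and an empty set signals a potential race. It omits the happens-before analysis and the read-sharing states of the full algorithm. The `group_of` mapping is a hypothetical, manually supplied stand-in for the paper's automatic inference of correlated variables: variables in one group are treated as a single logical location.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LocksetDetector:
    """Simplified Eraser-style lockset detector (no read-shared state, no happens-before)."""
    group_of: dict[str, str] = field(default_factory=dict)          # variable -> correlated group (hypothetical)
    held: dict[str, set[str]] = field(default_factory=dict)         # thread -> locks currently held
    owner: dict[str, str] = field(default_factory=dict)             # location -> first accessing thread
    candidates: dict[str, set[str]] = field(default_factory=dict)   # location -> candidate lockset
    races: list[str] = field(default_factory=list)                  # locations reported as racy

    def acquire(self, thread: str, lock: str) -> None:
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread: str, lock: str) -> None:
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread: str, var: str) -> None:
        loc = self.group_of.get(var, var)       # correlated variables share one location
        locks = self.held.get(thread, set())
        first = self.owner.setdefault(loc, thread)
        if loc not in self.candidates:
            if first == thread:
                return                          # still exclusive to one thread: nothing to check
            self.candidates[loc] = set(locks)   # first access by a second thread
        else:
            self.candidates[loc] &= locks       # keep only locks held on every shared access
        if not self.candidates[loc] and loc not in self.races:
            self.races.append(loc)              # no common lock protects this location
```

With `group_of={"lo": "range", "hi": "range"}`, a locked update of `lo` by one thread followed by an unlocked update of `hi` by another is reported as a race on `range`, whereas per-variable checking stays silent because each variable is touched by only one thread.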