ISBN: (print) 9783642330643
The proceedings contain 73 papers. Topics discussed include: accelerating the dynamic programming for the optimal polygon triangulation on the GPU; security computing for the resiliency of protecting from internal attacks in distributed wireless sensor networks; optimization of a short-range proximity effect correction algorithm in e-beam lithography using GPGPUs; vectorized algorithms for Quadtree construction and descent; an optimal parallel prefix-sums algorithm on the memory machine models for GPUs; enhancing the performance of a distributed mobile computing environment by topology construction; maintaining consistency in software transactional memory through dynamic versioning tuning; a new low latency parallel turbo decoder employing parallel phase decoding method; high-performance matrix multiply on a massively multithreaded Fiteng1000 processor; and on construction of Cloud IaaS for VM live migration using KVM and OpenNebula.
ISBN: (print) 9783642330773
The proceedings contain 73 papers. Topics discussed include: accelerating the dynamic programming for the optimal polygon triangulation on the GPU; security computing for the resiliency of protecting from internal attacks in distributed wireless sensor networks; optimization of a short-range proximity effect correction algorithm in e-beam lithography using GPGPUs; vectorized algorithms for Quadtree construction and descent; an optimal parallel prefix-sums algorithm on the memory machine models for GPUs; enhancing the performance of a distributed mobile computing environment by topology construction; maintaining consistency in software transactional memory through dynamic versioning tuning; a new low latency parallel turbo decoder employing parallel phase decoding method; high-performance matrix multiply on a massively multithreaded Fiteng1000 processor; and on construction of Cloud IaaS for VM live migration using KVM and OpenNebula.
In this paper, we propose a parallel implementation of the number-theoretic transform (NTT) on GPU clusters. The butterfly operation of the NTT can be performed using modular addition, subtraction, and multiplica...
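To make the butterfly concrete, the following is a minimal sequential sketch in Python, not the paper's GPU-cluster implementation. The modulus `P = 998244353`, the primitive root `G = 3`, and the function names `butterfly` and `ntt` are illustrative assumptions that do not come from the abstract.

```python
# Illustrative parameters (assumptions, not taken from the paper):
# P is prime with P - 1 = 2^23 * 7 * 17, and 3 is a primitive root modulo P.
P = 998244353
G = 3


def butterfly(a, b, w, p=P):
    """Radix-2 butterfly: return (a + w*b mod p, a - w*b mod p)."""
    t = (w * b) % p                     # modular multiplication
    return (a + t) % p, (a - t) % p     # modular addition and subtraction


def ntt(values, p=P, g=G):
    """Recursive radix-2 NTT.

    len(values) must be a power of two that divides p - 1.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = ntt(values[0::2], p, g)
    odd = ntt(values[1::2], p, g)
    root = pow(g, (p - 1) // n, p)      # primitive n-th root of unity mod p
    out = [0] * n
    w = 1
    for k in range(n // 2):
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w, p)
        w = (w * root) % p
    return out
```

Each call to `butterfly` uses exactly one modular multiplication, one modular addition, and one modular subtraction, which is why these three operations dominate the cost of the transform.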
Text-Image Person Re-Identification (TIReID) is a computer vision task that involves identifying a person in images or videos based on textual descriptions. Current works mainly employ Vision Language Pretrained (VLP) m...
ISBN: (print) 9783319654829; 9783319654812
Nowadays, modern computer systems rely heavily on parallel processing, not only because of the multicore CPUs bundled with any machine, even mobile devices, but increasingly thanks to the parallel processing capacities of graphics processing units (GPUs), general-purpose computing on graphics processing units (GPGPU) being one example. In this paper, relying on the DirectX 12 framework, we propose an innovative approach to enable parallel processing for graphical rendering on both the CPU and GPU for the popular Racket functional programming language (formerly PLT Scheme), importantly without compromising Racket's usability and programmer-friendliness. Our performance evaluations show significant improvements in execution time (3x speed-up in some cases), CPU utilisation time (reduced by as much as 80% in some scenarios), and frame rate when using moving graphics.
ISBN: (print) 9783319654829; 9783319654812
In this manuscript, we present an optimized and parallel version of our previous work IMSAME, an exhaustive gapped aligner for the pairwise and accurate comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures. In addition, sequential optimizations in CPU time and memory consumption are provided. These algorithmic and computational enhancements enable IMSAME to calculate near-optimal alignments, which are used to directly assess similarity between metagenomes without requiring reference databases. We show that the overall efficiency of the parallel implementation exceeds 80% while retaining scalability as the number of parallel cores increases. Moreover, we show that sequential optimizations yield up to 8x speedup for scenarios with larger data.
ISBN: (print) 9783319499567; 9783319499550
The Monte Carlo method is applied to the processing of the measurement results of CCM.M-K1. This method can overcome the limitations that apply in certain cases to the method described in the GUM. The CCM.M-K1 measurement results are introduced and analysed, the commercial software @RISK is used to perform the numerical simulation, and the outcome is compared with the final report of CCM.M-K1; the differences between the two are negligible.
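As a rough illustration of Monte Carlo uncertainty evaluation, the sketch below propagates input distributions through a measurement model using only the Python standard library rather than @RISK. The model (a reference mass plus a measured difference), the input distributions, and every numerical value are hypothetical and are not taken from CCM.M-K1.

```python
import random
import statistics


def monte_carlo_propagate(model, samplers, trials=100_000, seed=0):
    """Propagate input distributions through `model` by random sampling.

    Returns the sample mean, the sample standard deviation, and an
    approximate 95 % coverage interval of the output quantity.
    """
    rng = random.Random(seed)
    outputs = sorted(
        model(*(draw(rng) for draw in samplers)) for _ in range(trials)
    )
    mean = statistics.fmean(outputs)
    std = statistics.stdev(outputs)
    low = outputs[trials * 25 // 1000]          # about the 2.5th percentile
    high = outputs[trials * 975 // 1000 - 1]    # about the 97.5th percentile
    return mean, std, (low, high)


# Hypothetical measurement model: mass = reference mass + measured difference.
def mass_model(m_ref, delta):
    return m_ref + delta


# Hypothetical Gaussian inputs (values chosen only for illustration).
samplers = [
    lambda rng: rng.gauss(1000.0, 0.010),
    lambda rng: rng.gauss(0.050, 0.005),
]

mean, std, interval = monte_carlo_propagate(mass_model, samplers)
```

For a linear model with independent Gaussian inputs such as this one, the sampled standard deviation should agree closely with the root-sum-of-squares combination of the input uncertainties. The two approaches can diverge for nonlinear models or non-Gaussian inputs, which is where sampling becomes useful.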
Matrix multiplication is an example of an application that is easy both to specify and to implement in a simple way. There exist numerous sophisticated algorithms and very efficient complex implementations. In this ...
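For reference, a simple implementation of the kind the abstract alludes to might look like the following Python sketch. The function name `matmul` and the list-of-lists representation are assumptions made for illustration, and this is the textbook triple loop rather than any algorithm from the paper.

```python
def matmul(a, b):
    """Multiply an n x m matrix `a` by an m x p matrix `b` (lists of rows)."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0] * p for _ in range(n)]
    # The i-k-j loop order walks the rows of `b` and `c` contiguously.
    for i in range(n):
        for k in range(m):
            a_ik = a[i][k]
            for j in range(p):
                c[i][j] += a_ik * b[k][j]
    return c
```

The result is the same for any ordering of the three loops; only the memory access pattern changes, which is one reason tuned implementations differ so much in speed from this baseline.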
ISBN: (print) 9783319499567; 9783319499550
Task-based programming provides programmers with an intuitive abstraction to express parallelism, and runtimes with the flexibility to adapt the schedule and load-balancing to the hardware. Although many profiling tools have been developed to understand these characteristics, the interplay between task scheduling and data reuse in the cache hierarchy has not been explored. These interactions are particularly intriguing due to the flexibility task-based runtimes have in scheduling tasks, which may allow them to improve cache behavior. This work presents StatTask, a novel statistical cache model that can predict cache behavior for arbitrary task schedules and cache sizes from a single execution, without programmer annotations. StatTask enables fast and accurate modeling of data locality in task-based applications for the first time. We demonstrate the potential of this new analysis for scheduling by examining applications from the BOTS benchmark suite and identifying several important opportunities for reuse-aware scheduling.
The main contribution of this paper is to show optimal algorithms computing the sum and the prefix-sums on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and...
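To clarify what the prefix-sums computation produces, here is a small Python sketch. It shows the sequential definition alongside a textbook Hillis-Steele scan simulated step by step; it is not the paper's optimal DMM/UMM algorithm, and the function names are assumptions.

```python
def prefix_sums_sequential(xs):
    """Inclusive prefix sums: out[i] = xs[0] + ... + xs[i]."""
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out


def prefix_sums_scan(xs):
    """Hillis-Steele inclusive scan.

    Each iteration of the while loop corresponds to one parallel step in
    which every element adds the value `stride` positions to its left.
    """
    cur = list(xs)
    n = len(cur)
    stride = 1
    while stride < n:
        cur = [
            cur[i] + cur[i - stride] if i >= stride else cur[i]
            for i in range(n)
        ]
        stride *= 2
    return cur
```

The scan finishes in about log2(n) parallel steps but performs more total additions than the sequential loop, a trade-off that motivates the search for optimal algorithms on specific machine models.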