In this paper, we propose a parallel implementation of the number-theoretic transform (NTT) on GPU clusters. The butterfly operation of the NTT can be performed using modular addition, subtraction, and multiplica...
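As an illustration of the butterfly mentioned above, the following minimal Python sketch implements an iterative radix-2 Cooley-Tukey NTT on a single machine. The prime modulus 998244353 (= 119 · 2^23 + 1), the primitive root 3, and all function names are assumptions chosen for illustration; the paper's GPU-cluster decomposition is not reproduced here.

```python
from __future__ import annotations

# Assumed parameters (not from the paper): P = 119 * 2**23 + 1 is prime and
# G = 3 is a primitive root mod P, so power-of-two lengths up to 2**23 work.
P = 998244353
G = 3


def butterfly(u: int, v: int, w: int) -> tuple[int, int]:
    """Cooley-Tukey butterfly: one modular multiply, one add, one subtract."""
    t = v * w % P
    return (u + t) % P, (u - t) % P


def ntt(values: list[int], invert: bool = False) -> list[int]:
    """Return the forward (or inverse) NTT of `values` without modifying it."""
    a = [x % P for x in values]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    # Bit-reversal permutation so the butterflies can run stage by stage in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Principal length-th root of unity (its inverse for the inverse NTT).
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                lo, hi = start + k, start + k + half
                a[lo], a[hi] = butterfly(a[lo], a[hi], w)
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


if __name__ == "__main__":
    # Multiply (1 + 2x + 3x^2 + 4x^3) by (5 + 6x), zero-padded to length 8.
    f = [1, 2, 3, 4, 0, 0, 0, 0]
    g = [5, 6, 0, 0, 0, 0, 0, 0]
    prod = ntt([x * y % P for x, y in zip(ntt(f), ntt(g))], invert=True)
    print(prod)  # [5, 16, 27, 38, 24, 0, 0, 0]
```

Multiplying two transforms pointwise and applying the inverse transform yields their cyclic convolution; with enough zero padding that equals the ordinary polynomial product, which is why NTTs are used for exact large-integer and polynomial multiplication.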
Text-Image Person Re-Identification (TIReID) is a computer vision task that involves identifying persons in images or videos based on textual descriptions. Current works mainly employ Vision Language Pretrained (VLP) m...
ISBN: (print) 9783030050511; 9783030050504
Visibility computing is a basic problem in computer graphics and is often the bottleneck in realistic rendering algorithms. Common examples include the determination of the objects visible from a viewpoint, virtual reality, real-time simulation, and 3D interactive design. As a technique for accelerating rendering, visibility computing has attracted great attention in recent years. Traditional visibility computing on a single-processor machine, lacking parallelism, can no longer handle increasingly large-scale and complex scenes. However, designing parallel algorithms on a cluster faces many challenges, including workload imbalance among compute nodes, a complicated mathematical model, and the need for knowledge from different domains. In this paper, we propose an efficient and highly scalable framework for visibility computing on the Tianhe-2 supercomputer. First, a new technique called hemispheric visibility computing is designed, which overcomes the missing visibility of traditional perspective algorithms. Second, a distributed parallel algorithm for visibility computing is implemented based on a master-worker architecture. Finally, we discuss the granularity of visibility computing and several optimization strategies for improving overall performance. Experiments on the Tianhe-2 supercomputer show that our distributed parallel visibility computing framework achieves nearly linear speedup on up to 7680 CPU cores.
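The master-worker organization described above can be sketched with Python threads and a shared task queue. Everything here is a hypothetical stand-in: `compute_visibility` is a placeholder for per-viewpoint work, and on a cluster such as Tianhe-2 the workers would be separate processes communicating by message passing rather than threads in one interpreter. The `grain` parameter models the task-granularity issue raised above.

```python
from __future__ import annotations

import queue
import threading


def compute_visibility(viewpoint: int) -> int:
    """Hypothetical placeholder for the visibility work of one viewpoint."""
    return viewpoint * viewpoint


def master_worker(viewpoints: list[int], num_workers: int, grain: int) -> dict[int, int]:
    """Master splits viewpoints into tasks of `grain` items; idle workers pull the next task."""
    if grain < 1 or num_workers < 1:
        raise ValueError("grain and num_workers must be positive")
    # Master: enqueue every task before the workers start.
    tasks: queue.Queue = queue.Queue()
    for start in range(0, len(viewpoints), grain):
        tasks.put(viewpoints[start:start + grain])
    results: dict[int, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        # Dynamic scheduling: a worker takes a new task as soon as it is free.
        while True:
            try:
                chunk = tasks.get_nowait()
            except queue.Empty:
                return  # every task has been handed out
            partial = {vp: compute_visibility(vp) for vp in chunk}
            with lock:  # the master's result store is shared
                results.update(partial)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results


if __name__ == "__main__":
    out = master_worker(list(range(20)), num_workers=4, grain=3)
    print(len(out))  # 20
```

Smaller `grain` values give finer tasks and better load balance when per-viewpoint costs vary, at the price of more scheduling overhead; larger values reduce overhead but risk idle workers near the end of the run.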
ISBN: (print) 9783030050573; 9783030050566
Recent decades have seen the rapid development of cloud computing, which has brought a major breakthrough in handling the data produced every second and everywhere. Meanwhile, data compression is becoming increasingly important owing to its great potential to benefit both network transmission and storage. Motivated by the urgent demand for a highly efficient compression method that balances compression time and compression ratio, this paper presents PLZMA, a parallel design of LZMA. Process-level and thread-level parallelism are implemented according to the LZMA algorithm, greatly reducing compression time while maintaining a fair compression ratio. Experimental results on a real-world application show that PLZMA achieves more balanced performance than other well-known methods. The parallel design achieves a speedup of 8x over the serial baseline using 12 threads.
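The general idea of block-level parallel compression can be sketched with Python's standard `lzma` module and a thread pool: the input is split into fixed-size blocks, each block is compressed independently as its own .xz stream, and the streams are concatenated. This is an assumed, simplified scheme, not PLZMA's actual process-level and thread-level design; the block size and worker count are illustrative.

```python
from __future__ import annotations

import lzma
from concurrent.futures import ThreadPoolExecutor


def parallel_compress(data: bytes, block_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress fixed-size blocks independently and concatenate the .xz streams."""
    if block_size < 1:
        raise ValueError("block_size must be positive")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so block order is preserved.
        streams = list(pool.map(lzma.compress, blocks))
    return b"".join(streams)


def parallel_decompress(blob: bytes) -> bytes:
    """lzma.decompress decodes every concatenated stream and joins the outputs."""
    return lzma.decompress(blob)


if __name__ == "__main__":
    sample = b"parallel LZMA sketch " * 50_000
    packed = parallel_compress(sample, block_size=1 << 16)
    assert parallel_decompress(packed) == sample
    print(len(sample), "->", len(packed), "bytes")
```

Because blocks are compressed independently, matches that cross block boundaries are lost, so smaller blocks expose more parallelism but usually lower the compression ratio; this is one concrete form of the time/ratio trade-off mentioned above.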
ISBN: (print) 9783319654829; 9783319654812
Nowadays, modern computer systems rely heavily on parallel processing, not only because of the multicore CPUs bundled with almost any machine, even mobile devices, but increasingly thanks to the parallel processing capabilities of graphics processing units (GPUs), general-purpose computing on graphics processing units (GPGPU) being one example. In this paper, relying on the DirectX 12 framework, we propose an innovative approach that enables parallel processing for graphical rendering on both the CPU and the GPU for the popular Racket functional programming language (formerly PLT Scheme), importantly without compromising Racket's usability and programmer-friendliness. Our performance evaluations show significant improvements in execution time (a 3x speed-up in some cases), CPU utilisation time (reduced by as much as 80% in some scenarios), and frame rate when using moving graphics.
ISBN: (print) 9783319654829; 9783319654812
In this manuscript, we present an optimized, parallel version of our previous work IMSAME, an exhaustive gapped aligner for the accurate pairwise comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures. In addition, sequential optimizations in CPU time and memory consumption are provided. These algorithmic and computational enhancements enable IMSAME to calculate near-optimal alignments, which are used to directly assess the similarity between metagenomes without requiring reference databases. We show that the overall efficiency of the parallel implementation exceeds 80% while scalability is retained as the number of parallel cores increases. Moreover, we show that the sequential optimizations yield up to an 8x speedup in scenarios with larger data.
ISBN: (print) 9783319654829; 9783319654812
This article presents the massively parallel execution of the BLAST algorithm on supercomputers and HPC clusters using thousands of processors. Our work is based on optimally splitting the set of queries, which are run with the unmodified NCBI-BLAST package for sequence alignment. The work distribution and search management have been implemented in Java using the PCJ (Parallel Computing in Java) library. The PCJ-BLAST package is responsible for reading the sequences for comparison, splitting them up, and starting multiple NCBI-BLAST executables. We also investigated the problem of parallel I/O and, thanks to the PCJ library, we deliver high-throughput execution of BLAST. The presented results show that using Java and the PCJ library we achieved very good performance and efficiency. As a result, we have significantly reduced the time required for sequence analysis. We have also demonstrated that the PCJ library can be used as an efficient tool for the rapid development of scalable applications.
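Splitting a query set across many BLAST processes can be sketched as follows. The greedy longest-processing-time heuristic used here, with sequence length as a rough proxy for search cost, is an assumed stand-in rather than the paper's optimal splitting method, and the sketch is written in Python rather than the Java/PCJ used by the authors; `parse_fasta` and `split_queries` are hypothetical helper names.

```python
from __future__ import annotations

import heapq


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: list[tuple[str, str]] = []
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_queries(records: list[tuple[str, str]], n_chunks: int) -> list[list[tuple[str, str]]]:
    """Assign each query, longest first, to the chunk with the smallest total length."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be positive")
    chunks: list[list[tuple[str, str]]] = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (total residues, chunk index)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, i = heapq.heappop(heap)  # least-loaded chunk so far
        chunks[i].append(rec)
        heapq.heappush(heap, (load + len(rec[1]), i))
    return chunks


if __name__ == "__main__":
    fasta = ">q1\nACGTACGT\n>q2\nAC\n>q3\nACGTAC\nGT\n>q4\nACGT\n"
    for n, chunk in enumerate(split_queries(parse_fasta(fasta), 2)):
        print(n, [h for h, _ in chunk])
```

Each chunk would then be written to its own FASTA file and passed to a separate NCBI-BLAST executable (for example through its `-query` option).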
ISBN: (print) 9783642330643
The proceedings contain 73 papers. The topics discussed include: accelerating the dynamic programming for the optimal polygon triangulation on the GPU; security computing for the resiliency of protecting from internal attacks in distributed wireless sensor networks; optimization of a short-range proximity effect correction algorithm in e-beam lithography using GPGPUs; vectorized algorithms for Quadtree construction and descent; an optimal parallel prefix-sums algorithm on the memory machine models for GPUs; enhancing the performance of a distributed mobile computing environment by topology construction; maintaining consistency in software transactional memory through dynamic versioning tuning; a new low latency parallel turbo decoder employing parallel phase decoding method; high-performance matrix multiply on a massively multithreaded Fiteng1000 processor; and on construction of Cloud IaaS for VM live migration using KVM and OpenNebula.
ISBN: (print) 9783642330773
The proceedings contain 73 papers. The topics discussed include: accelerating the dynamic programming for the optimal polygon triangulation on the GPU; security computing for the resiliency of protecting from internal attacks in distributed wireless sensor networks; optimization of a short-range proximity effect correction algorithm in e-beam lithography using GPGPUs; vectorized algorithms for Quadtree construction and descent; an optimal parallel prefix-sums algorithm on the memory machine models for GPUs; enhancing the performance of a distributed mobile computing environment by topology construction; maintaining consistency in software transactional memory through dynamic versioning tuning; a new low latency parallel turbo decoder employing parallel phase decoding method; high-performance matrix multiply on a massively multithreaded Fiteng1000 processor; and on construction of Cloud IaaS for VM live migration using KVM and OpenNebula.
ISBN: (print) 9783319499567; 9783319499550
The Monte Carlo method is applied to process the measurement results of CCM.M-K1. This method can overcome the limitations that apply in certain cases to the method described in the GUM (Guide to the Expression of Uncertainty in Measurement). The CCM.M-K1 measurement results are introduced and analyzed, and the commercial software @RISK is used to perform the numerical simulation. The results are compared with the final report of CCM.M-K1, showing that the differences between the two are negligible.
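The Monte Carlo approach referred to above can be illustrated with a short Python sketch in the spirit of GUM Supplement 1 (propagation of distributions): input quantities are sampled from their assumed distributions and pushed through the measurement model, and the output sample yields an estimate, a standard uncertainty, and a coverage interval. The model and input distributions are hypothetical and unrelated to the CCM.M-K1 data; @RISK is not used.

```python
from __future__ import annotations

import math
import random
from typing import Callable, Sequence


def monte_carlo_uncertainty(
    model: Callable[..., float],
    samplers: Sequence[Callable[[random.Random], float]],
    trials: int = 200_000,
    seed: int = 1,
) -> tuple[float, float, tuple[float, float]]:
    """Propagate input distributions through `model` by random sampling.

    Returns the estimate (sample mean), the standard uncertainty (sample
    standard deviation) and an approximate probabilistically symmetric 95 %
    coverage interval read from the sorted output values.
    """
    rng = random.Random(seed)
    ys = sorted(model(*(draw(rng) for draw in samplers)) for _ in range(trials))
    mean = math.fsum(ys) / trials
    std_u = math.sqrt(math.fsum((y - mean) ** 2 for y in ys) / (trials - 1))
    lo = ys[int(0.025 * trials)]
    hi = ys[int(0.975 * trials) - 1]
    return mean, std_u, (lo, hi)


if __name__ == "__main__":
    # Hypothetical linear model Y = X1 + X2 with X1 ~ N(0, 1) and
    # X2 rectangular on [-1, 1].
    samplers = [
        lambda r: r.gauss(0.0, 1.0),
        lambda r: r.uniform(-1.0, 1.0),
    ]
    y, u, (lo, hi) = monte_carlo_uncertainty(lambda x1, x2: x1 + x2, samplers)
    print(f"y = {y:.4f}, u(y) = {u:.4f}, 95 % interval = [{lo:.3f}, {hi:.3f}]")
```

For this linear model the Monte Carlo result should agree with the GUM law of propagation of uncertainty, u(Y) = sqrt(1 + 1/3) ≈ 1.155. The method earns its keep when the model is strongly nonlinear or the output distribution is far from Gaussian, which are the cases where the GUM framework's assumptions can break down.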