Rate-distortion optimized quantization (RDOQ) provides significant coding gain in the third generation of the Audio Video coding Standard (AVS3). However, its high computational complexity and strong data dependency impede hardware implementation. To address these issues, we propose a zig-zag scanline-level parallelized RDOQ algorithm and a fully pipelined hardware architecture for AVS3 video coding. On the algorithm side, we update the run-level context for rate estimation in the inner zig-zag scanline and propose an efficient form of the RD cost calculation in the optimal coefficient level (OCL) decision step. In the last significant coefficient (LSC) position decision step, a greedy algorithm is proposed so that the decision can be made in parallel. Moreover, the proposed parallelized RDOQ algorithm is accelerated with single instruction multiple data (SIMD) instructions on the Intel x86 platform. On the hardware side, we propose a fully pipelined architecture with nine pipeline stages. The design can process multiple transform units in parallel when their height is less than 32. Experimental results show that the proposed algorithm achieves 31.37%, 28.58%, and 28.53% time savings at the cost of 0.25%, 0.26%, and 0.27% Bjøntegaard delta rate (BD-Rate) increases on average under the all intra (AI), random access (RA), and low delay B (LDB) configurations, respectively. The hardware implementation processes 32 coefficients per cycle and occupies 1223.2K logic gates at 471.2 MHz. These results show that the proposed algorithm and hardware architecture achieve a good trade-off between coding efficiency and hardware throughput.
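As a rough illustration of what a per-coefficient level decision involves, the sketch below picks the quantized level that minimizes the usual rate-distortion cost J = D + lambda * R. It is not the AVS3 procedure described in the abstract: the candidate set, the function names, and in particular `toy_rate_bits` are hypothetical stand-ins, and the run-level context that makes the real rate estimate data dependent is ignored.

```python
def choose_level(coeff, qstep, lam, rate_bits):
    """Return (level, cost) minimizing J = D + lam * R for one coefficient.

    coeff     : transform coefficient (sign is ignored in this sketch)
    qstep     : quantization step size
    lam       : Lagrange multiplier weighting rate against distortion
    rate_bits : callable giving an estimated bit cost for a level
    """
    magnitude = abs(coeff)
    floor_level = int(magnitude / qstep)
    # Typical candidates: zero, the rounded-down level, the rounded-up level.
    candidates = sorted({0, floor_level, floor_level + 1})
    best_level, best_cost = 0, float("inf")
    for level in candidates:
        distortion = (magnitude - level * qstep) ** 2
        cost = distortion + lam * rate_bits(level)
        if cost < best_cost:
            best_level, best_cost = level, cost
    return best_level, best_cost


def toy_rate_bits(level):
    # Hypothetical rate model (an assumption): larger levels cost more bits.
    return 1.0 if level == 0 else 2.0 + 2.0 * level.bit_length()
```

With `lam = 0` the choice reduces to plain nearest-level quantization; as `lam` grows, cheaper levels (eventually zero) win, which is the trade-off RDOQ exploits.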
ISBN (print): 9780769548791
Measures of graph similarity have a broad range of applications but are compute-intensive. The similarity flooding algorithm is efficient for comparing small graphs and small datasets. However, large-scale graph applications are increasingly common, and the existing stand-alone similarity flooding algorithm cannot complete the similarity comparison for large-scale graph datasets in acceptable time. This paper presents a parallelized similarity flooding algorithm based on MapReduce for large-scale graph datasets. The experimental results demonstrate that the parallelized algorithm achieves a significant performance improvement over the stand-alone similarity flooding algorithm. They also show that the parallelized algorithm obtains excellent speedup as the cluster size increases.
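For readers unfamiliar with similarity flooding, the sketch below shows a simplified, single-machine version of its fixpoint iteration. It assumes edges are given as `(source, label, target)` triples and that each node pair splits its score equally among its neighbors; both are simplifications chosen for illustration. It is not the MapReduce formulation from the paper.

```python
from collections import defaultdict


def similarity_flooding(edges_a, edges_b, iterations=20):
    """Return a dict mapping node pairs (a, b) to a similarity in (0, 1]."""
    # Pairwise connectivity graph: (a1, b1) and (a2, b2) are linked when
    # a1 -> a2 in graph A and b1 -> b2 in graph B carry the same label.
    neighbors = defaultdict(set)
    for a1, label_a, a2 in edges_a:
        for b1, label_b, b2 in edges_b:
            if label_a == label_b:
                neighbors[(a1, b1)].add((a2, b2))
                neighbors[(a2, b2)].add((a1, b1))

    sigma = {pair: 1.0 for pair in neighbors}
    for _ in range(iterations):
        updated = {}
        for pair, adjacent in neighbors.items():
            # Each neighbor passes on an equal share of its current score.
            incoming = sum(sigma[n] / len(neighbors[n]) for n in adjacent)
            updated[pair] = sigma[pair] + incoming
        top = max(updated.values())
        # Normalize so the best-matching pair has similarity 1.0.
        sigma = {pair: value / top for pair, value in updated.items()}
    return sigma
```

Within one iteration, the update for each node pair reads only the previous scores, so the pairs can be processed independently. That independence is what makes a map-and-reduce style decomposition plausible.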
We describe a parallel algorithm for solving the time-independent 3D Schrödinger equation using the finite difference time domain (FDTD) method. We introduce an optimized parallelization scheme that reduces communication overhead between computational nodes. We demonstrate that the compute time, t, scales inversely with the number of computational nodes as $t \propto N_{\mathrm{nodes}}^{-0.95 \pm 0.04}$. This makes it possible to solve the 3D Schrödinger equation on extremely large spatial lattices using a small computing cluster. In addition, we present a new method for precisely determining the energy eigenvalues and wavefunctions of quantum states based on a symmetry constraint on the FDTD initial condition. Finally, we discuss the use of multi-resolution techniques to speed up convergence on extremely large lattices. (C) 2010 Elsevier Inc. All rights reserved.
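To make the reported scaling law concrete, the short sketch below converts an exponent into a predicted speedup and parallel efficiency relative to a single node. The function names are illustrative, and the calculation assumes the power law holds exactly over the whole range of node counts, which the abstract does not claim.

```python
def predicted_speedup(nodes, exponent=-0.95):
    """Speedup over one node if compute time scales as nodes**exponent."""
    return nodes ** (-exponent)


def parallel_efficiency(nodes, exponent=-0.95):
    """Fraction of ideal (linear) speedup retained at a given node count."""
    return predicted_speedup(nodes, exponent) / nodes


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, round(predicted_speedup(n), 2), round(parallel_efficiency(n), 3))
```

An exponent of exactly -1 would correspond to ideal linear scaling, so the quoted uncertainty of 0.04 can be explored by passing -0.91 or -0.99 as `exponent`.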
In current research on multi-agent (multi-person) scheduling, a person's objective function is always a cost function on the schedule, whereas in this paper a cooperative profit function serves as the objective. In the two-person scheduling problem addressed here, the two persons jointly order a common operational time interval of a single machine. Each person needs to process a set of his own jobs in that time window. Each person's objective function, which has the same form for both, still depends on the sequence of all the jobs of both persons: every part of the function is determined by given parameters except one part, which is assumed to be a given multiple of the total completion time of his own jobs. The two persons have to negotiate a job sequence and determine the related final solution for allocating the cooperative profit. Such a two-person scheduling problem is essentially a cooperative game. An algorithm is designed to yield the cooperative-profit-based Pareto efficient solution set, which is the first game-based solution concept in this paper. A parallelized version of the algorithm is also developed. The second game-based solution concept is the Shapley value, which is appropriate for this cooperative-game situation in two-person scheduling. Several instances are presented and analyzed to show why the two solution concepts need to be employed together.
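For a two-player cooperative game the Shapley value has a simple closed form: each player receives the average of his marginal contributions over the two possible joining orders. The sketch below computes it from three coalition values. The numbers in the usage note are made up for illustration; how the paper derives coalition values from job sequences is not stated in the abstract.

```python
def shapley_two_player(v1, v2, v12):
    """Shapley values for a two-player game.

    v1, v2 : profit each player can secure alone
    v12    : profit of the grand coalition (both players cooperating)
    """
    # Player 1 joins first (contributing v1) or second (contributing v12 - v2).
    phi1 = 0.5 * v1 + 0.5 * (v12 - v2)
    # Symmetric reasoning for player 2.
    phi2 = 0.5 * v2 + 0.5 * (v12 - v1)
    return phi1, phi2
```

For example, with hypothetical values `v1 = 10`, `v2 = 20`, and `v12 = 50`, the allocation is `(20.0, 30.0)`: each player keeps his stand-alone profit and the cooperative surplus of 20 is split equally.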
ISBN (digital): 9783319201191
ISBN (print): 9783319201191; 9783319201184
SNP detection is a fundamental procedure in genome analysis. SOAPsnp, a popular SNP detection tool, can take more than one week to analyze one human genome with 20-fold coverage. To improve efficiency, we developed mSNP, a parallel version of SOAPsnp. mSNP uses CPUs in cooperation with Intel(R) Xeon Phi(TM) coprocessors for large-scale SNP detection. First, we redesigned the key data structure of SOAPsnp, which significantly reduces the overhead of memory operations. Second, we devised a coordinated parallel framework in which the CPU collaborates with the Xeon Phi for higher hardware utilization. Third, we proposed a read-based window division strategy to improve throughput and parallel scale on multiple nodes. To the best of our knowledge, mSNP is the first SNP detection tool accelerated by Xeon Phi. We achieved a 45x speedup on a single node of Tianhe-2 without any loss in precision. Moreover, mSNP showed promising scalability on 4,096 nodes of Tianhe-2.
The detection of communities in graph datasets provides insight into a graph's underlying structure and is an important tool in domains such as social sciences, marketing, traffic forecasting, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantically allow for or even require overlapping communities, which can only be determined at much higher computational cost. We build on an efficient algorithm, FOX, that detects such overlapping communities. FOX measures the closeness of a node to a community by approximating the number of triangles that the node forms with that community. We propose LAZYFOX, a multi-threaded adaptation of the FOX algorithm, which provides even faster detection without an impact on community quality. This allows for the analysis of significantly larger and more complex datasets. LAZYFOX enables overlapping community detection on complex graph datasets with millions of nodes and billions of edges in days instead of weeks. As part of this work, LAZYFOX's implementation was published and is available as a tool under an MIT licence at https://***/TimGarrels/LazyFox.
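The closeness measure mentioned above rests on counting triangles between a node and a community. The sketch below computes that count exactly for a small graph stored as a dictionary of neighbor sets. The function name and data layout are assumptions for illustration; FOX itself approximates this quantity rather than counting it exactly, and this is not the LAZYFOX implementation.

```python
def triangles_with_community(adjacency, node, community):
    """Count triangles formed by `node` and two neighbors inside `community`.

    adjacency : dict mapping each node to the set of its neighbors
    community : set of nodes
    """
    # Only neighbors of the node that belong to the community can close a triangle.
    members = [n for n in adjacency[node] if n in community and n != node]
    count = 0
    for i, u in enumerate(members):
        for w in members[i + 1:]:
            # (node, u, w) is a triangle when u and w are themselves adjacent.
            if w in adjacency[u]:
                count += 1
    return count
```

Because the count for one node only reads the graph and the current community membership, evaluating many nodes at once is a natural target for multi-threading, provided membership updates are coordinated.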
Background: A growing number of studies use whole-genome DNA methylation detection, one of the most important parts of epigenetics research, to find significant relationships between DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping bisulfite-treated sequences to the whole genome has been the main method for studying DNA cytosine methylation. However, most of today's tools for this task are inaccurate and time-consuming. Results: In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve this problem. Using an optimal complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt can analyze and predict DNA methylation status. However, when Hint-Hunt is applied to large-scale datasets, it still suffers from slow speed and low temporal-spatial efficiency. To address the cost of Smith-Waterman dynamic programming and the low temporal-spatial efficiency, we further designed a deeply parallelized whole-genome DNA methylation detection tool ("P-Hint-Hunt") on the Tianhe-2 (TH-2) supercomputer. Conclusions: To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up for processing large-scale datasets, and it can run on both CPUs and Intel Xeon Phi coprocessors. Moreover, we deployed and evaluated Hint-Hunt and P-Hint-Hunt on the TH-2 supercomputer at different scales. The experimental results show that our tools eliminate the deviation caused by bisulfite treatment in the mapping procedure and that the multi-level parallel program yields a 48-fold speed-up with 64 threads. P-Hint-Hunt achieves substantial acceleration on the heterogeneous CPU and Intel Xeon Phi platform, taking full advantage of both multi-core (CPU) and many-core (Phi) processors.
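The Smith-Waterman dynamic programming mentioned in the abstract can be sketched as follows. This is the textbook local-alignment recurrence with a linear gap penalty; the scoring values are arbitrary defaults chosen for illustration. The bisulfite-aware matching and the parallelization used by Hint-Hunt and P-Hint-Hunt are not described in the abstract and are not modeled here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                                # start a new local alignment
                h[i - 1][j - 1] + substitution,   # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,                # gap in b
                h[i][j - 1] + gap,                # gap in a
            )
            best = max(best, h[i][j])
    return best
```

Each cell depends only on its upper, left, and upper-left neighbors, so all cells on one anti-diagonal can be computed independently. This is a common basis for parallelizing the matrix fill, although whether P-Hint-Hunt uses it is not stated in the abstract.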