Search Results - Inner Mongolia University Library

Parallelized RDOQ Algorithm and Fully Pipelined Hardware Architecture for AVS3 Video Coding

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, Vol. 34, No. 7, pp. 6430-6444

Authors: Huang, Xiaofeng; Tang, Ran; Pan, Rui; Yin, Haibing; Wang, Zhao; Wang, Shiqi; Ma, Siwei. Affiliations: Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China; Peking Univ, Adv Inst Informat Technol, Hangzhou 311215, Peoples R China; Hangzhou Dianzi Univ, Sch Elect & Informat, Hangzhou 310018, Peoples R China; Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China; City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China

Rate-distortion optimized quantization (RDOQ) provides significant coding gain in the third generation of the Audio Video coding Standard (AVS3). However, the high computational complexity and strong data dependency of RDOQ impede hardware implementation. To address these issues, we propose a zig-zag scanline-level parallelized RDOQ algorithm and its fully pipelined hardware architecture for AVS3 video coding. For algorithm optimization, we update the run-level context for rate estimation in the inner zig-zag scanline and propose an efficient RD cost calculation form in the optimal coefficient level (OCL) decision step. In the last significant coefficient (LSC) position decision step, a greedy-strategy-based algorithm is proposed to optimize the determination process in parallel. Moreover, the proposed parallelized RDOQ algorithm is accelerated by single instruction multiple data (SIMD) on the Intel x86 platform. For hardware architecture design, a fully pipelined hardware architecture is proposed with nine pipeline stages. This design can process multiple transform units in parallel when the height is less than 32. Experimental results show that the proposed algorithm achieves 31.37%, 28.58%, and 28.53% time savings with 0.25%, 0.26%, and 0.27% Bjøntegaard delta rate (BD-Rate) increases on average under the all intra (AI), random access (RA), and low delay B (LDB) configurations, respectively. The hardware implementation achieves a throughput of 32 coefficients per cycle, and the area consumption is 1223.2K logic gates when working at 471.2 MHz. The results show that the proposed algorithm and hardware architecture design achieve a good trade-off between coding efficiency and hardware throughput.

Keywords: Hardware; Costs; Computer architecture; Quantization (signal); Video coding; Transforms; Estimation; RDOQ; AVS3; zig-zag scanline; parallelized algorithm; hardware architecture

13th International Conference on Parallel and Distributed Computing, Applications, and Technologies (PDCAT)

Authors: Zhang, Jian; Yuan, Chunfeng; Huang, Yihua. Affiliations: Nanjing Univ, Natl Key Lab Novel Software Technol, Dept Comp Sci & Technol, Nanjing 210093, Jiangsu, Peoples R China

ISBN: (print) 9780769548791

Measures of graph similarity have a broad range of applications but are compute-intensive. The similarity flooding algorithm is efficient for comparing small graphs and small datasets. However, more and more large-scale graph applications are emerging, and the existing stand-alone similarity flooding algorithm cannot compare large-scale graph datasets in acceptable time. This paper presents a parallelized similarity flooding algorithm with MapReduce for large-scale graph datasets. The experimental results demonstrate that the parallelized algorithm achieves a significant performance improvement over the stand-alone similarity flooding algorithm. They also show that the parallelized algorithm obtains excellent speedup as the cluster size increases.

Keywords: similarity flooding algorithm; large-scale graph data; parallelized algorithm; MapReduce

A parallel algorithm for solving the 3d Schrodinger equation

JOURNAL OF COMPUTATIONAL PHYSICS, 2010, Vol. 229, No. 17, pp. 6015-6026

Authors: Strickland, Michael; Yager-Elorriaga, David. Affiliations: Gettysburg Coll, Dept Phys, Gettysburg, PA 17325 USA

We describe a parallel algorithm for solving the time-independent 3d Schrodinger equation using the finite difference time domain (FDTD) method. We introduce an optimized parallelization scheme that reduces communication overhead between computational nodes. We demonstrate that the compute time, $t$, scales inversely with the number of computational nodes as $t \propto N_{\mathrm{nodes}}^{-0.95 \pm 0.04}$. This makes it possible to solve the 3d Schrodinger equation on extremely large spatial lattices using a small computing cluster. In addition, we present a new method for precisely determining the energy eigenvalues and wavefunctions of quantum states based on a symmetry constraint on the FDTD initial condition. Finally, we discuss the use of multi-resolution techniques to speed up convergence on extremely large lattices. (C) 2010 Elsevier Inc. All rights reserved.
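The scaling law quoted in the abstract can be turned into a rough speedup estimate. The sketch below is illustrative only: it assumes the power law holds exactly and uses the central exponent of -0.95, and the function names are hypothetical, not taken from the paper.

```python
def predicted_speedup(n_nodes: int, exponent: float = -0.95) -> float:
    """Speedup over a single node, assuming compute time t ~ n_nodes**exponent.

    The default exponent is the central value quoted in the abstract;
    treating the power law as exact is an assumption of this sketch.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    # t(1) / t(N) = 1 / N**exponent = N**(-exponent)
    return float(n_nodes) ** (-exponent)


def parallel_efficiency(n_nodes: int, exponent: float = -0.95) -> float:
    """Fraction of ideal (linear) speedup retained on n_nodes nodes."""
    return predicted_speedup(n_nodes, exponent) / n_nodes


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, round(predicted_speedup(n), 2), round(parallel_efficiency(n), 3))
```

Under this assumption, 16 nodes would give a speedup of roughly 13.9 rather than the ideal 16, which is consistent with the abstract's statement that the scaling is close to inverse in the node count.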

Keywords: Quantum mechanics; Schrodinger equations; parallelized algorithm; Finite difference time domain

Two game-based solution concepts for a two-agent scheduling problem

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2016, Vol. 19, No. 2, pp. 769-781

Authors: Zhao, Bing; Gu, Yanhong; Ruan, Yuan; Chen, Quanle. Affiliations: Shenzhen Univ, Intelligent Computat Sci Inst, Coll Math & Stat, Shenzhen 518060, Peoples R China; Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China; Natl Univ Singapore, Logist Inst Asia Pacific, Singapore 117574, Singapore

In current research on multi-agent (multi-person) scheduling, a person's objective function is always treated as a cost function on the schedule, whereas in this paper a cooperative profit function serves as the objective. In the two-person scheduling problem addressed here, the two persons jointly order a common operational time interval of a single machine. Each person needs to process a set of his own jobs in that time window. The objective function, which has the same form for each person, still depends on the sequence of all the jobs of both persons: each part of the function is determined by given parameters, except for one part that is assumed to be a given multiple of the total completion time of his own jobs. The two persons have to negotiate a job sequence and determine the (related) final solution for cooperative profit allocation. Such a two-person scheduling problem is essentially a cooperative game. An algorithm is designed to yield the cooperative-profit-based Pareto efficient solution set, which is the first game-based solution concept in this paper. A parallelized version of the algorithm is also developed. The second game-based solution concept is the Shapley value, which is appropriate for the above cooperative-game situation in two-person scheduling. Several instances are presented and analyzed to show why the two solution concepts need to be employed together.
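The Shapley value mentioned in the abstract is a standard concept from cooperative game theory: each player receives the average of his marginal contributions over all orders in which the players could join the coalition. The sketch below shows that textbook definition for a small game. The coalition profits are invented numbers for illustration, and the function name `shapley_values` is hypothetical; none of this reproduces the paper's scheduling model or its algorithm.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Hashable, Sequence


def shapley_values(
    players: Sequence[Hashable],
    value: Callable[[FrozenSet[Hashable]], float],
) -> Dict[Hashable, float]:
    """Average marginal contribution of each player over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition: FrozenSet[Hashable] = frozenset()
        for p in order:
            joined = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(joined) - value(coalition)
            coalition = joined
    return {p: total / len(orders) for p, total in totals.items()}


# Hypothetical two-person game: profits each person earns alone and jointly.
PROFIT = {
    frozenset(): 0.0,
    frozenset({"A"}): 4.0,
    frozenset({"B"}): 2.0,
    frozenset({"A", "B"}): 10.0,
}

if __name__ == "__main__":
    print(shapley_values(["A", "B"], PROFIT.__getitem__))
```

With these made-up profits, person A receives 6 and person B receives 4, so the allocation splits the joint profit of 10 exactly. Enumerating all join orders is only practical for a handful of players, which is enough for the two-person setting.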

Keywords: Two-agent scheduling; Cooperative-game; Cooperative profit function; Pareto efficient (PE) solution set; Shapley value; parallelized algorithm

Large-Scale Neo-Heterogeneous Programming and Optimization of SNP Detection on Tianhe-2

30th International Supercomputing Conference on High Performance Computing (ISC High Performance)

Authors: Cui, Yingbo; Liao, Xiangke; Peng, Shaoliang; Lu, Yutong; Yang, Canqun; Wang, Bingqiang; Wu, Chengkun. Affiliations: Natl Univ Def Technol, Sch Comp Sci, Changsha, Hunan, Peoples R China; Natl Supercomp Ctr Shenzhen, Shenzhen, Peoples R China

ISBN: (digital) 9783319201191

ISBN: (print) 9783319201191; 9783319201184

SNP detection is a fundamental procedure in genome analysis. SOAPsnp, a popular SNP detection tool, can take more than one week to analyze one human genome with 20-fold coverage. To improve the efficiency, we developed mSNP, a parallel version of SOAPsnp. mSNP uses the CPU in cooperation with Intel(R) Xeon Phi(TM) for large-scale SNP detection. First, we redesigned the key data structure of SOAPsnp, which significantly reduces the overhead of memory operations. Second, we devised a coordinated parallel framework in which the CPU collaborates with Xeon Phi for higher hardware utilization. Third, we proposed a read-based window division strategy to improve throughput and parallel scale on multiple nodes. To the best of our knowledge, mSNP is the first SNP detection tool empowered by Xeon Phi. We achieved a 45x speedup on a single node of Tianhe-2 without any loss in precision. Moreover, mSNP showed promising scalability on 4,096 nodes of Tianhe-2.

Keywords: SNP detection; SOAPsnp; parallelized algorithm; Xeon Phi; Many Integrated Core (MIC) Coprocessor; Tianhe-2

LazyFox: fast and parallelized overlapping community detection in large graphs

PEERJ COMPUTER SCIENCE, 2023, Vol. 9, e1291

Authors: Garrels, Tim; Khodabakhsh, Athar; Renard, Bernhard Y.; Baum, Katharina. Affiliations: Hasso Plattner Inst Digital Engn gGmbH, Potsdam, Germany; Univ Potsdam, Digital Engn Fac, Potsdam, Germany; Free Univ Berlin, Dept Math & Comp Sci, Berlin, Germany; Icahn Sch Med Mt Sinai, Windreich Dept Artificial Intelligence & Human He, New York, NY USA; Icahn Sch Med Mt Sinai, Hasso Plattner Inst Digital Hlth Mt Sinai, New York, NY USA

The detection of communities in graph datasets provides insight into a graph's underlying structure and is an important tool for various domains such as social sciences, marketing, traffic forecasting, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantically allow for or even require overlapping communities that can only be determined at much higher computational cost. We build on an efficient algorithm, FOX, that detects such overlapping communities. FOX measures the closeness of a node to a community by approximating the count of triangles that the node forms with that community. We propose LAZYFOX, a multi-threaded adaptation of the FOX algorithm, which provides even faster detection without an impact on community quality. This allows for the analysis of significantly larger and more complex datasets. LAZYFOX enables overlapping community detection on complex graph datasets with millions of nodes and billions of edges in days instead of weeks. As part of this work, LAZYFOX's implementation was published and is available as a tool under an MIT licence at https://***/TimGarrels/LazyFox.
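The quantity the abstract describes, the number of triangles a node forms with a community, can be illustrated with a small exact count. This is a sketch under stated assumptions: FOX approximates this count, whereas the code below computes it exactly on a toy graph, and the function name and adjacency-dictionary representation are hypothetical and not taken from the LazyFox implementation.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def triangles_with_community(
    adj: Graph, node: Hashable, community: Iterable[Hashable]
) -> int:
    """Exact number of triangles formed by `node` and two community members.

    A triangle is counted when both members are neighbours of `node`
    and are also adjacent to each other.
    """
    members = set(community) - {node}
    neighbours_in_community = adj.get(node, set()) & members
    return sum(
        1
        for u, w in combinations(neighbours_in_community, 2)
        if w in adj.get(u, set())
    )


# Toy undirected graph, stored as symmetric adjacency sets.
TOY_GRAPH: Graph = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2, 4},
    4: {1, 3},
    5: set(),
}

if __name__ == "__main__":
    print(triangles_with_community(TOY_GRAPH, 1, {2, 3, 4}))
```

In the toy graph, node 1 closes two triangles with the community {2, 3, 4}, namely (1, 2, 3) and (1, 3, 4), while the isolated node 5 closes none. A higher count indicates that the node is more tightly connected to the community.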

Keywords: Overlapping community detection; Large networks; Weighted clustering coefficient; Heuristic triangle estimation; parallelized algorithm; C++ tool; Runtime improvement; Open source; Graph algorithm; Community analysis

P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool

BMC GENOMICS, 2017, Vol. 18, No. 2, p. 134

Authors: Peng, Shaoliang; Yang, Shunyun; Gao, Ming; Liao, Xiangke; Liu, Jie; Yang, Canqun; Wu, Chengkun; Yu, Wenqiang. Affiliations: Natl Univ Def Technol, Sch Comp Sci, Changsha, Hunan, Peoples R China; Univ Manchester, Fac Life Sci, Manchester, Lancs, England; Fudan Univ, Inst Biomed Sci, EpiRNA Lab, Shanghai, Peoples R China

Background: A growing number of studies use whole genome DNA methylation detection, one of the most important parts of epigenetics research, to find significant relationships between DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping the bisulfite-treated sequence to the whole genome has been the main method for studying DNA cytosine methylation. However, most of today's related tools suffer from inaccuracy and long run times. Results: In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve the problem. Using an optimal complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt can analyze and predict the DNA methylation status. However, when Hint-Hunt predicts DNA methylation status on large-scale datasets, it still suffers from slow speed and low temporal-spatial efficiency. To solve the problems of Smith-Waterman dynamic programming and low temporal-spatial efficiency, we further designed a deep parallelized whole genome DNA methylation detection tool ("P-Hint-Hunt") on the Tianhe-2 (TH-2) supercomputer. Conclusions: To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up for processing large-scale datasets, and it can run on both CPUs and Intel Xeon Phi coprocessors. Moreover, we deployed and evaluated Hint-Hunt and P-Hint-Hunt on the TH-2 supercomputer at different scales. The experimental results show that our tools eliminate the deviation caused by bisulfite treatment in the mapping procedure and that the multi-level parallel program yields a 48-fold speed-up with 64 threads. P-Hint-Hunt achieves deep acceleration on the CPU and Intel Xeon Phi heterogeneous platform, taking full advantage of multi-core (CPU) and many-core (Phi) processors.

Keywords: DNA methylation detection; Whole genome; parallelized algorithm; Xeon Phi; Tianhe-2
