Search Results - Inner Mongolia University Library

5th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2002

Authors: Zhang, Fa; Qiao, Xiang-Zhen; Liu, Zhi-Yong. Affiliations: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China; National Natural Science Foundation of China, Beijing 100083, China

ISBN: (print) 0769515126

Biological sequence comparison is an important tool for researchers in molecular biology. There are several algorithms for sequence comparison. The Smith-Waterman algorithm, based on dynamic programming, is one of the most fundamental algorithms in bioinformatics. However, the existing parallel Smith-Waterman algorithm needs large memory space. As the volume of biological sequence data expands rapidly, the memory requirement of the existing parallel Smith-Waterman algorithm has become a critical problem. To resolve this problem, we develop a new parallel Smith-Waterman algorithm using the divide-and-conquer method, named PSW-DC. The memory space required by the new parallel algorithm is reduced significantly in comparison with existing ones. A key technique, named the C&E method, is developed for the implementation of the new parallel Smith-Waterman algorithm. © 2002 IEEE.
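
To illustrate why memory matters for Smith-Waterman, the following is a minimal sequential sketch that computes only the best local alignment score while keeping two rows of the dynamic programming table instead of the full matrix. It is not the PSW-DC algorithm or the C&E method from the abstract; the function name and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b using O(len(b)) memory.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # current row; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Keeping only two rows gives the score but not the alignment itself; recovering the alignment in reduced memory is where divide-and-conquer schemes typically come in.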

Keywords: Dynamic programming

A GENERALIZED SELF-SCATTERING TECHNIQUE FOR MONTE-CARLO SIMULATION SUITABLE FOR SIMD ARCHITECTURES

COMPEL-THE INTERNATIONAL JOURNAL FOR COMPUTATION AND MATHEMATICS IN ELECTRICAL AND ELECTRONIC ENGINEERING, 1994, Vol. 13, No. 4, pp. 661-669

Authors: SHENG, H; GUERRIERI, R; SANGIOVANNIVINCENTELLI, AS. Affiliation: UNIV BOLOGNA, DIPARTIMENTO ELETTRON & INFORMAT, BOLOGNA, ITALY

We present a generalized self-scattering method for generating carrier free flight times in Monte Carlo simulation. Compared to traditional approaches, the added flexibility of this approach results in fewer fictitious scatterings, which is especially appealing for load balance and efficiency when a SIMD parallel computer is used. Speedups from 19% to 69% over an optimized variable-Gamma approach are shown for an implementation on the Connection Machine CM-2. The performance sensitivities to applied fields and grid spacings are also presented. The conversion of existing variable-Gamma software to this new approach requires only a few changes.
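
As background for the abstract above, here is a minimal sketch of the classic constant-Gamma self-scattering scheme, not the generalized method of the paper. Free flight times are drawn from an exponential distribution with a fixed total rate `gamma`, and each flight ends in a real scattering with probability `real_rate(energy) / gamma`, otherwise in a fictitious one. The function names and the toy energy update are assumptions made for illustration only.

```python
import math
import random


def free_flight(gamma, rng):
    """Sample a free-flight duration for a constant total rate gamma."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return -math.log(1.0 - rng.random()) / gamma


def simulate(real_rate, gamma, n_events, rng):
    """Count (real, fictitious) scatterings over n_events free flights.

    real_rate(energy) is the physical scattering rate and must not exceed
    gamma. The energy update is a hypothetical placeholder, not a device model.
    """
    energy, real, fictitious = 0.0, 0, 0
    for _ in range(n_events):
        energy += free_flight(gamma, rng)  # toy stand-in for field acceleration
        if rng.random() * gamma < real_rate(energy):
            real += 1
            energy = 0.0  # toy model: a real scattering resets the energy
        else:
            fictitious += 1  # self-scattering: carrier state is unchanged
    return real, fictitious
```

The closer `gamma` is to the real rate, the fewer fictitious events are generated, which is the quantity the abstract reports reducing.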

An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms

24th International Conference on Parallel Architecture and Compilation (PACT)

Authors: Harshvardhan; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence. Affiliation: Texas A&M Univ, Dept Comp Sci & Engn, Parasol Lab, College Stn, TX 77843, USA

ISBN: (print) 9781467395243

Graph algorithms on distributed-memory systems typically perform heavy communication, often limiting their scalability and performance. This work presents an approach to transparently (without programmer intervention) allow fine-grained graph algorithms to utilize algorithmic communication reduction optimizations. In many graph algorithms, the same information is communicated by a vertex to its neighbors, which we coin algorithmic redundancy. Our approach exploits algorithmic redundancy to reduce communication between vertices located on different processing elements. We employ algorithm-aware coarsening of messages sent during vertex visitation, reducing both the number of messages and the absolute amount of communication in the system. To achieve this, the system structure is represented by a hierarchical graph, facilitating communication optimizations that can take into consideration the machine's memory hierarchy. We also present an optimization for small-world scale-free graphs wherein hub vertices (i.e., vertices of very large degree) are represented in a similar hierarchical manner, which is exploited to increase parallelism and reduce communication. Finally, we present a framework that transparently allows fine-grained graph algorithms to utilize our hierarchical approach without programmer intervention, while improving scalability and performance. Experimental results of our proposed approach on 131,000+ cores show improvements of up to a factor of 8 over the non-hierarchical version for various graph mining and graph analytics algorithms.
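
The idea of exploiting algorithmic redundancy can be sketched as follows: when a vertex sends the same value to all of its neighbors, the copies bound for one remote processing element (PE) can travel as a single message. This is a simplified illustration, not the framework described in the abstract; the `owner` map from vertex to PE and both function names are hypothetical.

```python
from collections import defaultdict


def naive_messages(vertex, value, neighbors, owner):
    """One message per remote neighbor: (destination PE, target vertex, value)."""
    src = owner[vertex]
    return [(owner[n], n, value) for n in neighbors if owner[n] != src]


def coarsened_messages(vertex, value, neighbors, owner):
    """One message per remote PE: (destination PE, [target vertices], value)."""
    src = owner[vertex]
    targets = defaultdict(list)
    for n in neighbors:
        if owner[n] != src:  # neighbors on the same PE need no message
            targets[owner[n]].append(n)
    # The value is carried once per destination PE instead of once per neighbor.
    return [(pe, vs, value) for pe, vs in sorted(targets.items())]
```

With neighbors spread over few PEs, the coarsened form sends as many messages as there are remote PEs rather than remote neighbors.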

Keywords: parallel graph processing; graph analytics; big data

Modified Efficient Parallel Distributed Arithmetic based FIR Filter Architecture for ASIC and FPGA

10th International Conference on Signal Processing and Integrated Networks, SPIN 2023

Authors: Soni, Teena; Kumar, Anil; Panda, Manoj Kumar. Affiliations: PDPM-Indian Institute of Information Technology Design and Manufacturing, Department of Electronics and Communication, Jabalpur, India; PDPM-Indian Institute of Information Technology Design and Manufacturing, Department of Natural Science, Jabalpur, India

ISBN: (print) 9781665490993

Digital FIR filters can be efficiently implemented using distributed arithmetic (DA). The original DA provides low throughput. Parallel DA has proven to be a promising technique for efficient DA implementation. The block-based parallel DA architecture proposed by Singhal and Mohanty is examined and improved by applying a modified LUT decomposition scheme. Experiments with different levels of LUT decomposition are performed with FIR filters of orders 16 and 32. The proposed architectures are implemented on a Basys-3 (Artix-7, XC7A35T-1CPG236C) FPGA board. Several critical performance metrics such as the number of slices, maximum clock frequency, dynamic power consumption, and throughput are estimated for different filter orders for the targeted FPGA board. The proposed architecture is also implemented for ASIC using a 45 nm NanGate open cell library, and area, power, and delay are reported. Comparison with state-of-the-art DA architectures for FPGA implementation shows an average 64% reduction in area and 22% improvement in throughput. © 2023 IEEE.
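
For readers unfamiliar with distributed arithmetic, the sketch below shows the basic bit-serial form in software: the multiplications of an inner product are replaced by lookups into a table of precomputed coefficient sums, addressed by one bit from each input sample. It models neither the block-based parallel architecture nor the LUT decomposition discussed in the abstract; integer coefficients, two's complement samples, and the function names are assumptions.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of coeffs[k] for every bit k that is set in addr."""
    lut = [0] * (1 << len(coeffs))
    for addr in range(1, len(lut)):
        low = addr & -addr  # lowest set bit of addr
        lut[addr] = lut[addr ^ low] + coeffs[low.bit_length() - 1]
    return lut


def da_inner_product(lut, samples, bits):
    """Inner product of the LUT's coefficients with signed `bits`-bit samples."""
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        term = lut[addr] << b
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc
```

The table has 2^N entries for N taps, which is why practical designs split it into several smaller LUTs.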

Keywords: FIR filters

Topic 10: Parallel Numerical Algorithms (Introduction)

19th International Conference on Euro-Par

Authors: Langou, Julien; Bolten, Matthias; Grigori, Laura; Vajtersic, Marian. Affiliations: University of Colorado Denver, CO, United States; University of Wuppertal, Germany; INRIA, France; University of Salzburg, Austria

ISBN: (print) 9783642400476

The solution of large-scale problems in Computational Science and Engineering relies on the availability of accurate, robust and efficient numerical algorithms and software that are able to exploit the power offered by modern computer architectures. Such algorithms and software provide building blocks for prototyping and developing novel applications, and for improving existing ones, by relieving developers of details concerning numerical methods as well as their implementation in new computing environments.

Keywords: Computer architecture

Optimizing parallel multiplication operation for rectangular and transposed matrices

10th International Conference on Parallel and Distributed Systems (ICPADS 2004)

Authors: Krishnan, M; Nieplocha, J. Affiliation: Pacific NW Natl Lab, Richland, WA 99352, USA

ISBN: (print) 0769521525

In many applications, matrix multiplication involves matrices of different shapes. The shape of the matrices can significantly impact the performance of a matrix multiplication algorithm. This paper describes extensions of the SRUMMA parallel matrix multiplication algorithm [1] to improve performance for transposed and rectangular matrices. Our approach relies on a set of hybrid algorithms which are chosen based on the shape of the matrices and the transpose operator involved. The algorithm exploits performance characteristics of clusters and shared memory systems: it differs from other parallel matrix multiplication algorithms by the explicit use of shared memory and remote memory access (RMA) communication rather than message passing. The experimental results on clusters and shared memory systems demonstrate consistent performance advantages over pdgemm from the ScaLAPACK parallel linear algebra package.

Keywords: Parallel processing systems

The improved CGS method for large and sparse linear systems on bulk synchronous parallel architectures

5th International Conference on Algorithms and Architectures for Parallel Processing

Author: Yang, LTR. Affiliation: St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS B2G 2W5, Canada

ISBN: (print) 0769515126

We propose an improved version of the CGS method for the solution of large and sparse linear systems of equations with unsymmetric coefficient matrices. The proposed method combines elements of numerical stability and parallel algorithm design without increasing computational costs. The algorithm is derived such that all matrix-vector multiplications, inner products and vector updates of a single iteration step are independent, and the communication time required for inner products can be overlapped efficiently with the computation time of vector updates. Therefore, the cost of global communication, which represents the performance bottleneck, can be significantly reduced. In this paper, the Bulk Synchronous Parallel (BSP) model is used to design a fully efficient, scalable and portable parallel version of the proposed algorithm and to provide accurate performance prediction of the algorithm for a wide range of architectures including the Cray T3D, the Parsytec, and a cluster of workstations connected by an Ethernet. This performance model uses only a few system-dependent parameters based on simple and accurate cost modelling to provide useful insight into the time complexity of the method. The theoretical performance predictions are compared with some preliminary measured timing results of a numerical application from ocean flow simulation.
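
For reference, a minimal sequential sketch of the standard (unpreconditioned) CGS iteration is given below; it is not the improved variant proposed in the abstract. Each step needs two matrix-vector products and two inner products, and in a distributed setting the inner products are the global communication points the abstract refers to. Dense list-of-lists storage and the helper names are assumptions made to keep the example self-contained.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Inner product (a global reduction in a parallel setting)."""
    return sum(a * b for a, b in zip(x, y))


def cgs(A, b, tol=1e-10, max_iter=100):
    """Standard CGS for A x = b, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)        # residual b - A x for the zero initial guess
    r_hat = list(r)    # fixed shadow residual
    p = [0.0] * n
    q = [0.0] * n
    rho_old = 1.0
    for it in range(max_iter):
        rho = dot(r_hat, r)
        if rho == 0.0:
            break      # breakdown: the method cannot continue
        beta = rho / rho_old if it > 0 else 0.0
        u = [ri + beta * qi for ri, qi in zip(r, q)]
        p = [ui + beta * (qi + beta * pi) for ui, qi, pi in zip(u, q, p)]
        v = matvec(A, p)
        alpha = rho / dot(r_hat, v)  # raises ZeroDivisionError on breakdown
        q = [ui - alpha * vi for ui, vi in zip(u, v)]
        uq = [ui + qi for ui, qi in zip(u, q)]
        x = [xi + alpha * s for xi, s in zip(x, uq)]
        r = [ri - alpha * t for ri, t in zip(r, matvec(A, uq))]
        rho_old = rho
        if dot(r, r) ** 0.5 < tol:
            break
    return x
```

In this form each inner product depends on the vector update just before it, so communication and computation cannot overlap; removing that dependency is the kind of restructuring the abstract describes.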

Keywords: Numerical methods

The execube parallel processor chip and cellular automata for tactical route planning

10th Computing in Aerospace Conference, 1995

Authors: Bezek, J. David; Stiles, Peter. Affiliation: Loral Federal Systems – Owego, IBM Federal Systems Company, 1801 State Route 17C, Owego, NY 13827-3998, United States

Parallel processing and the methods to program, coordinate, and operate such computing systems and architectures have become a vast and highly pursued research area. These computing environments have quickly become a vital means to attack difficult compute-intensive algorithmic and heuristic problems. We present a high-level description of the Execube processor. This parallel processor chip comprises eight computing engines, local memories, and a high-speed message passing system on a single chip type. Chip extensibility and interconnection follow a simple building block approach. A class of computational models known as cellular automata (CA) is one of several that can effectively exploit Execube’s parallelism. We discuss CA algorithms in general and present sample applications applicable to the Execube parallel processor. © 1995, American Institute of Aeronautics and Astronautics Inc, AIAA. All rights reserved.
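
One common way to use a cellular automaton for route planning is to let every grid cell repeatedly set its cost to one more than the cheapest neighbor, so that a distance field spreads outward from the goal. The sketch below illustrates that general idea only; it is an assumption about the style of algorithm, not the formulation used by the authors, and the function names and grid encoding are hypothetical.

```python
INF = float("inf")


def ca_step(cost, blocked):
    """One synchronous update: each free cell takes 1 + the minimum neighbor cost."""
    rows, cols = len(cost), len(cost[0])
    new = [row[:] for row in cost]
    for r in range(rows):
        for c in range(cols):
            if blocked[r][c] or cost[r][c] == 0:
                continue  # obstacles stay at INF, the goal stays at 0
            best = INF
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    best = min(best, cost[rr][cc])
            new[r][c] = min(cost[r][c], best + 1)
    return new


def distance_field(blocked, goal):
    """Iterate the automaton until no cell changes; returns steps-to-goal per cell."""
    cost = [[INF] * len(blocked[0]) for _ in blocked]
    cost[goal[0]][goal[1]] = 0
    while True:
        new = ca_step(cost, blocked)
        if new == cost:
            return cost
        cost = new
```

Every cell applies the same local rule using only its neighbors' previous values, which is why this kind of update maps naturally onto many small processing elements. A route is then read off by walking from the start cell to whichever neighbor has the lowest cost.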

Keywords: Cellular automata

Theory and algorithms for parallel computation

10th International European Conference on Parallel Processing, Euro-Par 2004

Authors: Christos, Kaklamanis; Amato, Nancy; Krizanc, Danny; Pietracaprina, Andrea

Parallel Implementation of the Genetic Algorithm on NVIDIA GPU Architecture for Synthesis and Inversion

10th International Conference on Barkhausen and Micro-Magnetics (ICBM)

Authors: Karthik, Victor U.; Sivasuthan, Sivamayam; Hoole, Samuel Ratnajeevan H. Affiliation: Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48824, USA

ISBN: (print) 9780735412125

The computational algorithms for device synthesis and nondestructive evaluation (NDE) are often the same. In both we have a goal: a particular field configuration that yields the design performance in synthesis or matches exterior measurements in NDE. The geometry of the design or the postulated interior defect is then computed. Several optimization methods are available for this. The most efficient, like conjugate gradients, are very complex to program because of the required derivative information; the least efficient, zeroth-order algorithms like the genetic algorithm, take much computational time but little programming effort. This paper reports launching a genetic algorithm kernel on thousands of compute unified device architecture (CUDA) threads, exploiting the NVIDIA graphics processing unit (GPU) architecture. The efficiency of parallelization, although below that on shared memory supercomputer architectures, is quite effective in cutting down solution time into the realm of the practicable. We carry this further into multi-physics electro-heat problems where the parameters of description are in the electrical problem and the object function is in the thermal problem. Indeed, this is where the derivative of the object function in the heat problem with respect to the parameters in the electrical problem is the most difficult to compute for gradient methods, and where the genetic algorithm is most easily implemented.

Keywords: NDE; GPU; Multi-physics; FEM; Optimization
