Search results - Inner Mongolia University Library

AN OPTIMAL SYSTOLIC ARRAY FOR THE ALGEBRAIC PATH PROBLEM

IEEE TRANSACTIONS ON COMPUTERS, 1991, vol. 40, no. 1, pp. 100-105

Authors: LEWIS, PS; KUNG, SY. PRINCETON UNIV, DEPT ELECT ENGN, PRINCETON, NJ 08544

A new systolic array design for the Algebraic Path Problem (APP) is presented that is both simpler and more efficient than previously proposed configurations. This array uses N^2 orthogonally connected processing elements and requires 2N I/O connections. Total computation time is 5N - 2, which is the minimum time possible in a systolic implementation. The data pipelining rate is one, so no pipeline interleave is required. For multiple problem instances a block pipeline rate of N can be achieved, which is optimal for an array of N^2 processing elements.

Keywords: ALGEBRAIC PATH PROBLEM; algorithm mapping; MATRIX INVERSION; PARALLEL PROCESSING; SHORTEST PATH PROBLEM; SYSTOLIC ARRAYS; TRANSITIVE CLOSURE PROBLEM; VLSI ARCHITECTURES
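
For readers unfamiliar with the Algebraic Path Problem named in this record, the following is a minimal sequential sketch, not the systolic design described in the abstract. It assumes the common textbook formulation in which the Warshall-Floyd recurrence is generalized over a semiring. The function name `algebraic_path` and its parameters `add`, `mul`, and `star` are hypothetical and chosen only for illustration.

```python
INF = float("inf")

def algebraic_path(a, add, mul, star):
    """Generalized Warshall-Floyd sweep over a semiring (illustrative sketch).

    a    -- square matrix (list of lists) of edge weights
    add  -- semiring "sum" (combines alternative paths)
    mul  -- semiring "product" (extends a path by another path)
    star -- closure of a single element (weight of looping zero or more times)
    Returns the weights of all paths of length >= 1 between each pair of nodes.
    """
    n = len(a)
    c = [row[:] for row in a]
    for k in range(n):
        s = star(c[k][k])
        # Build the new matrix entirely from the old one before rebinding c.
        c = [[add(c[i][j], mul(mul(c[i][k], s), c[k][j]))
              for j in range(n)]
             for i in range(n)]
    return c

# Shortest paths: (min, +) semiring; a non-negative loop never helps, so star is 0.
weights = [[INF, 1, 5],
           [INF, INF, 2],
           [INF, INF, INF]]
dist = algebraic_path(weights, min, lambda x, y: x + y, lambda x: 0)

# Transitive closure: (or, and) semiring; star of any element is True.
edges = [[False, True, False],
         [False, False, True],
         [False, False, False]]
reach = algebraic_path(edges, lambda x, y: x or y,
                       lambda x, y: x and y, lambda x: True)
```

Under these assumptions, the same triple loop yields shortest paths or transitive closure depending only on which three operations are passed in, which is why a single array design can serve several of the problems listed in the keywords.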

RECONFIGURABLE SIMD MASSIVELY PARALLEL COMPUTERS

PROCEEDINGS OF THE IEEE, 1991, vol. 79, no. 4, pp. 429-443

Authors: LI, HW; STOUT, QF. UNIV MICHIGAN, DEPT ELECT ENGN, ADV COMP ARCHITECTURE LAB, ANN ARBOR, MI 48109; UNIV MICHIGAN, DEPT ELECT ENGN, SCI COMP LAB, ANN ARBOR, MI 48109

The reconfigurable SIMD parallel processor is a member of the SIMD family of architectures. Its most distinguishing feature is the use of a reconfigurable interconnection network to 1) establish a network topology well mapped to the algorithm communication graph, so that higher efficiency can be achieved, and 2) remove faulty processors from the network, so that system operation continues uninterrupted with the same or slightly degraded efficiency. This paper describes several existing reconfigurable SIMD parallel architectures and their reconfiguration mechanisms, demonstrates the effectiveness of algorithm mapping through reconfiguration, and discusses fault-tolerant schemes via reconfiguration.

Keywords: algorithm communication graph; algorithm mapping; fault tolerant computing; fault-tolerant schemes; faulty processors; interconnection network; multiprocessor interconnection networks; network topology; parallel architectures; reconfigurable SIMD massively parallel computers; system operation

An Implementation of Multiple-Standard Video Decoder on a Mixed-Grained Reconfigurable Computing Platform

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, vol. E99D, no. 5, pp. 1285-1295

Authors: Liu, Leibo; Wang, Dong; Chen, Yingjie; Zhu, Min; Yin, Shouyi; Wei, Shaojun. Tsinghua Univ, Inst Microelect, Beijing 100084, Peoples R China; Beijing Jiaotong Univ, Inst Informat Sci, Beijing, Peoples R China

This paper presents the design of a multiple-standard 1080 high definition (HD) video decoder on a mixed-grained reconfigurable computing platform integrating coarse-grained reconfigurable processing units (RPUs) and FPGAs. The proposed RPU, which includes 16 x 16 multi-functional processing elements (PEs), is used to accelerate compute-intensive tasks in video decoding. A soft-core-based microprocessor array is implemented on the FPGA and used to speed up the dynamic reconfiguration of the RPU. Furthermore, a mailbox-based communication scheme is used to improve the communication efficiency between RPUs and FPGAs. By exploiting dynamic reconfiguration of the RPUs and static reconfiguration of the FPGAs, the proposed platform achieves scalable performance and cost trade-offs to support a variety of video coding standards, including MPEG-2, AVS, H.264, and HEVC. The measured results show that the proposed platform can support H.264 1080 HD video streams at up to 57 frames per second (fps) and HEVC 1080 HD video streams at up to 52 fps at 250 MHz. It also achieves a 3.6x performance gain over an industrial coarse-grained reconfigurable processor for H.264 decoding and a 6.43x performance gain over a general-purpose-processor-based implementation for HEVC decoding.

Keywords: algorithm mapping; coarse-grained reconfigurable array; field-programmable gate array; reconfigurable computing; video decoding

SOME NEW DESIGNS OF 2-D ARRAY FOR MATRIX MULTIPLICATION AND TRANSITIVE CLOSURE

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1995, vol. 6, no. 4, pp. 351-362

Authors: TSAY, JC; CHANG, PY. Institute of Computer Science and Information Engineering, College of Engineering, National Chiao Tung University, Hsinchu, Taiwan

In this paper, we present some new regular iterative algorithms for matrix multiplication and transitive closure. With these algorithms, space-time mapping yields 2-D arrays for matrix multiplication with execution times of 2N-1 and [(3N-1)/2]. We also derive a 2-D array with an execution time of 4N-2 for transitive closure, based on the sequential Warshall-Floyd algorithm. All of these new 2-D arrays for matrix multiplication and transitive closure are faster and more regular than previous designs.

Keywords: algorithm mapping; MATRIX MULTIPLICATION; MESH ARRAY; SYSTOLIC ARRAY; SPHERICAL ARRAY; TRANSITIVE CLOSURE; VLSI

TIME OPTIMAL LINEAR SCHEDULES FOR ALGORITHMS WITH UNIFORM DEPENDENCIES

IEEE TRANSACTIONS ON COMPUTERS, 1991, vol. 40, no. 6, pp. 723-742

Authors: SHANG, WJ; FORTES, JAB. PURDUE UNIV, SCH ELECT ENGN, W LAFAYETTE, IN 47907

An algorithm can be thought of as a set of indexed computations. If one computation uses data generated by another computation, this data dependence can be represented by the difference of their indexes (called a dependence vector). Many important algorithms are characterized by the fact that data dependencies are uniform, i.e., the values of the dependence vectors are independent of the indexes of computations. Linear schedules are a special class of schedules described by a linear mapping of computation indexes into time. This paper addresses the problem of identifying optimal linear schedules for uniform dependence algorithms so that their execution time is minimized. Procedures are proposed to solve this problem based on the mathematical solution of a nonlinear optimization problem. The complexity of these procedures is independent of the size of the algorithm. Actually, the complexity is exponential in the dimension of the index set of the algorithm and, for all practical purposes, very small due to the limited dimension of the index set of algorithms of practical interest. The results reported in this paper can be used to derive time-optimal systolic designs and applied in optimizing compilers to restructure programs at compile-time in order to maximally exploit available parallelism.

Keywords: algorithm mapping; DATA DEPENDENCY; LINEAR SCHEDULE; OPTIMIZING COMPILER; NESTED-LOOP PROGRAM; SYSTOLIC ARRAY; TIME-OPTIMAL
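
The definitions in this record's abstract (dependence vectors, linear schedules, execution time) can be illustrated with a small sketch. It is a brute-force search, not the procedures proposed in the paper. It assumes a cubic index set {0..n-1}^k, unit-time computations, and the usual validity condition that the schedule vector has a positive inner product with every dependence vector. The names `is_valid`, `execution_time`, and `best_linear_schedule` are hypothetical.

```python
from itertools import product

def is_valid(pi, deps):
    """A schedule vector pi is valid if every dependence vector d has pi . d >= 1,
    i.e. the consumer is scheduled strictly after the producer."""
    return all(sum(p * d for p, d in zip(pi, dep)) >= 1 for dep in deps)

def execution_time(pi, n):
    """Number of time steps on the cube {0..n-1}^k (assumed index set).
    The span of pi . j over the cube is sum(|p_i|) * (n - 1); add one for the
    first step. This is a simplification that assumes unit-time computations."""
    return sum(abs(p) for p in pi) * (n - 1) + 1

def best_linear_schedule(deps, n, bound=2):
    """Exhaustively try integer schedule vectors with entries in [-bound, bound]
    and return a valid one with minimum execution time.
    Raises ValueError if no valid vector exists within the bound."""
    dim = len(deps[0])
    candidates = [pi for pi in product(range(-bound, bound + 1), repeat=dim)
                  if is_valid(pi, deps)]
    return min(candidates, key=lambda pi: execution_time(pi, n))

# Uniform dependencies of the standard matrix-multiplication loop nest:
# each computation depends on its neighbor along i, j, and k.
matmul_deps = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
schedule = best_linear_schedule(matmul_deps, n=4)
steps = execution_time(schedule, n=4)
```

The exhaustive search is exponential in the dimension of the index set but does not depend on n, which mirrors the complexity remark in the abstract. The bound on the vector entries is an arbitrary choice made for this illustration.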

DESIGN OF SPACE-OPTIMAL REGULAR ARRAYS FOR ALGORITHMS WITH LINEAR SCHEDULES

IEEE TRANSACTIONS ON COMPUTERS, 1995, vol. 44, no. 5, pp. 683-694

Authors: TSAY, JC; CHANG, PY. Inst. of Comput. Sci. & Inf. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan

The problem of designing space-optimal 2D regular arrays for N x N x N cubical mesh algorithms with linear schedule ai + bj + ck, 1 ≤ a ≤ b ≤ c, and N = nc, is studied. Three novel nonlinear processor allocation methods, each of which works by combining a partitioning technique (gcd-partition) with different nonlinear processor allocation procedures (traces), are proposed to handle different cases. In cases where a + b ≤ c, which are dealt with by the first processor allocation method, space-optimal designs can always be obtained in which the number of processing elements is equal to N^2/c. For other cases where a + b > c and either a = b or b = c, two other optimal processor allocation methods are proposed. In addition, closed-form expressions for the optimal number of processing elements are derived for these cases.

Keywords: algorithm mapping; DATA DEPENDENCY; LINEAR SCHEDULE; MATRIX MULTIPLICATION; OPTIMIZING COMPILER; SPACE-OPTIMAL; SYSTOLIC ARRAY

Compilation approach for coarse-grained reconfigurable architectures

IEEE DESIGN & TEST OF COMPUTERS, 2003, vol. 20, no. 1, pp. 26-33

Authors: Lee, JE; Choi, K; Dutt, ND. Seoul Natl Univ, Sch Elect Engn & Comp Sci, Seoul 151742, South Korea; Univ Calif Irvine, Irvine, CA, USA

Coarse-grained reconfigurable architectures can enhance the performance of critical loops and computation-intensive functions. Such architectures need efficient compilation techniques to map algorithms onto customized architectural configurations. A new compilation approach uses a generic reconfigurable architecture to tackle the memory bottleneck that typically limits the performance of many applications.

Keywords: Program Compilers; Reconfigurable Architectures; Memory Bottleneck; Coarse Grained Reconfigurable Architectures; Critical Loops; Computation Intensive Functions; Compilation Techniques; algorithm mapping; Customized Architectural Configurations; Generic Reconfigurable Architecture; Microarchitecture; Delay; Registers; Computer Architecture; Design Methodology; Hardware; Space Exploration; Architecture Description Languages; Timing

Processor array design with FPGA area constraint

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 1999, vol. 18, no. 3, pp. 253-264

Authors: Fernando, JA; Jean, JSN. Systran Fed Corp, Dayton, OH 45432, USA; Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435, USA

Digital signal processing algorithms with multiple shift-invariant dependence graphs (DGs) can be mapped to field programmable gate array hardware in many different types of systolic processor arrays. Because of the finite amount of hardware resources, the problem is to use the "right" amount of hardware in a specific configuration so as to maximize the processing speed. In this paper, the problem of finding the right processor array configuration is formulated as a constrained optimization problem where the cost function includes not only the cost of individual processor arrays but also the cost of interfacing circuits. Three heuristic algorithms are presented for the optimization problem. Among them, both the Lth axial neighbor algorithm and the simulated annealing algorithm produce good results on a test case. Simulation results on the test case also indicate that the initial configuration is important in getting a good configuration for both algorithms. The Lth axial neighbor algorithm has the extra advantage of requiring less performance tuning.

Keywords: algorithm mapping; computer-aided design; dependence graph; field programmable gate array (FPGA); processor array; systolic array

Optimal control of discrete-time nonlinear heterogeneous multi-agent systems via a distributed DISOPE algorithm

OPTIMAL CONTROL APPLICATIONS & METHODS, 2023, vol. 44, no. 1, pp. 148-169

Authors: Wang, Zhenhua; Li, Junmin. Xidian Univ, Sch Math & Stat, Xian, Shaanxi, Peoples R China; Xianyang Normal Univ, Sch Math & Stat, Xianyang, Shaanxi, Peoples R China

A distributed offline DISOPE algorithm for optimal state synchronization of leader-follower systems with nonlinear discrete-time dynamics is considered, which integrates model optimization with parameter estimation. It is shown that the convergent solutions of modified linear optimal control problems satisfy the optimality conditions of the original nonlinear optimization problem with non-LQ performance indices. The heterogeneous agents can cooperate and exchange information via network communication. Based on the DISOPE algorithm, a distributed optimal control policy is obtained that assures state synchronization and minimizes performance indices over a finite time horizon. Finally, a simulation example is provided to illustrate the effectiveness of the distributed DISOPE algorithm.

Keywords: algorithm mapping; distributed DISOPE algorithm; nonlinear discrete-time dynamics; optimal state synchronization

A method for the on-line use of off-line derived remappings of iterative automatic target recognition tasks onto a particular class of heterogeneous parallel platforms

JOURNAL OF SUPERCOMPUTING, 1998, vol. 12, no. 4, pp. 387-406

Authors: Budenske, JR; Ramanujan, RS; Siegel, HJ. Architecture Technol Corp, Minneapolis, MN 55424, USA; Purdue Univ, Sch Elect & Comp Engn, Parallel Proc Lab, W Lafayette, IN 47907, USA

This study focuses on a particular application domain (iterative automatic target recognition tasks) and an associated specific class of dedicated heterogeneous parallel hardware platforms. For the computational environment considered, a methodology is presented for the on-line operating system to decide heuristically whether to perform a remapping of the application onto the platform based on information generated from input data by the application during execution. If the decision is to remap, the operating system will be able to select a mapping, which is appropriate for the given state of the application, from a stored set of mappings that were previously derived with an off-line heuristic.

Keywords: algorithm mapping; automatic target recognition; genetic algorithms; heterogeneous computing; real-time processing
