Search results - Inner Mongolia University Library

PARALLEL PROCESSOR FOR REAL-TIME ANALYSIS OF BINARY IMAGES

ELECTRONICS LETTERS, 1986, vol. 22, no. 22, pp. 1179-1181

Authors: OWCZARCZYK, J (Institute of Computer Science, PKiN, Polish Academy of Sciences, Warsaw, Poland)

In the letter, a parallel processor for real-time analysis of binary images is presented. Its processing throughput exceeds 10^9 cellular logic operations per second.

Keywords: Computer architecture; parallel processor; binary images; parallel machines; Optical information, image and video signal processing; Digital signal processing; Pattern recognition; real-time analysis; Information theory; cellular logic operations; Multiprocessing systems; computerised picture processing
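A cellular logic operation computes each output pixel of a binary image from the values in a small neighbourhood of the input. The record does not say which operations the processor implements, so the following is only a minimal sketch: the 3x3 neighbourhood, the erosion rule, and the name `erode3x3` are illustrative assumptions.

```python
def erode3x3(image):
    """Binary erosion: a pixel stays 1 only if its whole 3x3 neighbourhood is 1.

    `image` is a list of rows of 0/1 values; pixels outside the image count as 0.
    (Hypothetical example operation, not taken from the cited letter.)
    """
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Treat out-of-range coordinates as background (0).
        return image[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [
        [
            int(all(pixel(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

Every output pixel depends only on the input image, so all pixels can in principle be evaluated at the same time, which is what makes this class of operation a natural fit for parallel hardware.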

The circuits and robust design methodology of the massively parallel processor based on the matrix architecture

IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2007, vol. 42, no. 4, pp. 804-812

Authors: Noda, Hideyuki; Tanizaki, Tetsushi; Gyohten, Takayuki; Dosaka, Katsumi; Nakajima, Masami; Mizumoto, Katsuya; Yoshida, Kanako; Iwao, Takenobu; Nishijima, Tetsu; Okuno, Yoshihiro; Arimoto, Kazutami (Renesas Technol Corp, Itami, Hyogo 6640005, Japan)

Novel circuits and a design methodology for the massively parallel processor based on the matrix architecture are introduced. A fine-grained processing element (PE) circuit for high-throughput MAC operations based on Booth's algorithm enhances the performance of a 16-bit fixed-point signed MAC, which operates at up to 30.0 GOPS/W. The dedicated I/O interface circuits are designed to convert the direction of data access and to support the interleaved memory architecture, and they are implemented to maximize the processor core efficiency. Power management techniques for suppressing current peaks and reducing average power consumption are proposed to enhance the robustness of the macro. The circuits and the design methodology proposed in this paper are attractive for achieving a high-performance, robust massively parallel SIMD processor core for use in multimedia SoCs.

Keywords: CMOS integrated circuits; low power; memory; parallel processor; SIMD
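Booth's algorithm recodes a two's-complement multiplier so that each bit position contributes an add, a subtract, or nothing. The abstract does not state which radix the PE circuit uses, so the sketch below assumes the textbook radix-2 form; the function names `booth_multiply` and `mac` are hypothetical.

```python
def booth_multiply(multiplicand, multiplier, bits=16):
    """Multiply two signed `bits`-wide integers with radix-2 Booth recoding.

    Assumption: radix-2 recoding; the cited paper may use a different radix.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= multiplicand <= hi and lo <= multiplier <= hi):
        raise ValueError("operand out of range")
    m = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    product = 0
    prev_bit = 0  # implicit bit to the right of bit 0
    for i in range(bits):
        bit = (m >> i) & 1
        # Booth digit is prev_bit - bit: pair 01 adds, pair 10 subtracts,
        # pairs 00 and 11 contribute nothing.
        if bit == 0 and prev_bit == 1:
            product += multiplicand << i
        elif bit == 1 and prev_bit == 0:
            product -= multiplicand << i
        prev_bit = bit
    return product


def mac(accumulator, a, b):
    """Multiply-accumulate step: accumulator + a * b."""
    return accumulator + booth_multiply(a, b)
```

Runs of identical multiplier bits produce no additions at all, which is the property that makes Booth recoding attractive for signed multipliers.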

SHIFT-NET AND POWER SHIFT-NET FOR PARALLEL PROCESSOR SYSTEMS

PARALLEL COMPUTING, 1994, vol. 20, no. 7, pp. 1027-1039

Authors: MAKINO, T (Department of Information Sciences, Toho University, 2-2-1 Miyama, Funabashi 274, Japan)

A new class of interconnection network, called shift-net, is presented. The network is realized using a large-scale shifter operating in SIMD fashion. The shifter is implemented with log N stages of select switches, each stage consisting of N two-input selectors, where N is the number of processor elements. The easy control and simple hardware structure of shift-net permit the implementation of highly parallel processor systems. Arbitrary-distance access between processor elements eases the limitation for numerical applications. For simpler networks, power shift-net is proposed. This network restricts the access pattern to powers of two, like the hypercube network.

Keywords: PARALLEL PROCESSOR; INTERCONNECTION NETWORK; SHIFTER; PERMUTATION; SIMD
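The log N stage structure described in the abstract resembles a barrel shifter: stage k either passes every value straight through or moves it 2^k positions, as selected by bit k of the shift distance. The sketch below is an illustration under that reading, not the paper's circuit; the cyclic wrap-around and the name `shift_net` are assumptions.

```python
def shift_net(values, distance):
    """Cyclically shift `values` by `distance` positions using log2(N) stages.

    Each stage models N two-input selectors: one input is the element's own
    value, the other is the value 2**stage positions away.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    distance %= n
    values = list(values)
    stage = 0
    while (1 << stage) < n:
        step = 1 << stage
        if (distance >> stage) & 1:
            # All selectors in this stage pick the input `step` places away.
            values = [values[(i - step) % n] for i in range(n)]
        # Otherwise every selector passes its own input unchanged.
        stage += 1
    return values
```

For example, `shift_net(list(range(8)), 3)` activates stages 0 and 1 (shifts of 1 and 2) and leaves stage 2 idle, so any distance from 0 to N-1 is reached with one control bit per stage.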

DECOMPOSED LOAD-FLOW ALGORITHM SUITABLE FOR PARALLEL PROCESSOR IMPLEMENTATION

IEE PROCEEDINGS-C GENERATION TRANSMISSION AND DISTRIBUTION, 1985, vol. 132, no. 6, pp. 281-284

Authors: RAFIAN, M; STERLING, MJH; IRVING, MR (Department of Engineering Science Laboratories, University of Durham, Durham, UK)

The paper presents a new method for load-flow analysis which is particularly appropriate for very large power systems. The objective has been to reduce the computation time for the analysis of a given system by tearing the network into a number of independent subsystems. The subsystem programs may be executed in parallel, resulting in a considerable time saving for on-line system control. The main advantage of the new algorithm is that the computation efficiency of the main or co-ordinating program is significantly improved. Results are presented which indicate that, for average to very large systems, the net solution time using the suggested technique is less than half of that required by a centralised decoupled method to achieve the same accuracy.

Keywords: parallel processor; load flow; load-flow analysis; Power systems; power system analysis computing; Power engineering computing; large power systems; Multiprocessing systems; computation efficiency; parallel processing

VIPER-II, AN IMPROVED PARALLEL PROCESSOR FOR REAL-TIME CALCULATION OF INNER PRODUCTS

MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 1987, vol. 25, no. 2, pp. 228-231

Authors: VANALSTE, JA; MULDER, AJ (Biomedical Engineering Division, Department of Electrical Engineering, Twente University of Technology, Enschede, The Netherlands)

In biomedical engineering, computing time is a well-known limitation in real-time signal processing, especially when microcomputers are involved. Because the integer multiply instruction executes relatively slowly, the inner product is very time-consuming. Furthermore, the data manipulations needed for the inner product calculation take a considerable amount of time. To overcome this bottleneck, we have developed a special-purpose processor for the fast calculation of inner products. The apparatus is called VIPER-II. VIPER-II calculates a number of inner products of vectors consisting of 16-bit integer-valued arrays whose length may vary from 64 to 512 elements.

Keywords: Digital filtering; Inner products; parallel processor; Q-bus; Signal processing

EFFICIENT ADDRESS GENERATION IN A PARALLEL PROCESSOR

INFORMATION PROCESSING LETTERS, 1990, vol. 36, no. 3, pp. 111-116

Authors: LEE, DL (Department of Computer Science, York University, North York, Ont., Canada M3J 1P3)

We show that it is sufficient to use O(log N) logic gates to generate N memory addresses simultaneously in O(1) time for a memory system consisting of N memory modules which allows parallel, conflict-free access to any of the rows, columns, forward and backward diagonals of an N x N matrix, where N = 2^n for any n > 1. This substantially improves on the previously known result, which requires O(N log N) logic gates and O(log N) time for parallel generation of memory addresses.

Keywords: parallel processor; parallel memory system; memory address generation

An Efficient Parallel Processor for Dense Tensor Computation

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2021, vol. 29, no. 7, pp. 1335-1347

Authors: Huang, Wei-Pei; Cheung, Ray C. C.; Yan, Hong (City Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China)

Much data today is multidimensional; such data are called tensors. Tensor computations have been applied in different fields, and various software libraries have been developed. However, little attention has been paid to developing a hardware architecture to accelerate tensor computations. In this article, an efficient and unified processing element (PE) array for 3-D tensor computation is demonstrated. Our PE array is optimized for thin and tall tensor-matrix multiplication and two types of tensor times matrices chain (TTMc) operations. Our design is evaluated in three case studies and compared with the state-of-the-art design. By using computation partition and rearrangement, data movement between the field-programmable gate array (FPGA) and off-chip DDR memory can be reduced by O(I-2), where I is the maximum range among all the dimensions of the data tensor. For the TTMc implementation, clock frequency has been increased by 18% compared with the state-of-the-art implementation on the same FPGA chip. An experiment on 3-D volumetric data set rendering by the tensor approximation method is conducted for demonstration. For the bricks reconstruction process, the runtime decreased by 50%, i.e., two times faster, on our FPGA implementation compared with that running on a GPU. In CANDECOMP/PARAFAC decomposition, for one iteration, the runtime has been decreased by up to 93% compared with programs implemented with TensorLy, a Python library.

Keywords: Field-programmable gate array (FPGA); hardware architecture; parallel processor; tensor computation
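A tensor-times-matrix product contracts one dimension of the tensor against the columns of a matrix, and a TTMc operation chains such products along several dimensions. The sketch below shows a single product along the first dimension in plain Python; it only illustrates the arithmetic and says nothing about how the paper's PE array schedules it. The name `ttm_mode1` and the nested-list layout are assumptions.

```python
def ttm_mode1(tensor, matrix):
    """Mode-1 tensor-times-matrix product on nested lists.

    tensor: I x J x K, matrix: R x I.
    Returns the R x J x K tensor Y with
    Y[r][j][k] = sum over i of matrix[r][i] * tensor[i][j][k].
    """
    size_i, size_j, size_k = len(tensor), len(tensor[0]), len(tensor[0][0])
    if any(len(row) != size_i for row in matrix):
        raise ValueError("matrix must have one column per mode-1 index")
    return [
        [
            [sum(row[i] * tensor[i][j][k] for i in range(size_i))
             for k in range(size_k)]
            for j in range(size_j)
        ]
        for row in matrix
    ]
```

When the matrix has far fewer rows than columns, the result is much smaller than the input tensor, which is why such products appear in tensor approximation and decomposition methods.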

A 125 GOPS 583 mW Network-on-Chip Based Parallel Processor With Bio-Inspired Visual Attention Engine

IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2009, vol. 44, no. 1, pp. 136-147

Authors: Kim, Kwanho; Lee, Seungjin; Kim, Joo-Young; Kim, Minsu; Yoo, Hoi-Jun (Korea Adv Inst Sci & Technol, Dept Comp Sci & Elect Engn, Div Elect Engn, Taejon 305701, South Korea)

A network-on-chip (NoC) based parallel processor is presented for bio-inspired real-time object recognition with a visual attention algorithm. It contains an ARM10-compatible 32-bit main processor, 8 single-instruction multiple-data (SIMD) clusters with 8 processing elements in each cluster, a cellular neural network based visual attention engine (VAE), a matching accelerator, and a DMA-like external interface. The VAE, with its 2-D shift register array, rapidly finds salient objects in the entire image. The parallel processor then performs further detailed image processing within only the pre-selected attention regions. The low-latency NoC employs dual channel, adaptive switching and packet-based power management, providing 76.8 GB/s aggregated bandwidth. The 36 mm^2 chip contains 1.9 M gates and 226 kB SRAM in a 0.13 μm 8-metal CMOS technology. The fabricated chip achieves a peak performance of 125 GOPS and 22 frames/s object recognition while dissipating 583 mW at 1.2 V.

Keywords: Matching accelerator; network-on-chip (NoC); object recognition; parallel processor; processing element clusters; visual attention engine

A Parallel Processor for Distributed Genetic Algorithm with Redundant Binary Number

6th International Conference on New Trends in Information Science, Service Science and Data Mining (ISSDM)

Authors: Kamimura, Tomohiro; Kanasugi, Akinori (Tokyo Denki Univ, Dept Elect, Tokyo 101, Japan)

ISBN (print): 9781467308762

A genetic algorithm (GA) is an optimization algorithm based on the idea of biological evolution. GAs can be applied to various combinatorial optimization problems. This paper proposes a parallel processor for a distributed genetic algorithm (DGA) with redundant binary numbers. Because a redundant binary number has redundancy, the same solution can be expressed in several different ways. For this reason, the algorithm is expected to find the optimal solution more easily and with lower error rates. Because DGA is a parallel algorithm, its performance can be improved by using a dedicated parallel processor. The effectiveness of the proposed processor was confirmed by simulations and experiments using an FPGA circuit board.

Keywords: component; parallel processor; distributed GA; redundant binary number
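The redundancy mentioned in the abstract can be illustrated with the common signed-digit form of redundant binary, in which each digit is -1, 0 or 1 and digit i carries weight 2^i. Whether the paper encodes chromosomes exactly this way is not stated in this record, so treat the sketch as an assumption; `rb_value` and `rb_encodings` are hypothetical helper names.

```python
from itertools import product


def rb_value(digits):
    """Value of a redundant binary string, most significant digit first."""
    value = 0
    for digit in digits:
        if digit not in (-1, 0, 1):
            raise ValueError("digits must be -1, 0 or 1")
        value = 2 * value + digit
    return value


def rb_encodings(value, width):
    """All `width`-digit redundant binary strings that decode to `value`."""
    return [
        digits
        for digits in product((-1, 0, 1), repeat=width)
        if rb_value(digits) == value
    ]
```

With three digits, the value 3 can be written as (0, 1, 1), (1, 0, -1) or (1, -1, 1), whereas ordinary binary has exactly one encoding for it. Several distinct chromosomes therefore map to the same candidate solution.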

The circuits and robust design methodology of the massively parallel processor based on the matrix architecture

Symposium on VLSI Circuits

ISBN (print): 1424400066

Keywords: CMOS integrated circuits; low power; memory; parallel processor; SIMD
