Search Results - Inner Mongolia University Library

International Symposium on Low Power Electronics and Design

Authors: Lanuzza, M; Margala, M; Corsonello, P. Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, Italy

ISBN: (print) 1595931376

Multimedia applications have become a dominant computing workload for computer systems as well as for wireless-based devices. Due to their repetitive computing and memory-intensive nature, they can take effective advantage of processor-in-memory (PIM) technology. In this paper, a new low-power PIM-based 32-bit reconfigurable datapath optimized for multimedia applications is presented. The new circuit efficiently performs parallel arithmetic operations on either 8-, 16-, or 32-bit integer data or on 32-bit single-precision floating-point data. As a result, high flexibility is provided at a very low hardware cost. When implemented using the UMC 0.18 µm 1.8 V CMOS technology, the proposed datapath exhibits a 285 MHz running frequency, dissipates just 0.12 mW/MHz, and occupies a silicon area of only 107,323 µm². When performing 2D-DCT, the proposed architecture consumes 74% less power and is 28% more power efficient compared to a top-of-the-line commercial TI DSP.
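The abstract above describes parallel arithmetic on 8-, 16-, or 32-bit integer data packed into a 32-bit word. As a rough software analogy only (not the paper's circuit), the following minimal sketch adds two 32-bit words lane by lane with wrap-around in each lane. The function name `packed_add` and the `lane_bits` parameter are assumptions introduced for illustration.

```python
MASK32 = 0xFFFFFFFF

def packed_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 32-bit words lane by lane, wrapping around inside each lane.

    Hypothetical helper: a software analogy of sub-word parallel addition,
    not a model of the datapath described in the abstract.
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16, or 32")
    # Mask with the top bit of every lane set, e.g. 0x80808080 for 8-bit lanes.
    high = 0
    for shift in range(lane_bits - 1, 32, lane_bits):
        high |= 1 << shift
    low = MASK32 & ~high
    # Adding only the low bits of each lane cannot carry into the next lane.
    partial = (a & low) + (b & low)
    # The top bit of each lane is a XOR b XOR carry-in; the carry-in is
    # already sitting in that position of `partial`.
    return (partial ^ ((a ^ b) & high)) & MASK32
```

For example, `packed_add(0x01FF7F80, 0x01010180, 8)` treats each word as four independent bytes, so the `0xFF + 0x01` lane wraps to `0x00` without disturbing its neighbor.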

Keywords: processor-in-memory; datapath; reconfigurable computing


Critical Block Scheduling: A thread-level parallelizing mechanism for a heterogeneous Chip Multiprocessor architecture


20th International Workshop on Languages and Compilers for Parallel Computing

Author: Chu, Slo-Li. Chung Yuan Christian Univ, Dept Informat & Comp Engn, Chungli, Taiwan

ISBN: (print) 9783540852605

Processor-in-memory (PIM) architectures are developed for high-performance computing by integrating processing units with memory blocks on a single chip to reduce the performance gap between the processor and the memory. The PIM architecture combines heterogeneous processors in a single system. These processors are characterized by their computation and memory-access capabilities. Therefore, a novel mechanism must be developed to identify their capabilities and dispatch the appropriate tasks to these heterogeneous processing elements. Accordingly, this paper presents a novel parallelizing mechanism, called Critical Block Scheduling, to fully utilize all of the heterogeneous processors in the PIM architecture. Integrated with our thread-level parallelizing system, Octans, this mechanism decomposes the original program into blocks, produces the corresponding dependence graph, creates a feasible execution schedule, and generates the corresponding threads for the host and memory processors. The proposed Critical Block Scheduling can not only parallelize programs for PIM architectures but can also be applied to other Multi-processor System-on-Chip (MPSoC) and Chip Multiprocessor (CMP) architectures that consist of multiple heterogeneous processors. The experimental results of real benchmarks are also discussed.
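The abstract above outlines a pipeline of blocks, a dependence graph, and a schedule for host and memory processors. The sketch below is a generic critical-path list scheduler for two heterogeneous processors, offered as an illustration of that general idea; it is not the paper's Critical Block Scheduling algorithm. The block names, the per-processor costs, and the `schedule` function are all hypothetical.

```python
from functools import lru_cache

def schedule(costs, deps):
    """Greedy list scheduling on a host and a memory processor.

    costs: {block: {"host": time, "mem": time}}  (hypothetical estimates)
    deps:  {block: [predecessor blocks]}
    Returns {block: (processor, start, finish)}.
    """
    succs = {b: [] for b in costs}
    for block, preds in deps.items():
        for p in preds:
            succs[p].append(block)

    @lru_cache(maxsize=None)
    def rank(b):
        # Longest path from b to an exit block, using each block's best-case cost.
        return min(costs[b].values()) + max((rank(s) for s in succs[b]), default=0)

    free_at = {"host": 0, "mem": 0}
    placed = {}
    remaining = set(costs)
    while remaining:
        ready = sorted(b for b in remaining
                       if all(p in placed for p in deps.get(b, [])))
        if not ready:
            raise ValueError("dependence graph contains a cycle")
        block = max(ready, key=rank)  # most critical ready block first
        data_ready = max((placed[p][2] for p in deps.get(block, [])), default=0)
        best = None
        for proc in ("host", "mem"):
            start = max(free_at[proc], data_ready)
            finish = start + costs[block][proc]
            if best is None or finish < best[2]:
                best = (proc, start, finish)
        placed[block] = best
        free_at[best[0]] = best[2]
        remaining.remove(block)
    return placed

# Hypothetical program: block C is memory-intensive, so it is cheaper on "mem".
costs = {
    "A": {"host": 2, "mem": 4},
    "B": {"host": 3, "mem": 6},
    "C": {"host": 8, "mem": 3},
    "D": {"host": 2, "mem": 5},
}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
plan = schedule(costs, deps)
```

With these made-up costs, blocks B and C run concurrently on the host and memory processors, and the whole program finishes at time 7 instead of the 15 time units that running every block on the host alone would take.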

Keywords: Chip Multiprocessor (CMP); processor-in-memory; Critical Block Scheduling; Octans


SAGE: an automatic analyzing system for a new high-performance SoC architecture - processor-in-memory


Journal of Systems Architecture, 2004, Vol. 50, No. 1, pp. 1-15

Authors: Chu, SL; Huang, TC. Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 804, Taiwan; Natl Chung Yuan Univ, Dept Informat & Comp Engn, Chungli 320, Taiwan

Continuous improvements in semiconductor fabrication density are supporting new classes of System-on-a-Chip (SoC) architectures that combine extensive processing logic/processor with high-density memory. Such architectures are generally called processor-in-memory (PIM) or Intelligent memory (I-RAM) and can support high-performance computing by reducing the performance gap between the processor and the memory. The PIM architecture combines various processors in a single system. These processors are characterized by their computation and memory-access capabilities. Therefore, a novel strategy must be developed to identify their capabilities and dispatch the most appropriate jobs to them in order to exploit them fully. Accordingly, this study presents an automatic source-to-source parallelizing system, called statement-analysis-grouping-evaluation (SAGE), to exploit the advantages of PIM architectures. Unlike conventional iteration-based parallelizing systems, SAGE adopts statement-based analyzing approaches. This study addresses the configuration of a PIM architecture with one host processor (i.e., the main processor in state-of-the-art computer systems) and one memory processor (i.e., the computing logic integrated with the memory). The strategy of the SAGE system, in which the original program is decomposed into blocks and a feasible execution schedule is produced for the host and memory processors, is investigated as well. The experimental results for real benchmarks are also discussed. (C) 2003 Elsevier B.V. All rights reserved.

Keywords: SoC; processor-in-memory; statement analysis; SAGE


Toward to utilize the heterogeneous multiple processors of the Chip Multiprocessor architecture


IFIP International Conference Embedded and Ubiquitous Computing

Author: Chu, Slo-Li. Chung Yuan Christian Univ, Dept Informat & Comp Engn, Chungli, Taiwan

ISBN: (print) 9783540770916

Continuous improvements in semiconductor fabrication density are supporting new classes of Chip Multiprocessor (CMP) architectures that combine extensive processing logic/processor with high-density memory on a single chip. One such architecture, called processor-in-memory (PIM), can support high-performance computing by combining various processors in a single system. Therefore, a new strategy is developed to identify their capabilities and dispatch the most appropriate jobs to them in order to exploit them fully. This paper presents a novel scheduling mechanism, called Swing Scheduling, to fully utilize all of the heterogeneous processors in the PIM architecture. Integrated with our Octans system, this mechanism can decompose the original program into blocks and can produce a feasible execution schedule for the host and memory processors, even for other CMP architectures. The experimental results for real benchmarks are also presented.

Keywords: Chip Multiprocessor (CMP); processor-in-memory; Swing Scheduling; Octans


Research on Billion-Transistor Processor Architectures (十亿晶体管处理器体系结构研究)


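计算机工程与科学 (Computer Engineering & Science), 2007, Vol. 29, No. 7, pp. 80-84

Authors: Wen Pu (温璞); Yang Xuejun (杨学军). School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073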


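The rapid development of semiconductor process technology has made it possible to integrate ever more transistors on a single chip. How to use these abundant on-chip resources has become a focus of processor architecture research. This paper surveys current research on billion-transistor processor architectures and argues that the PIM architecture is an effective way to alleviate the memory wall, power consumption, and wire delay problems facing current processors and to make full use of on-chip resources, and that combining PIM with a vector architecture better brings out PIM's high-bandwidth, low-latency advantages.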

Keywords: billion-transistor architecture; memory wall; vector processing; processor-in-memory


A statement based parallelizing framework for processor-in-memory architectures


Information Processing Letters, 2003, Vol. 85, No. 3, pp. 159-163

Authors: Huang, TC; Chu, SL. Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung, Taiwan

It is widely known that current memory architecture is one of the bottlenecks for high-performance computers due to the increasing gap between the processor speed and memory latency. For this reason, several architectures, called intelligent memory (IRAM) or processor-in-memory (PIM), have been studied in recent years aiming to integrate the processor and memory together. A merit of the PIM architecture is that the PIM chips can be used to replace the main memory chips in a workstation and act as coprocessors when the main processor spawns them. This approach has been adopted by Active Page, DIVA, and FlexRAM, among others. This class of architectures provides a hierarchical hybrid multiprocessor environment: host (main) processors and memory processors. The host processor is more powerful, with a deep cache hierarchy and higher memory-access latency. By contrast, memory processors are usually less powerful but have lower memory-access latency. The major problems we address in this paper are how to dispatch suitable tasks to these different processors in PIM according to their computing power and characteristics so as to reduce their idle time, and how to partition the original program and then execute it simultaneously on this mixture of heterogeneous processors. Based on our earlier work, we propose the SAGE (Statement-Analysis-Grouping-Evaluation) system to analyze the source program, generate a Weight Partition Dependence Graph (WPG), determine the weight of each block, and then dispatch the most suitable jobs to the host and memory processors, respectively. From the experiment, we find that quite good speedup is obtained, which even exceeds the computation capability ratio in a 1-host and 1-memory processor environment.

Keywords: processor-in-memory; statement analysis; SAGE; parallelizing compiler; FlexRAM; scheduling


An efficient parallel architecture for implementing LST decoding in MIMO systems


IEEE Transactions on Signal Processing, 2006, Vol. 54, No. 10, pp. 3899-3907

Authors: Alimohammad, Amirhossein; Cockburn, Bruce F. Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2V4, Canada

Recovering the symbols in a multiple-input multiple-output (MIMO) receiver is a computationally intensive process. The layered space-time (LST) algorithms provide a reasonable tradeoff between complexity and performance. Commercial digital signal processors (DSPs) have become a key component in many high-volume products such as cellular telephones. As an alternative to power-hungry DSPs, we propose to use a moderately parallel single-instruction stream, multiple-data stream (SIMD) coprocessor architecture, called DSP-RAM, to implement an LST MIMO receiver that offers high performance with relatively low power consumption. For a typical indoor wireless environment, a 100-MHz DSP-RAM can potentially provide more than ten times greater decoding throughput at the receiver of a (4,4) MIMO system compared with a conventional 720-MHz DSP. The DSP-RAM processor has been coded in a hardware description language (HDL) and synthesized for both available field-programmable gate arrays (FPGAs) and for a 0.18-µm CMOS standard cell implementation.

Keywords: layered space-time decoding; multiple-input multiple-output (MIMO) receiver; parallel processing; processor-in-memory


Research on the Parcels Communication Mechanism in High-Performance Parallel PIM Systems (高性能并行PIM系统中Parcels通信机制研究)


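小型微型计算机系统 (Journal of Chinese Computer Systems), 2006, Vol. 27, No. 3, pp. 554-557

Authors: Wen Pu (温璞); Yang Xuejun (杨学军); Tang Yuhua (唐玉华). School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073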


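High-performance parallel systems based on processor-in-memory (PIM) technology offer scalability, adaptivity, robustness, and low power consumption. With the Parcels communication mechanism, a parallel PIM system can perform message-driven computation, overlap computation with communication, reduce the impact of the communication system on fine-grained parallel applications, and make full use of PIM's internal bandwidth and application locality. This article introduces the Parcels communication mechanism used in parallel PIM systems and its characteristics, the Parcels communication model, and typical systems that use Parcels; it also analyzes open problems and points out directions for further research.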

Keywords: processor-in-memory; Parcels communication mechanism; user-level communication; parallel systems


V-PPIM: A High-Performance Parallel PIM System Based on V-PIM (V-PPIM:基于V-PIM的高性能PIM并行系统)


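14th National Conference on Information Storage Technology (第14届全国信息存储技术学术会议)

Authors: Wen Pu (温璞); Yang Xuejun (杨学军); Yan Xiaobo (晏小波); Deng Yu (邓宇); Tang Yuhua (唐玉华). School of Computer Science, National University of Defense Technology, Hunan 410073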


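The processor-in-memory (PIM) architecture integrates high-density DRAM and CMOS processing logic on a single chip, providing high bandwidth and low latency. High-performance parallel systems based on PIM technology offer better scalability, adaptivity, robustness, and low power consumption, and are expected to become one of the cornerstones of future computing systems beyond the petaflops scale. Vector processing can take full advantage of the PIM structure. Combining the structural characteristics of vector processing and PIM, this paper proposes V-PPIM, a parallel system based on a vector PIM architecture; describes the structure and design rationale of V-PPIM and its processing element, the vector-based PIM (V-PIM); discusses the key features of V-PIM; and points out directions for further research.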

Keywords: processor-in-memory; vector processing; parallel systems; memory wall


Efficient parallel implementation of motion estimation on the Computational RAM architecture


IEEE Canadian Conference on Electrical and Computer Engineering

Authors: Ai, H; Li, N; Li, T; Mandal, MK; Cockburn, BF. Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2V4, Canada

ISBN: (print) 0780375149

Motion estimation is the most computationally intensive task in present video compression standards. Parallel processing has proved to be an efficient approach for similar kinds of applications. In this paper, we propose two parallel implementations of block-based motion estimation for a massively parallel, processor-in-memory hardware architecture known as Computational RAM (C-RAM). Our simulation study showed that, although the massive parallelism of C-RAM does potentially have great benefits, the use of embedded DRAM and bit-serial arithmetic reduced the achievable speed-up to about 4 compared with a 733 MHz Pentium III machine.
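For readers unfamiliar with block-based motion estimation, the following is a plain sequential sketch of exhaustive block matching using the sum of absolute differences (SAD). It illustrates only the underlying computation and says nothing about how the paper maps it onto C-RAM. The function names, the tiny synthetic frames, and the search range are assumptions chosen for the example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )

def best_motion_vector(cur, ref, bx, by, n, search):
    """Exhaustive search over displacements in [-search, search].

    Returns ((dx, dy), sad) for the best-matching reference block.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

# Synthetic 8 x 8 frames: `cur` is `ref` shifted by 2 pixels in x and 1 in y.
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
        for x in range(8)] for y in range(8)]
vector, cost = best_motion_vector(cur, ref, bx=2, by=2, n=2, search=2)
```

Here the search recovers the displacement `(2, 1)` with a SAD of 0. Every candidate displacement is evaluated independently of the others, which is the property that makes this workload a natural fit for massively parallel hardware.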

Keywords: motion estimation; parallel processing; processor-in-memory; logic-enhanced memory
