Sparse coding encodes natural stimuli using a small number of basis functions known as receptive fields. In this work, we design custom hardware architectures for efficient and high-performance implementations of a sparse coding algorithm called the sparse and independent local network (SAILnet). A study of the neuron spiking dynamics uncovers important design considerations involving the neural network size, target firing rate, and neuron update step size. Optimal tuning of these parameters keeps the neuron spikes sparse and random to achieve the best image fidelity. We investigate practical hardware architectures for SAILnet: a bus architecture that provides efficient neuron communication but results in spike collisions, and a ring architecture that is more scalable but causes neuron misfires. We show that the spike collision rate is reduced with a sparse spiking neural network, so an arbitration-free bus architecture can be designed to tolerate collisions without the need for arbitration. To reduce neuron misfires, we design a latent ring architecture that damps the neuron responses for improved image fidelity. The bus and ring architectures can be combined in a hybrid architecture to achieve both high throughput and scalability. The three architectures are synthesized and placed and routed in a 65 nm CMOS technology. The proof-of-concept designs demonstrate a high sparse coding throughput of up to 952 Mpixels per second at an energy consumption of 0.486 nJ per pixel.
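For context on the spiking dynamics the abstract refers to, the following is a minimal NumPy sketch of SAILnet-style inference as a leaky integrate-and-fire loop. The names Q (feedforward receptive fields), W (lateral inhibitory weights), theta (firing thresholds), and eta (the neuron update step size) are assumptions made for illustration; the sketch omits learning and all hardware-specific details.

```python
import numpy as np

def sailnet_infer(patch, Q, W, theta, eta=0.1, n_steps=50):
    """Leaky integrate-and-fire inference loop in the spirit of SAILnet.

    patch : flattened image patch, shape (n_pixels,)
    Q     : feedforward receptive fields, shape (n_neurons, n_pixels)
    W     : lateral inhibitory weights, shape (n_neurons, n_neurons), zero diagonal
    theta : firing thresholds, shape (n_neurons,)
    eta   : neuron update step size (a key tuning parameter per the abstract)
    Returns the per-neuron spike counts accumulated over n_steps.
    """
    n_neurons = Q.shape[0]
    u = np.zeros(n_neurons)        # membrane potentials
    s = np.zeros(n_neurons)        # spikes fired in the previous step
    spikes = np.zeros(n_neurons)   # accumulated spike counts
    drive = Q @ patch              # feedforward drive, computed once per patch
    for _ in range(n_steps):
        # leaky integration with lateral inhibition from the last step's spikes
        u = (1.0 - eta) * u + eta * (drive - W @ s)
        s = (u > theta).astype(float)
        u[s > 0] = 0.0             # reset neurons that fired
        spikes += s
    return spikes
```

In this picture, the design considerations named above map onto the dimensions of Q and W (network size), the thresholds that regulate how often neurons fire (target firing rate), and the eta argument (update step size).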
Iterative image reconstruction can dramatically improve the image quality in X-ray computed tomography (CT), but the computation involves iterative steps of 3D forward- and back-projection, which impedes routine clinical use. To accelerate forward-projection, we analyze the CT geometry to identify the intrinsic parallelism and data access sequence for a highly parallel hardware architecture. To improve the efficiency of this architecture, we propose a water-filling buffer to remove pipeline stalls, and an out-of-order sectored processing scheme to reduce the off-chip memory access by up to three orders of magnitude. We perform a floating-point to fixed-point conversion based on numerical simulations and demonstrate comparable image quality at a much lower implementation cost. As a proof of concept, a 5-stage fully pipelined, 55-way parallel separable-footprint forward-projector is prototyped on a Xilinx Virtex-5 FPGA, achieving a throughput of 925.8 million voxel projections/s at a 200 MHz clock frequency, 4.6 times higher than an optimized 16-threaded program running on an 8-core 2.8-GHz CPU. A similar architecture can be applied to back-projection for a complete iterative image reconstruction system. The proposed algorithm and architecture can also be applied to hardware platforms such as graphics processing units and digital signal processors to achieve significant acceleration.
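To make the accelerated kernel concrete, the following is a toy 2D parallel-beam, ray-driven forward projector in NumPy. It only illustrates the line-integral computation that forward projection performs; it is not the separable-footprint projector or the 3D cone-beam geometry of the paper, and the function name and parameters are assumptions.

```python
import numpy as np

def forward_project_2d(image, angles, n_detectors, step=0.5):
    """Toy ray-driven forward projection (2D parallel-beam) for illustration.

    image       : square 2D array of attenuation values
    angles      : projection angles in radians
    n_detectors : number of detector bins per view
    step        : sampling interval along each ray, in pixels
    Returns a sinogram of shape (len(angles), n_detectors).
    """
    n = image.shape[0]
    center = (n - 1) / 2.0
    det = np.linspace(-center, center, n_detectors)  # detector bin offsets
    t = np.arange(-center, center, step)             # samples along each ray
    sino = np.zeros((len(angles), n_detectors))
    for a, ang in enumerate(angles):
        cos_a, sin_a = np.cos(ang), np.sin(ang)
        for d, offset in enumerate(det):
            # points along the ray through detector bin d at angle ang
            x = center + offset * cos_a - t * sin_a
            y = center + offset * sin_a + t * cos_a
            # nearest-neighbor sampling; real projectors use voxel footprints
            xi = np.clip(np.round(x).astype(int), 0, n - 1)
            yi = np.clip(np.round(y).astype(int), 0, n - 1)
            inside = (x >= 0) & (x <= n - 1) & (y >= 0) & (y <= n - 1)
            sino[a, d] = np.sum(image[yi[inside], xi[inside]]) * step
    return sino
```

In hardware, it is the memory access pattern of loop nests like this one that the sectored processing and water-filling buffer described above are designed to manage.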
This paper studies the effects of front-end imager parameters on object detection performance and energy consumption. A custom version of histogram of oriented gradients (HOG) features based on 2-bit pixel ratios is presented and shown to achieve superior object detection performance for the same estimated energy compared with conventional HOG features. A front-end hardware implementation capable of extracting these features at multiple scales is proposed, and a system-level energy analysis is performed. This energy analysis suggests a potential 19x reduction in I/O energy and a 3.3x reduction in back-end detection energy compared with conventional object detection pipelines.
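As a rough illustration of building orientation histograms from coarsely quantized gradient information, the sketch below quantizes pixel differences to a few levels before binning orientations. The quantization scheme, bin count, and function name are assumptions; the paper's 2-bit pixel-ratio features are defined by the front-end imager and differ in detail.

```python
import numpy as np

def quantized_gradient_histogram(img, n_bins=9, grad_bits=2):
    """Illustrative orientation histogram from low-precision pixel differences.

    img       : 2D grayscale image (one HOG-style cell, for simplicity)
    n_bins    : number of unsigned orientation bins over [0, pi)
    grad_bits : bit width used to quantize each gradient component (assumed)
    """
    gx = np.diff(img.astype(float), axis=1)[:-1, :]  # horizontal differences
    gy = np.diff(img.astype(float), axis=0)[:, :-1]  # vertical differences
    # quantize each gradient component to a small number of levels
    levels = 2 ** grad_bits
    scale = max(np.max(np.abs(gx)), np.max(np.abs(gy)), 1e-9)
    gxq = np.round(gx / scale * (levels - 1))
    gyq = np.round(gy / scale * (levels - 1))
    # orientation and magnitude from the quantized components
    orientation = (np.arctan2(gyq, gxq) + np.pi) % np.pi  # unsigned, [0, pi)
    magnitude = np.hypot(gxq, gyq)
    bins = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    for b in range(n_bins):
        hist[b] = magnitude[bins == b].sum()
    return hist
```

The intent is only to show that an orientation histogram can be formed from very low-precision per-pixel comparisons, which is the style of front-end feature the abstract describes.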