Recently, there has been increasing demand for energy-efficient signal processing in wearable visual-stimuli-based brain-computer interface (V-BCI) devices. To improve the accuracy and reduce the latency of the V-BCI system, the target identification (TI) algorithms that analyze brain signals continue to advance, and energy-efficient accelerator chips that process the various linear algebra operations constituting these algorithms are growing in importance. In this paper, we propose a domain-specific reconfigurable array processor (RAP) with a dynamically reconfigurable, scalable array of five heterogeneous processing elements (PEs) for the energy-efficient acceleration of basic linear algebra subprograms (BLAS) and matrix decompositions. The system-on-chip (SoC) including the proposed RAP was fabricated in 130-nm CMOS technology with an area of 16.87 mm² and measured at 1.0 V and 90 MHz. With an optimized TI algorithm and scalable array processing, the fabricated chip achieved an information transfer rate (ITR) of 139.9 bits/min and a TI accuracy of 95.4%. In addition, the RAP delivers 16.8× higher TI energy efficiency than prior work, reaching 2144.2 bits/min/mW for information transfer processing with the proposed TI algorithm. Through hardware reconfiguration, the RAP supports a greater variety of linear algebra operations and data sizes than prior accelerators.
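For context, the ITR figure quoted above is conventionally computed with the standard Wolpaw formula from the number of selectable targets, the identification accuracy, and the selection time. A minimal sketch follows; the target count and trial length are hypothetical, since the abstract does not state them:

```python
import math

def wolpaw_itr_bits_per_min(n_targets: int, accuracy: float, trial_seconds: float) -> float:
    """Standard Wolpaw ITR: bits per selection, scaled to bits per minute."""
    p = accuracy
    bits = math.log2(n_targets)
    if 0.0 < p < 1.0:
        bits += p * math.log2(p) + (1.0 - p) * math.log2((1.0 - p) / (n_targets - 1))
    return bits * (60.0 / trial_seconds)

# Hypothetical numbers for illustration only: 40 visual targets, 95.4% accuracy,
# 2 s per selection (the abstract gives neither the target count nor the trial length).
print(round(wolpaw_itr_bits_per_min(40, 0.954, 2.0), 1))
```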
Reconfigurable array processors have emerged as a powerful solution for speeding up computationally intensive applications. However, they may suffer from a data access bottleneck as the frequency of memory accesses rises. At present, the distributed cache design in the reconfigurable array processor has a high cache miss rate, and frequent accesses to external memory lead to long memory access delays. To mitigate this problem, we present a Runtime Dynamic Migration Mechanism (RDMM) for the distributed cache of a reconfigurable array processor, exploiting the pronounced locality and high parallelism of its data accesses. Based on how often the array processors access the remote cache, the mechanism dynamically migrates data with high access frequency from the remote cache into a local migration storage table in the processor. A data search strategy based on the migration storage tables then locates data along the shortest path, effectively reducing the access delay of the whole system and increasing the memory bandwidth of the reconfigurable array processor. We test the proposed mechanism on a reconfigurable array processor hardware platform. The experimental results show that RDMM reduces access delay by up to 35.24% compared with the traditional distributed cache at the highest conflict rate. Compared with Refs. [19], [20], [21], and [23], the working frequency is increased by 15%, the hit rate by 6.1%, and the peak bandwidth by about 3×.
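As a concrete illustration of the migration idea, here is a small software sketch, assuming a simple access-count threshold and an LRU-style local table; the class name, capacity, and threshold are illustrative and not taken from the paper:

```python
from collections import OrderedDict

class MigrationTable:
    """Toy model of a local migration storage table: remote-cache lines whose
    access count passes a threshold are copied into a small local table, so
    later lookups hit locally instead of going back to the remote cache."""

    def __init__(self, capacity: int = 16, threshold: int = 4):
        self.capacity = capacity
        self.threshold = threshold
        self.local = OrderedDict()   # address -> data held locally
        self.remote_hits = {}        # address -> remote access count seen so far

    def access(self, addr, remote_cache):
        if addr in self.local:                       # shortest path: local table hit
            self.local.move_to_end(addr)
            return self.local[addr]
        data = remote_cache[addr]                    # otherwise go to the remote cache
        self.remote_hits[addr] = self.remote_hits.get(addr, 0) + 1
        if self.remote_hits[addr] >= self.threshold:
            if len(self.local) >= self.capacity:
                self.local.popitem(last=False)       # evict the least recently used entry
            self.local[addr] = data                  # migrate the hot line locally
        return data

remote = {0x10: "A", 0x20: "B"}
table = MigrationTable(capacity=2, threshold=2)
for _ in range(3):
    table.access(0x10, remote)
print(0x10 in table.local)   # True: the frequently accessed line has migrated locally
```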
ISBN (Print): 9789881476890
An integrated development environment (IDE) is one of the key elements in building the software ecosystem of reconfigurable array processor (RAP) chips. However, porting a conventional IDE is a daunting task because of the complexity of high-level behavior description at the front-end and the special spatial-temporal instructions bound to the hardware, such as branch prediction, out-of-order execution, and SIMD parallelism. Therefore, we propose a hierarchical IDE design method. At the front-end, static backward slicing is introduced to deconstruct the abstract semantics of high-level languages (HLLs) into relatively fixed operations with a simple structure, so that the spatial-temporal features are easy to peel out. At the bottom, the machine instruction sets are encapsulated into instruction groups (IGs), raising the semantic abstraction level of the hardware description. Physical hardware details are separated from the intermediate representation (IR), which improves scalability. Finally, an IDE is developed with this method for high efficiency video coding (HEVC) algorithm mapping. The testing results show that the efficiency of algorithm development is greatly improved while maintaining the same coding quality.
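To illustrate the instruction-group abstraction, the following sketch shows how an IR expressed in IG names could be lowered to concrete machine instructions; the opcodes, operands, and group names are hypothetical, since the paper's instruction set is not given here:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MachineInstr:
    opcode: str
    operands: List[str]

# Hypothetical encoding: each IG bundles a fixed sequence of machine instructions
# behind one IR-level name, hiding the physical hardware details from the IR.
INSTRUCTION_GROUPS = {
    "IG_LOAD2": [MachineInstr("LD", ["r2", "[a0]"]),
                 MachineInstr("LD", ["r3", "[a1]"])],
    "IG_MAC":   [MachineInstr("MUL", ["r1", "r2", "r3"]),
                 MachineInstr("ADD", ["r0", "r0", "r1"])],
}

def lower(ir_ops: List[str]) -> List[MachineInstr]:
    """Lower an IR expressed in IG names to the underlying machine instructions."""
    out: List[MachineInstr] = []
    for op in ir_ops:
        out.extend(INSTRUCTION_GROUPS[op])
    return out

print(lower(["IG_LOAD2", "IG_MAC"]))
```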
With the rapid growth of computational workloads and power consumption, there is a pressing need for a power-efficient architecture that balances computational efficiency with application flexibility. This paper proposes a programmable, self-reconfigurable array-processor architecture for multimedia applications, consisting of 1024 thin-core processing elements (PEs). Its performance and power dissipation are demonstrated with different multimedia algorithms such as hashing and fractional motion estimation (FME). The results show that the proposed architecture can provide high performance with lower energy consumption through parallel computation.
Unstructured and irregular graph data causes strong randomness and poor locality of data accesses in graph processing. This paper optimizes the depth-branch-resorting algorithm (DBR) and proposes a branch-alternation-resorting algorithm (BAR). To run the algorithm in parallel and improve its efficiency, BAR is mapped onto the reconfigurable array processor (APR-16) to perform vertex reordering, effectively improving the locality of graph data accesses. The BAR algorithm is validated on the GraphBIG framework by running breadth-first search (BFS), single-source shortest path (SSSP), and betweenness centrality (BC) on the reordered datasets. The results show that, compared with the DBR and Corder algorithms, BAR reduces execution time by up to 33.00% and 51.00%, respectively. In terms of data movement, BAR achieves a maximum reduction of 39.00% compared with DBR and 29.66% compared with Corder. In terms of computational complexity, BAR achieves a maximum reduction of 32.56% compared with DBR and 53.05% compared with Corder.
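The abstract does not spell out BAR's reordering rule, so the sketch below uses a plain BFS traversal order as a stand-in to show what vertex reordering does to an adjacency structure: vertices visited together receive nearby ids, which is what improves access locality.

```python
from collections import deque

def bfs_reorder(num_vertices, adjacency):
    """Return an old->new id mapping so that co-visited vertices get nearby ids."""
    new_id, next_id = {}, 0
    for start in range(num_vertices):
        if start in new_id:
            continue
        queue = deque([start])
        new_id[start] = next_id; next_id += 1
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in new_id:
                    new_id[v] = next_id; next_id += 1
                    queue.append(v)
    return new_id

def relabel(adjacency, new_id):
    """Apply the mapping to produce the reordered adjacency lists."""
    out = [[] for _ in adjacency]
    for u, neighbors in enumerate(adjacency):
        out[new_id[u]] = sorted(new_id[v] for v in neighbors)
    return out

graph = [[1, 4], [0, 2], [1, 3], [2], [0]]   # tiny example graph
mapping = bfs_reorder(len(graph), graph)
print(relabel(graph, mapping))
```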
Deep learning algorithms have been widely used in computer vision, natural language processing, and other fields. However, due to the ever-increasing scale of deep learning models, the requirements for storage and computing performance keep rising, and processors based on the von Neumann architecture have gradually exposed significant shortcomings such as high power consumption and long latency. To alleviate this problem, large-scale processing systems are shifting from a traditional computing-centric model to a data-centric model. A near-memory computing array architecture based on a shared buffer is proposed in this paper to improve system performance. It supports instructions with store-calculation integration, reducing data movement between the processor and main memory. Through data reuse, the processing speed of the algorithm is further improved. The proposed architecture is verified and tested through a parallel realization of a convolutional neural network (CNN) algorithm. The experimental results show that, at a frequency of 110 MHz, the speed of a single convolution operation is increased by 66.64% on average compared with a CNN architecture that performs parallel calculations on a field programmable gate array (FPGA). The processing speed of the whole convolution layer is improved by 8.81% compared with a reconfigurable array processor that does not support near-memory computing.
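A minimal sketch of the data-reuse effect that a shared buffer enables for convolution: each input row is loaded once into a small working set and reused by all overlapping kernel positions instead of being refetched from main memory. The sizes and the averaging kernel are arbitrary examples, not the paper's configuration.

```python
import numpy as np

def conv2d_row_reuse(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2-D convolution that keeps only kh input rows live and slides them."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    rows = [image[r] for r in range(kh)]          # working set held in the shared buffer
    for i in range(oh):
        for j in range(ow):
            window = np.stack([r[j:j + kw] for r in rows])
            out[i, j] = np.sum(window * kernel)   # every row is reused kw*ow times
        if i + kh < image.shape[0]:
            rows.pop(0)                           # slide: drop the oldest row,
            rows.append(image[i + kh])            # fetch exactly one new row
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3)) / 9.0
ref = [[np.sum(img[i:i + 3, j:j + 3] * k) for j in range(4)] for i in range(4)]
print(np.allclose(conv2d_row_reuse(img, k), ref))   # True
```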
After the extension of depth modeling mode 4 (DMM-4) in 3D high efficiency video coding (3D-HEVC), the computational complexity increases sharply, which degrades the real-time performance of video coding. To reduce the computational complexity of DMM-4, a simplified hardware-friendly contour prediction algorithm is proposed in this paper. Based on the similarity between the texture and depth maps, the proposed algorithm codes depth blocks directly to calculate edge regions, reducing the number of reference pixels. Verified on the HTM16.1 test sequences, the proposed algorithm reduces coding time by 9.42% compared with the original algorithm. To avoid the time-consuming serial coding on HTM, a parallelization design of the proposed algorithm based on a reconfigurable array processor (DPR-CODEC) is presented. The parallelization design reduces storage access time and configuration time and saves storage space. Verified with a Xilinx Virtex 6 FPGA, experimental results show that the parallelization design is capable of processing HD 1080p at above 30 frames per second. Compared with related work, the scheme reduces LUTs by 42.3%, registers by 85.5%, and overall hardware resources by 66.7%. The data loading speedup ratio of the parallel scheme can reach … On average, the serial/parallel speedup ratio of encoding time for different-sized templates can reach 2.446.
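For background, DMM-4 contour prediction derives a binary partition from the co-located texture block and predicts each depth partition with a constant value. The sketch below uses the texture block's mean as the threshold for simplicity; it illustrates the standard idea, not the paper's simplified hardware-friendly variant.

```python
import numpy as np

def dmm4_contour_partition(texture_block: np.ndarray) -> np.ndarray:
    """Binary partition pattern derived from the co-located texture block."""
    return (texture_block >= texture_block.mean()).astype(np.uint8)

def predict_depth(depth_block: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    """Fill each partition with the mean depth of that region (constant prediction)."""
    pred = np.empty_like(depth_block, dtype=float)
    for region in (0, 1):
        mask = pattern == region
        pred[mask] = depth_block[mask].mean() if mask.any() else 0.0
    return pred

# Toy 4x4 block whose texture edge matches the depth edge, as assumed by DMM-4.
texture = np.array([[10, 12, 80, 85],
                    [11, 14, 82, 88],
                    [12, 15, 84, 90],
                    [13, 16, 86, 92]], dtype=float)
depth = np.array([[30, 30, 120, 120]] * 4, dtype=float)
pattern = dmm4_contour_partition(texture)
print(pattern)
print(predict_depth(depth, pattern))
```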
ISBN (Print): 9798400707674
Although the cross-component linear model used in H.266 chroma coding can increase coding efficiency, it also introduces high complexity. To address this problem, this paper studies the relationship between the texture complexity of a Coding Unit (CU) and the coding modes of the CUs spatially adjacent to it. The article proposes a fast linear-mode prediction algorithm based on the coding modes of adjacent blocks and on texture complexity. First, experimental data are used to determine the association between texture complexity and the intra-frame prediction mode decision. Second, the coding mode of the current block is decided by analyzing the coding modes of its spatial neighbors. Finally, a parallel implementation strategy for DPRAP-based chroma linear intra-frame prediction mapping is developed. According to the testing results, the optimized method reduces execution time by roughly 26.3% compared with the standard algorithm in VTM-9.0.
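A sketch of the kind of early-termination rule the abstract describes, assuming a gradient-based texture-complexity measure and a neighbor-mode check; the threshold, mode names, and voting policy are placeholders rather than the paper's derived parameters. The idea is to skip the costly cross-component mode search when the current CU is smooth and no spatial neighbor chose such a mode.

```python
import numpy as np

CCLM_MODES = {"CCLM_LT", "CCLM_L", "CCLM_T"}   # placeholder names for the three CCLM variants

def texture_complexity(luma_block: np.ndarray) -> float:
    """Mean absolute horizontal + vertical gradient as a cheap complexity measure."""
    gx = np.abs(np.diff(luma_block, axis=1)).mean()
    gy = np.abs(np.diff(luma_block, axis=0)).mean()
    return gx + gy

def try_cclm(luma_block: np.ndarray, neighbor_modes: list, threshold: float = 4.0) -> bool:
    """Return True if the CCLM modes should still be searched for this CU."""
    if any(m in CCLM_MODES for m in neighbor_modes):
        return True                               # a neighbor used CCLM: keep searching
    return texture_complexity(luma_block) >= threshold

flat = np.full((8, 8), 128.0)
print(try_cclm(flat, ["PLANAR", "DC"]))           # False: smooth CU, no CCLM neighbors
```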
A SystemC-based emulator architecture optimized for reconfigurable array processors is proposed. With this novel architecture, the emulator supports performance evaluation, RTL co-simulation, as well as system software development. The emulator has been used in the development of a reconfigurable multimedia application processor (ReMAP), which has been successfully fabricated in an SMIC 0.8 µm process.