Multi-modal artificial intelligence (MAI) has attracted significant interest due to its capability to process and integrate data from multiple modalities, including images, text, and audio. Addressing MAI tasks in distributed systems necessitates robust and efficient architectures. The Transformer architecture has emerged as a primary network in this context. The integration of Vision Transformers (ViTs) within multimodal frameworks is crucial for enhancing the processing and comprehension of image data across diverse modalities. However, the complex architecture of ViTs and the extensive resources required for processing large-scale image data impose high computational and storage demands. These demands are particularly challenging for deploying ViTs on edge devices within distributed frameworks. To address this issue, we propose a novel dynamic redundancy-aware ViT accelerator based on parallel computing, termed DRViT. DRViT is supported by an algorithm-architecture co-design. We first propose a hardware-friendly lightweight algorithm featuring token merging, token pruning, and an INT8 quantization scheme. Then, we design a specialized architecture to support this algorithm, transforming the lightweight algorithm into significant latency and energy-efficiency improvements. Our design is implemented on the Xilinx Alveo U250, achieving an overall inference latency of 0.86 ms and 1.17 ms per image for ViT-tiny at 140 MHz and 100 MHz, respectively. The throughput can reach 1,380 GOP/s at peak, demonstrating superior performance compared to state-of-the-art accelerators, even at lower frequencies.
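As a rough illustration of the kind of lightweight algorithm the abstract describes, the sketch below combines score-based token pruning, similarity-based token merging, and symmetric per-tensor INT8 quantization in NumPy. The function names, keep ratio, merge threshold, and average-merge rule are hypothetical simplifications for illustration, not DRViT's actual scheme.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: returns int8 values and a scale."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def prune_and_merge_tokens(tokens, scores, keep_ratio=0.5, merge_threshold=0.9):
    """Keep the highest-scoring tokens; fold each pruned token into its most
    similar kept token (if similar enough) instead of discarding it outright."""
    n = tokens.shape[0]
    n_keep = max(1, int(n * keep_ratio))
    order = np.argsort(-scores)
    kept_idx, pruned_idx = order[:n_keep], order[n_keep:]
    kept = tokens[kept_idx].copy()

    # Cosine similarity between each pruned token and every kept token.
    kept_norm = kept / (np.linalg.norm(kept, axis=1, keepdims=True) + 1e-12)
    for i in pruned_idx:
        t = tokens[i]
        sim = kept_norm @ (t / (np.linalg.norm(t) + 1e-12))
        j = int(np.argmax(sim))
        if sim[j] > merge_threshold:        # merge only near-duplicate tokens
            kept[j] = 0.5 * (kept[j] + t)   # simple average merge
    return kept

# Toy usage: 16 tokens with 8-dim embeddings and random importance scores.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8)).astype(np.float32)
scores = rng.random(16)
reduced = prune_and_merge_tokens(tokens, scores)
q_tokens, scale = quantize_int8(reduced)
print(reduced.shape, q_tokens.dtype, round(scale, 4))
```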
Graph neural networks (GNNs) are a promising method for learning graph representations and demonstrate remarkable performance on various graph-related tasks. Existing typical GNNs exploit the neighborhood message-passing scheme, which aggregates feature messages from neighbor nodes to update the node representations. Despite the effectiveness of this scheme, its complex computational model heavily relies on the graph structure, which hinders scaling to realistic large-scale graph applications. Although several custom accelerators have been proposed to speed up GNNs, these hardware-specific optimization techniques fail to address the fundamental problem of high computational complexity in GNNs. To tackle this challenge, we propose a dedicated algorithm-architecture co-design framework, dubbed MePa, which aims to improve GNN execution efficiency by coordinating algorithm- and hardware-level innovations. Specifically, with an in-depth analysis of GNN message-passing algorithms and potential speedup opportunities, we first propose an efficient message-passing algorithm that can dynamically prune task-irrelevant graph data at multiple granularities (channel-, edge-, and node-wise). This approach significantly reduces the overall complexity of GNNs with negligible accuracy loss. A novel architecture is designed to support dynamic pruning and translate it into performance improvements. Elaborate pipelines and specialized optimizations jointly improve performance and decrease energy consumption. Compared to the state-of-the-art GNN accelerator AWB-GCN, MePa achieves an average speedup of 1.95x and 2.6x higher energy efficiency.
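The following sketch shows, under assumed criteria, what channel-, edge-, and node-wise pruning inside a single message-passing (sum-aggregation) step could look like. The magnitude-based pruning rules, thresholds, and channel keep ratio are illustrative placeholders, not MePa's actual algorithm.

```python
import numpy as np

def pruned_message_passing(x, edges, node_thresh=0.1, edge_thresh=0.1, chan_keep=0.75):
    """One sum-aggregation step with illustrative node-, edge-, and
    channel-wise pruning (all criteria here are hypothetical)."""
    n, d = x.shape
    out = np.zeros_like(x)

    # Channel-wise: keep only the channels with the largest mean magnitude.
    n_chan = max(1, int(d * chan_keep))
    active_ch = np.argsort(-np.abs(x).mean(axis=0))[:n_chan]

    # Node-wise: skip nodes whose feature norm is negligible.
    active_node = np.linalg.norm(x, axis=1) > node_thresh

    for src, dst in edges:
        if not (active_node[src] and active_node[dst]):
            continue                      # node-wise pruning
        msg = x[src, active_ch]
        if np.abs(msg).max() < edge_thresh:
            continue                      # edge-wise pruning
        out[dst, active_ch] += msg        # aggregate only the kept channels
    return out

# Toy graph: 4 nodes with 8-dim features and a few directed edges.
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 8)).astype(np.float32)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(pruned_message_passing(x, edges).shape)
```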
ISBN (print): 9781612843490
We propose an architecture exploration scheme for QoS control Silicon Intellectual Properties (SIPs). The scheme integrates the network simulator NS-2, an embedded Linux operating system, a Linux driver, Electronic Design Automation (EDA) tools, an extended on-chip bus, and real Field Programmable Gate Array (FPGA) hardware to reveal how data bus width and clock rate affect QoS performance in wireless networks. A software/hardware co-design flow and a programming paradigm for evolving the FPGA prototype toward a platform-based reusable SIP with driver and cross-layer interface are also described. Three QoS control experiments on video streaming over HCCA, EDCA, and WiMAX wireless technologies demonstrate that the proposed exploration scheme effectively helps engineers understand the impact of architectural design parameters on networking performance in wireless multimedia communications.
This paper presents a new spatio-temporal motion estimation algorithm and its VLSI architecture for video coding, based on an algorithm and architecture co-design methodology. The algorithm consists of new strategies for spatio-temporal motion vector prediction, a modified one-at-a-time search scheme, and multiple update paths derived from optimization theory. The hardware specification targets high-definition video coding. We applied the ME algorithm to the H.264 reference software. Our algorithm surpasses recently published research and achieves performance close to full search. The VLSI implementation demonstrates the low-cost nature of our algorithm. The algorithm and architecture co-design concept is highly emphasized in this paper, and we provide quantitative examples to show its necessity.
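To make the one-at-a-time search idea concrete, the sketch below refines a predicted motion vector one axis at a time using SAD block matching. The block size, step limit, toy frames, and the simple horizontal-then-vertical refinement order are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def sad(cur, ref, bx, by, mvx, mvy, bs):
    """Sum of absolute differences between a current block and a displaced
    reference block; returns +inf if the candidate falls outside the frame."""
    h, w = ref.shape
    x, y = bx + mvx, by + mvy
    if x < 0 or y < 0 or x + bs > w or y + bs > h:
        return np.inf
    return np.abs(cur[by:by+bs, bx:bx+bs].astype(np.int32)
                  - ref[y:y+bs, x:x+bs].astype(np.int32)).sum()

def oats_search(cur, ref, bx, by, bs=8, pred=(0, 0), steps=8):
    """One-at-a-time search: starting from a predicted motion vector,
    refine horizontally until no improvement, then vertically."""
    mvx, mvy = pred
    best = sad(cur, ref, bx, by, mvx, mvy, bs)
    for axis in (0, 1):                       # 0: horizontal, 1: vertical
        for _ in range(steps):
            improved = False
            for step in (-1, 1):
                dx, dy = (step, 0) if axis == 0 else (0, step)
                cost = sad(cur, ref, bx, by, mvx + dx, mvy + dy, bs)
                if cost < best:
                    best, mvx, mvy = cost, mvx + dx, mvy + dy
                    improved = True
            if not improved:
                break
    return (mvx, mvy), best

# Toy frames: a smooth reference image and a current frame shifted by (2, 1).
yy, xx = np.mgrid[0:64, 0:64]
ref = (128 + 60 * np.sin(xx / 5.0) * np.cos(yy / 7.0)).astype(np.uint8)
cur = np.roll(ref, shift=(1, 2), axis=(0, 1))
print(oats_search(cur, ref, bx=16, by=16))
```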