Multi-modal artificial intelligence (MAI) has attracted significant interest due to its capability to process and integrate data from multiple modalities, including images, text, and audio. Addressing MAI tasks in distributed systems necessitates robust and efficient architectures. The Transformer architecture has emerged as a primary network in this context. The integration of Vision Transformers (ViTs) within multimodal frameworks is crucial for enhancing the processing and comprehension of image data across diverse modalities. However, the complex architecture of ViTs and the extensive resources required for processing large-scale image data impose high computational and storage demands. These demands are particularly challenging for deploying ViTs on edge devices within distributed frameworks. To address this issue, we propose a novel dynamic redundancy-aware ViT accelerator based on parallel computing, termed DRViT. DRViT is supported by an algorithm-architecture co-design. We first propose a hardware-friendly lightweight algorithm featuring token merging, token pruning, and an INT8 quantization scheme. Then, we design a specialized architecture to support this algorithm, transforming the lightweight algorithm into significant latency and energy-efficiency improvements. Our design is implemented on the Xilinx Alveo U250, achieving an overall inference latency of 0.86 ms and 1.17 ms per image for ViT-tiny at 140 MHz and 100 MHz, respectively. The throughput can reach 1,380 GOP/s at peak, demonstrating superior performance compared to state-of-the-art accelerators, even at lower frequencies.
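As a rough illustration of the kind of lightweight algorithm the abstract describes, the sketch below combines score-based token pruning, similarity-based token merging, and symmetric per-tensor INT8 quantization in NumPy. The function names, keep ratio, merge threshold, and average-merge rule are hypothetical simplifications for illustration, not DRViT's actual scheme.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: returns int8 values and a scale."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def prune_and_merge_tokens(tokens, scores, keep_ratio=0.5, merge_threshold=0.9):
    """Keep the highest-scoring tokens; fold each pruned token into its most
    similar kept token (if similar enough) instead of discarding it outright."""
    n = tokens.shape[0]
    n_keep = max(1, int(n * keep_ratio))
    order = np.argsort(-scores)
    kept_idx, pruned_idx = order[:n_keep], order[n_keep:]
    kept = tokens[kept_idx].copy()

    # Cosine similarity between each pruned token and every kept token.
    kept_norm = kept / (np.linalg.norm(kept, axis=1, keepdims=True) + 1e-12)
    for i in pruned_idx:
        t = tokens[i]
        sim = kept_norm @ (t / (np.linalg.norm(t) + 1e-12))
        j = int(np.argmax(sim))
        if sim[j] > merge_threshold:        # merge only near-duplicate tokens
            kept[j] = 0.5 * (kept[j] + t)   # simple average merge
    return kept

# Toy usage: 16 tokens with 8-dim embeddings and random importance scores.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8)).astype(np.float32)
scores = rng.random(16)
reduced = prune_and_merge_tokens(tokens, scores)
q_tokens, scale = quantize_int8(reduced)
print(reduced.shape, q_tokens.dtype, round(scale, 4))
```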
Graph neural networks (GNNs) are a promising method for learning graph representations and demonstrate remarkable performance on various graph-related tasks. Existing typical GNNs exploit the neighborhood message-passing scheme, which aggregates feature messages from neighbor nodes to update the node representations. Despite the effectiveness of this scheme, its complex computational model heavily relies on the graph structure, which hinders scaling to realistic large-scale graph applications. Although several custom accelerators have been proposed to speed up GNNs, these hardware-specific optimization techniques fail to address the fundamental problem of high computational complexity in GNNs. To tackle this challenge, we propose a dedicated algorithm-architecture co-design framework, dubbed MePa, which aims to improve GNN execution efficiency by coordinating algorithm- and hardware-level innovations. Specifically, with an in-depth analysis of GNN message-passing algorithms and potential speedup opportunities, we first propose an efficient message-passing algorithm that can dynamically prune task-irrelevant graph data at multiple granularities (channel-, edge-, and node-wise). This approach significantly reduces the overall complexity of GNNs with negligible accuracy loss. A novel architecture is designed to support dynamic pruning and translate it into performance improvements. Elaborate pipelines and specialized optimizations jointly improve performance and decrease energy consumption. Compared to the state-of-the-art GNN accelerator AWB-GCN, MePa achieves an average speedup of 1.95x and 2.6x higher energy efficiency.
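The following sketch shows, under assumed criteria, what channel-, edge-, and node-wise pruning inside a single message-passing (sum-aggregation) step could look like. The magnitude-based pruning rules, thresholds, and channel keep ratio are illustrative placeholders, not MePa's actual algorithm.

```python
import numpy as np

def pruned_message_passing(x, edges, node_thresh=0.1, edge_thresh=0.1, chan_keep=0.75):
    """One sum-aggregation step with illustrative node-, edge-, and
    channel-wise pruning (all criteria here are hypothetical)."""
    n, d = x.shape
    out = np.zeros_like(x)

    # Channel-wise: keep only the channels with the largest mean magnitude.
    n_chan = max(1, int(d * chan_keep))
    active_ch = np.argsort(-np.abs(x).mean(axis=0))[:n_chan]

    # Node-wise: skip nodes whose feature norm is negligible.
    active_node = np.linalg.norm(x, axis=1) > node_thresh

    for src, dst in edges:
        if not (active_node[src] and active_node[dst]):
            continue                      # node-wise pruning
        msg = x[src, active_ch]
        if np.abs(msg).max() < edge_thresh:
            continue                      # edge-wise pruning
        out[dst, active_ch] += msg        # aggregate only the kept channels
    return out

# Toy graph: 4 nodes with 8-dim features and a few directed edges.
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 8)).astype(np.float32)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(pruned_message_passing(x, edges).shape)
```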
ISBN (print): 9781612843490
We propose an architecture exploration scheme for QoS control Silicon Intellectual Properties (SIPs). The scheme integrates the network simulator NS-2, an embedded Linux operating system, a Linux driver, Electronic Design Automation (EDA) tools, an extended on-chip bus, and real Field Programmable Gate Array (FPGA) hardware to reveal how data bus width and clock rate affect QoS performance in wireless networks. A software/hardware co-design flow and a programming paradigm for evolving the FPGA prototype toward a platform-based reusable SIP with driver and cross-layer interface are also described. Three QoS control experiments on video streaming over HCCA, EDCA, and WiMAX wireless technologies demonstrate that the proposed exploration scheme effectively helps engineers understand the impact of architectural design parameters on networking performance in wireless multimedia communications.
This paper presents a new spatio-temporal motion estimation algorithm and its VLSI architecture for video coding, based on an algorithm and architecture co-design methodology. The algorithm consists of new strategies for spatio-temporal motion vector prediction, a modified one-at-a-time search scheme, and multiple update paths derived from optimization theory. The hardware specification targets high-definition video coding. We applied the ME algorithm to the H.264 reference software. Our algorithm surpasses recently published research and achieves performance close to full search. The VLSI implementation demonstrates the low-cost nature of our algorithm. The algorithm and architecture co-design concept is highly emphasized in this paper, and we provide quantitative examples to show its necessity.
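To make the one-at-a-time search idea concrete, the sketch below refines a predicted motion vector one axis at a time using SAD block matching. The block size, step limit, toy frames, and the simple horizontal-then-vertical refinement order are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def sad(cur, ref, bx, by, mvx, mvy, bs):
    """Sum of absolute differences between a current block and a displaced
    reference block; returns +inf if the candidate falls outside the frame."""
    h, w = ref.shape
    x, y = bx + mvx, by + mvy
    if x < 0 or y < 0 or x + bs > w or y + bs > h:
        return np.inf
    return np.abs(cur[by:by+bs, bx:bx+bs].astype(np.int32)
                  - ref[y:y+bs, x:x+bs].astype(np.int32)).sum()

def oats_search(cur, ref, bx, by, bs=8, pred=(0, 0), steps=8):
    """One-at-a-time search: starting from a predicted motion vector,
    refine horizontally until no improvement, then vertically."""
    mvx, mvy = pred
    best = sad(cur, ref, bx, by, mvx, mvy, bs)
    for axis in (0, 1):                       # 0: horizontal, 1: vertical
        for _ in range(steps):
            improved = False
            for step in (-1, 1):
                dx, dy = (step, 0) if axis == 0 else (0, step)
                cost = sad(cur, ref, bx, by, mvx + dx, mvy + dy, bs)
                if cost < best:
                    best, mvx, mvy = cost, mvx + dx, mvy + dy
                    improved = True
            if not improved:
                break
    return (mvx, mvy), best

# Toy frames: a smooth reference image and a current frame shifted by (2, 1).
yy, xx = np.mgrid[0:64, 0:64]
ref = (128 + 60 * np.sin(xx / 5.0) * np.cos(yy / 7.0)).astype(np.uint8)
cur = np.roll(ref, shift=(1, 2), axis=(0, 1))
print(oats_search(cur, ref, bx=16, by=16))
```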