Editor's notes: In this article, the author advocates for Processing in NoC (PiN) as a means to actively engage a Network-on-Chip (NoC) in computation. The article highlights the benefits of utilizing the communication network for system-level performance enhancement, with a case study demonstrating its advantages over conventional passive NoC approaches.
—Mahdi Nikdast, Colorado State University, USA
—Miquel Moreto, Barcelona Supercomputing Center, Spain
—Masoumeh (Azin) Ebrahimi, KTH Royal Institute of Technology, Sweden
—Sujay Deb, IIIT Delhi, India
Graph convolutional networks (GCNs) are popular for a variety of graph learning tasks. ReRAM-based processing-in-memory (PIM) accelerators are promising to expedite GCN training owing to their in-situ computing capabi...
In the field of digital signal processing, the fast Fourier transform (FFT) is a fundamental algorithm, with its processors being implemented using either the pipelined architecture, well-known for high-throughput app...
RowHammer vulnerabilities pose a significant threat to modern DRAM-based systems, where rapid activation of DRAM rows can induce bit-flips in neighboring rows. To mitigate this, state-of-the-art host-side RowHammer mi...
In academia and industry, computer architects rely heavily on performance models for design space exploration. However, performance models are now experiencing longer simulation times due to the increasing design comp...
ISBN (print): 9798350374025; 9798350374032
Novel technologies such as augmented reality and computer perception lay the foundation for smart assistants that can guide us through real-world tasks, such as cooking or home repair. However, the nature of real-world interaction requires assistants that adapt to users' mistakes, environments, and communication preferences. We propose Adaptive Multimodal Assistants (AMMA), a software architecture for task guidance with generated adaptive interfaces from step-by-step instructions. This is achieved through 1) an automatically generated user action state tracker and 2) a guidance planner that leverages a continuously trained user model. The assistant also adjusts its guidance and communication delivery methods based on observed user performance as well as implicit and explicit user feedback. We demonstrated the viability of AMMA by building an adaptive cooking assistant running in a high-fidelity virtual reality-based simulator. A user study of the cooking assistant showed that AMMA can reduce task completion time and the number of manual communication-method changes.
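The abstract names a user action state tracker generated from step-by-step instructions but gives no implementation detail; as a toy illustration only, such a tracker could be a small state machine that advances when the observed action matches the expected step and flags mistakes otherwise. The class name, step encoding, and return labels below are all assumptions, not AMMA's actual design:

```python
class ActionStateTracker:
    """Hypothetical sketch of a tracker generated from step-by-step
    instructions: it advances through the steps as user actions arrive
    and flags actions that do not match the expected next step."""

    def __init__(self, steps):
        self.steps = list(steps)  # ordered step labels, e.g. from a recipe
        self.index = 0            # index of the next expected step

    def observe(self, action):
        """Return 'advance', 'mistake', or 'done' for an observed action."""
        if self.index >= len(self.steps):
            return "done"
        if action == self.steps[self.index]:
            self.index += 1
            return "done" if self.index == len(self.steps) else "advance"
        return "mistake"  # a real guidance planner would adapt delivery here


tracker = ActionStateTracker(["chop onions", "boil water", "add pasta"])
# tracker.observe("chop onions") -> "advance"
# tracker.observe("stir sauce")  -> "mistake"
```

In the full system the mistake signal would feed the continuously trained user model rather than simply being returned to the caller.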
ISBN (print): 9798350330991; 9798350331004
Lightweight neural networks (LWNNs) have drawn significant attention recently for their compact architectures and acceptable accuracy. Despite substantial reductions in computational complexity and model size, the extensive use of depthwise separable convolutions (DSCs) and skip-connection blocks (SCBs) increases memory-access demands, which makes it difficult to achieve the anticipated performance. To process LWNNs efficiently, an FPGA-based dataflow accelerator is proposed in this paper. First, a pixel-based streaming strategy is introduced to reduce off-chip memory access while minimizing on-chip memory overhead. Furthermore, an adaptive-bandwidth computing engine (CE) is designed to increase computational efficiency in a multi-CE architecture. Finally, based on the scalable CE, a dynamic parallelism allocation algorithm is proposed to avoid underutilization of on-chip computing resources. ShuffleNetV2 is implemented on the Xilinx ZC706 platform, and the results show the proposed accelerator achieves state-of-the-art performance of 1771.2 FPS and computational efficiency of 0.64 GOPS/DSP, 5.3x that of the reference design.
Index Terms: lightweight neural network (LWNN)
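The dynamic parallelism allocation step can be pictured as partitioning a fixed budget of compute units across layers in proportion to their workload so no engine idles while another is the bottleneck. The greedy sketch below is an assumption about what such an allocator might look like, not the paper's actual algorithm:

```python
def allocate_parallelism(layer_ops, total_units):
    """Greedy sketch: give every layer one compute unit, then hand each
    remaining unit to the layer with the most work per allocated unit."""
    alloc = [1] * len(layer_ops)
    for _ in range(total_units - len(layer_ops)):
        # the layer with the highest ops-per-unit is the current bottleneck
        i = max(range(len(layer_ops)), key=lambda j: layer_ops[j] / alloc[j])
        alloc[i] += 1
    return alloc


# A layer with twice the work receives twice the units:
# allocate_parallelism([100, 100, 200], 8) -> [2, 2, 4]
```

A hardware allocator would additionally respect per-engine bandwidth limits, which this sketch ignores.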
ISBN (print): 9798350394085; 9798350394092
In recent years, the convergence of the Industrial Internet and edge computing has accelerated the evolution of edge computing towards edge intelligence. The new architecture of the Industrial Internet and edge computing requires that industrial edge applications handle hard real-time production tasks while satisfying the high-reliability demands of industrial sites. Traditional industrial software development cannot cope with such demands. In this paper, the computational design model for contract-based design is applied to automatic code generation for industrial edge applications to solve the above problems. The proposed method iteratively improves the generation process from requirements to actual code. The intermediate model produced by the computational model is verified on a wind turbine generator system, a typical application of industrial edge computing systems. The paper provides an efficient and flexible solution for rapidly reconfiguring and optimizing the intermediate model in response to changing requirements, which contributes to automatic code generation for industrial edge applications. Moreover, this approach can meet diverse system performance requirements and maximize resource utilization to reduce costs significantly.
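Contract-based design is only named in the abstract; as a toy illustration of the underlying idea, a contract can be modeled as an (assumption, guarantee) pair of predicates over a system state, where a component conforms whenever the guarantee holds under the assumption. All names and the example contract below are hypothetical, not taken from the paper:

```python
class Contract:
    """Toy model of a design contract: an assumption about the environment
    paired with a guarantee the component must deliver under it."""

    def __init__(self, assume, guarantee):
        self.assume = assume
        self.guarantee = guarantee

    def satisfied_by(self, state):
        # vacuously satisfied when the assumption does not hold
        return (not self.assume(state)) or self.guarantee(state)


# Hypothetical hard real-time contract for an industrial edge task:
# assuming the task period is at least 10 ms, the worst-case execution
# time (WCET) must fit inside the period.
rt_contract = Contract(
    assume=lambda s: s["period_ms"] >= 10,
    guarantee=lambda s: s["wcet_ms"] <= s["period_ms"],
)
```

In a code-generation flow, such contracts would annotate the intermediate model so that regenerated code can be re-checked against unchanged requirements.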
ISBN (print): 9781665420273
General matrix multiply (GEMM) is an important operation in a broad range of applications, especially the thriving deep neural networks. To achieve low power consumption for GEMM, researchers have already leveraged unary computing, which manipulates bitstreams with extremely simple logic. However, existing unary architectures do not generalize well to the varying GEMM configurations of versatile applications and are incompatible with the binary computing stack, making it challenging to execute unary GEMM effortlessly. In this work, we address the problem by architecting a hybrid unary-binary systolic array, uSystolic, to inherit legacy-binary data scheduling with slow (thus power-efficient) data movement, i.e., data bytes crawl out of memory to drive uSystolic. uSystolic exhibits tremendous area and power improvements as the joint effect of 1) a low-power computing kernel, 2) spatial-temporal bitstream reuse, and 3) on-chip SRAM elimination. For the evaluated edge computing scenario, compared with the binary parallel design, the rate-coded uSystolic reduces the systolic array area and total on-chip area by 59.0% and 91.3%, with on-chip energy and power efficiency improved by up to 112.2x and 44.8x for AlexNet.
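The "extremely simple logic" behind rate-coded unary computing is that multiplying two values in [0, 1] reduces to a bitwise AND of independent bitstreams whose bit probabilities encode the operands. The sketch below illustrates that general principle only, not uSystolic's architecture; the function names are ours:

```python
import random


def rate_encode(value, length, rng):
    """Encode value in [0, 1] as a bitstream with P(bit = 1) = value."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def unary_multiply(a_bits, b_bits):
    """Bitwise AND of two independent rate-coded streams: the fraction
    of ones approximates the product of the encoded values."""
    ones = sum(a & b for a, b in zip(a_bits, b_bits))
    return ones / len(a_bits)


rng = random.Random(0)
a = rate_encode(0.5, 10_000, rng)
b = rate_encode(0.5, 10_000, rng)
# unary_multiply(a, b) is close to 0.5 * 0.5 = 0.25
```

The trade-off the abstract exploits follows directly: each multiplier shrinks to a single AND gate, but accuracy requires long (hence slow-moving) bitstreams, which is why uSystolic pairs the unary kernel with byte-granular binary data scheduling.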
This paper investigates the design of a regional Quantum Network in Tennessee (QNTN) that will connect three quantum local area networks in different cities. We explore two approaches for achieving this interconnectio...