Editor's notes: In this article, the author advocates for Processing in NoC (PiN) as a means to actively engage a Network-on-Chip (NoC) in computation. The article highlights the benefits of utilizing the communication network for system-level performance enhancement, with a case study demonstrating its advantages over conventional passive NoC approaches.
—Mahdi Nikdast, Colorado State University, USA
—Miquel Moreto, Barcelona Supercomputing Center, Spain
—Masoumeh (Azin) Ebrahimi, KTH Royal Institute of Technology, Sweden
—Sujay Deb, IIIT Delhi, India
Graph convolutional networks (GCNs) are popular for a variety of graph learning tasks. ReRAM-based processing-in-memory (PIM) accelerators are promising to expedite GCN training owing to their in-situ computing capabi...
In the field of digital signal processing, the fast Fourier transform (FFT) is a fundamental algorithm, with its processors being implemented using either the pipelined architecture, well-known for high-throughput app...
RowHammer vulnerabilities pose a significant threat to modern DRAM-based systems, where rapid activation of DRAM rows can induce bit-flips in neighboring rows. To mitigate this, state-of-the-art host-side RowHammer mi...
In academia and industry, computer architects rely heavily on performance models for design space exploration. However, performance models are now experiencing longer simulation times due to the increasing design comp...
ISBN (print): 9798350374025; 9798350374032
Novel technologies such as augmented reality and computer perception lay the foundation for smart assistants that can guide us through real-world tasks, such as cooking or home repair. However, the nature of real-world interaction requires assistants that adapt to users' mistakes, environments, and communication preferences. We propose Adaptive Multimodal Assistants (AMMA), a software architecture for task guidance with generated adaptive interfaces from step-by-step instructions. This is achieved through 1) an automatically generated user action state tracker and 2) a guidance planner that leverages a continuously trained user model. The assistant also adjusts its guidance and communication delivery methods based on observed user performance as well as implicit and explicit user feedback. We demonstrated the viability of AMMA by building an adaptive cooking assistant running in a high-fidelity virtual reality-based simulator. A user study of the cooking assistant showed that AMMA can reduce task completion time and the number of manual communication-method changes.
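The abstract names a user action state tracker generated from step-by-step instructions but gives no implementation detail; as a toy illustration only, such a tracker could be a small state machine that advances when the observed action matches the expected step and flags mistakes otherwise. The class name, step encoding, and return labels below are all assumptions, not AMMA's actual design:

```python
class ActionStateTracker:
    """Hypothetical sketch of a tracker generated from step-by-step
    instructions: it advances through the steps as user actions arrive
    and flags actions that do not match the expected next step."""

    def __init__(self, steps):
        self.steps = list(steps)  # ordered step labels, e.g. from a recipe
        self.index = 0            # index of the next expected step

    def observe(self, action):
        """Return 'advance', 'mistake', or 'done' for an observed action."""
        if self.index >= len(self.steps):
            return "done"
        if action == self.steps[self.index]:
            self.index += 1
            return "done" if self.index == len(self.steps) else "advance"
        return "mistake"  # a real guidance planner would adapt delivery here


tracker = ActionStateTracker(["chop onions", "boil water", "add pasta"])
# tracker.observe("chop onions") -> "advance"
# tracker.observe("stir sauce")  -> "mistake"
```

In the full system the mistake signal would feed the continuously trained user model rather than simply being returned to the caller.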
ISBN (print): 9798350330991; 9798350331004
Lightweight neural networks (LWNNs) have drawn significant attention recently for their compact architectures and acceptable accuracy. Despite substantial reductions in computational complexity and model size, the extensive use of depthwise separable convolutions (DSCs) and skip-connection blocks (SCBs) increases memory-access demands, which makes it difficult to achieve the anticipated performance. To process LWNNs efficiently, an FPGA-based dataflow accelerator is proposed in this paper. First, a pixel-based streaming strategy is introduced to reduce off-chip memory access while minimizing on-chip memory overhead. Furthermore, an adaptive-bandwidth computing engine (CE) is designed to increase computational efficiency in a multi-CE architecture. Finally, based on the scalable CE, a dynamic parallelism allocation algorithm is proposed to avoid underutilization of on-chip computing resources. ShuffleNetV2 is implemented on the Xilinx ZC706 platform, and the results show the proposed accelerator achieves state-of-the-art performance of 1771.2 FPS and computational efficiency of 0.64 GOPS/DSP, 5.3x that of the reference design.
Index Terms: lightweight neural network (LWNN)
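The dynamic parallelism allocation step can be pictured as partitioning a fixed budget of compute units across layers in proportion to their workload so no engine idles while another is the bottleneck. The greedy sketch below is an assumption about what such an allocator might look like, not the paper's actual algorithm:

```python
def allocate_parallelism(layer_ops, total_units):
    """Greedy sketch: give every layer one compute unit, then hand each
    remaining unit to the layer with the most work per allocated unit."""
    alloc = [1] * len(layer_ops)
    for _ in range(total_units - len(layer_ops)):
        # the layer with the highest ops-per-unit is the current bottleneck
        i = max(range(len(layer_ops)), key=lambda j: layer_ops[j] / alloc[j])
        alloc[i] += 1
    return alloc


# A layer with twice the work receives twice the units:
# allocate_parallelism([100, 100, 200], 8) -> [2, 2, 4]
```

A hardware allocator would additionally respect per-engine bandwidth limits, which this sketch ignores.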
ISBN (print): 9798350394085; 9798350394092
In recent years, the convergence of the Industrial Internet and edge computing has accelerated the evolution of edge computing towards edge intelligence. The new architecture of the Industrial Internet and edge computing requires that industrial edge applications handle hard real-time production tasks while satisfying the high-reliability demands of industrial sites. Traditional industrial software development cannot cope with such demands. In this paper, the computational design model for contract-based design is applied to automatic code generation for industrial edge applications to solve the above problems. The proposed method iteratively improves the generation process from requirements to actual code. The intermediate model produced by the computational model is verified on a wind turbine generator system, a typical application of industrial edge computing systems. The paper provides an efficient and flexible solution for rapidly reconfiguring and optimizing the intermediate model in response to changing requirements, which contributes to automatic code generation for industrial edge applications. Moreover, this approach can meet diverse system performance requirements and maximize resource utilization to reduce costs significantly.
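Contract-based design is only named in the abstract; as a toy illustration of the underlying idea, a contract can be modeled as an (assumption, guarantee) pair of predicates over a system state, where a component conforms whenever the guarantee holds under the assumption. All names and the example contract below are hypothetical, not taken from the paper:

```python
class Contract:
    """Toy model of a design contract: an assumption about the environment
    paired with a guarantee the component must deliver under it."""

    def __init__(self, assume, guarantee):
        self.assume = assume
        self.guarantee = guarantee

    def satisfied_by(self, state):
        # vacuously satisfied when the assumption does not hold
        return (not self.assume(state)) or self.guarantee(state)


# Hypothetical hard real-time contract for an industrial edge task:
# assuming the task period is at least 10 ms, the worst-case execution
# time (WCET) must fit inside the period.
rt_contract = Contract(
    assume=lambda s: s["period_ms"] >= 10,
    guarantee=lambda s: s["wcet_ms"] <= s["period_ms"],
)
```

In a code-generation flow, such contracts would annotate the intermediate model so that regenerated code can be re-checked against unchanged requirements.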
ISBN (print): 9781665420273
General matrix multiply (GEMM) is an important operation in a broad range of applications, especially the thriving deep neural networks. To achieve low power consumption for GEMM, researchers have already leveraged unary computing, which manipulates bitstreams with extremely simple logic. However, existing unary architectures do not generalize well to the varying GEMM configurations of versatile applications and are incompatible with the binary computing stack, making it challenging to execute unary GEMM effortlessly. In this work, we address the problem by architecting a hybrid unary-binary systolic array, uSystolic, to inherit legacy-binary data scheduling with slow (thus power-efficient) data movement, i.e., data bytes crawl out of memory to drive uSystolic. uSystolic exhibits tremendous area and power improvements as the joint effect of 1) a low-power computing kernel, 2) spatial-temporal bitstream reuse, and 3) on-chip SRAM elimination. For the evaluated edge computing scenario, compared with the binary parallel design, the rate-coded uSystolic reduces the systolic array area and total on-chip area by 59.0% and 91.3%, with on-chip energy and power efficiency improved by up to 112.2x and 44.8x for AlexNet.
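The "extremely simple logic" behind rate-coded unary computing is that multiplying two values in [0, 1] reduces to a bitwise AND of independent bitstreams whose bit probabilities encode the operands. The sketch below illustrates that general principle only, not uSystolic's architecture; the function names are ours:

```python
import random


def rate_encode(value, length, rng):
    """Encode value in [0, 1] as a bitstream with P(bit = 1) = value."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def unary_multiply(a_bits, b_bits):
    """Bitwise AND of two independent rate-coded streams: the fraction
    of ones approximates the product of the encoded values."""
    ones = sum(a & b for a, b in zip(a_bits, b_bits))
    return ones / len(a_bits)


rng = random.Random(0)
a = rate_encode(0.5, 10_000, rng)
b = rate_encode(0.5, 10_000, rng)
# unary_multiply(a, b) is close to 0.5 * 0.5 = 0.25
```

The trade-off the abstract exploits follows directly: each multiplier shrinks to a single AND gate, but accuracy requires long (hence slow-moving) bitstreams, which is why uSystolic pairs the unary kernel with byte-granular binary data scheduling.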
This paper investigates the design of a regional Quantum Network in Tennessee (QNTN) that will connect three quantum local area networks in different cities. We explore two approaches for achieving this interconnectio...