Low-profile mobile computing platforms often need to execute a variety of machine learning algorithms with limited memory and processing power. To address this challenge, this work presents Coara, an instruction-level processor acceleration architecture that integrates an approximate analog in-memory computing coprocessor for accelerating general machine learning applications by exploiting an analog register file cache. Instruction-level acceleration offers true programmability beyond the degrees of freedom provided by reconfigurable machine learning accelerators. It also allows the code generation stage of a compiler back-end to control coprocessor execution and data flow, so that applications do not need high-level machine learning software frameworks with a large memory footprint. Conventional analog and mixed-signal accelerators suffer from the overhead of frequent data conversion between analog and digital signals. To solve this classical problem, Coara uses an analog register file cache, which interfaces the analog in-memory computing coprocessor with the digital register file of the processor core. As a result, more than 90% of the ADC and DAC data conversion overhead can be eliminated by temporarily storing the result of an analog computation in a switched-capacitor analog memory cell until a data dependency occurs. A cycle-accurate Verilog RTL model of the proposed architecture is evaluated with 45 nm CMOS technology parameters while executing machine learning benchmark codes generated by a customized cross-compiler, without using machine learning software frameworks.
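The idea of deferring conversion until a data dependency occurs can be illustrated with a toy model. The sketch below is not Coara's instruction set or microarchitecture: the `Instr` type, the "analog"/"digital" unit labels, and the `count_conversions` function are all hypothetical names introduced here, and the model only counts conversions for a short multiply-accumulate chain. It does not attempt to reproduce the 90% figure reported in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy instruction: executing unit, destination, sources."""
    unit: str      # "analog" or "digital" (labels assumed for illustration)
    dest: str
    srcs: tuple


def count_conversions(program, use_analog_cache):
    """Return (adc_count, dac_count) for a toy instruction stream.

    With use_analog_cache=True, an analog result stays in the analog domain
    until a digital instruction reads it (a data dependency).
    With use_analog_cache=False, every analog result is converted to digital
    immediately, as in the conventional baseline described in the abstract.
    """
    copies = {}    # register name -> set of domains holding a valid copy
    adc = dac = 0
    for ins in program:
        for reg in ins.srcs:
            # Registers never written before are assumed to start out digital.
            held = copies.setdefault(reg, {"digital"})
            if ins.unit not in held:
                if ins.unit == "digital":
                    adc += 1       # analog value needed by the digital core
                else:
                    dac += 1       # digital value needed by the analog unit
                held.add(ins.unit)
        # The result is produced in the domain of the executing unit.
        copies[ins.dest] = {ins.unit}
        if ins.unit == "analog" and not use_analog_cache:
            adc += 1               # baseline: convert the result right away
            copies[ins.dest] = {"digital"}
    return adc, dac


# A four-term multiply-accumulate chain followed by one digital read.
program = [
    Instr("analog", "acc", ("x0", "w0")),
    Instr("analog", "acc", ("acc", "x1", "w1")),
    Instr("analog", "acc", ("acc", "x2", "w2")),
    Instr("analog", "acc", ("acc", "x3", "w3")),
    Instr("digital", "y", ("acc",)),
]

cached = count_conversions(program, use_analog_cache=True)
baseline = count_conversions(program, use_analog_cache=False)
print("with analog cache (ADC, DAC):", cached)
print("baseline          (ADC, DAC):", baseline)
```

In this toy chain the accumulator is converted back to digital only once, at the final digital read, whereas the baseline converts it after every analog step and then converts it back again before the next one. Operand loading dominates the DAC count here because the model assumes all inputs start in the digital register file.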
Today's cameras are designed to approximate what they observe in a manner that preserves entropy. However, time-critical autonomous applications such as autonomous driving, surveillance, and defense systems require task-critical information at the highest quality. With rapid advances in frame rates and resolutions, observing scenes at the highest quality raises concerns about transmission bandwidth. In this paper, we introduce a new paradigm of smart camera that captures only task-critical information at the highest quality. Deep neural network (DNN) algorithms embedded within the camera enhance the quality of information through real-time control of sensor parameters. We show the hardware feasibility of this camera by demonstrating a 3D-stacked architecture with a Digital Pixel Sensor (DPS). We demonstrate a number of high-level vision applications that benefit from this task-guided control, including object detection, object tracking, and activity recognition. Finally, we present the unique challenges created by this feedback and show how software/hardware innovations can be used to mitigate them.