As machine learning applications increase the demand for optimised implementations on both embedded and high-end processing platforms, industry and the research community have responded with different approaches to implementing these solutions. This work presents approximations to arithmetic operations and mathematical functions that, combined with a customised adaptive artificial neural network training method based on RMSProp, provide reliable and efficient implementations of classifiers. The proposed solution does not rely on the mixed operations with higher precision or the complex rounding methods that are commonly applied. The intention of this work is not to find the optimal simplifications for specific deep learning problems but to present an optimised framework that can be used as reliably as one implemented with precise operations, standard training algorithms and the same network structures and hyper-parameters. By simplifying the 'half-precision' floating-point format and approximating the exponentiation and square root operations, the authors' work drastically reduces the field-programmable gate array implementation complexity (e.g. reductions of 43% and 57% in two of the component resources). The reciprocal square root approximation is so simple that it could be implemented with combinational logic alone. In a full software implementation for a mixed-precision platform, only two of the approximations compensate for the processing overhead of precision conversions.
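To illustrate why a reciprocal square root approximation can reduce to a shift and a constant subtraction, here is a minimal sketch of the well-known bit-level trick applied to the standard IEEE 754 half-precision format. It is not necessarily the approximation used in the work above, and it uses the standard format, not the simplified one the abstract mentions. The constant `MAGIC` and the helper names are assumptions introduced for illustration.

```python
import math
import struct

# Assumed constant: roughly 1.5 * 2**10 * (15 - 0.045), i.e. the usual
# "magic number" derivation adapted to a 5-bit exponent with bias 15
# and a 10-bit mantissa. It is not taken from the paper.
MAGIC = 0x59BA


def half_to_bits(x: float) -> int:
    """Round x to IEEE 754 half precision and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bits_to_half(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def approx_rsqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive normal x with one shift and one subtraction."""
    return bits_to_half(MAGIC - (half_to_bits(x) >> 1))


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 4.0, 10.0):
        exact = 1.0 / math.sqrt(x)
        approx = approx_rsqrt(x)
        print(f"x={x}: approx={approx:.4f} exact={exact:.4f}")
```

In hardware, the right shift is pure wiring and the subtraction from a constant is a small combinational circuit, which is consistent with the claim that no sequential logic is needed. The result is accurate to within a few percent without any refinement step.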
Polar codes are forward error-correcting (FEC) codes renowned for achieving channel capacity for various codeword lengths. A low-complexity decoder, termed the successive cancellation (SC) decoder, is commonly employed to decode polar codes. However, the SC decoder's sequential nature limits its decoding speed. This paper proposes an approximate successive cancellation decoder (ASCD), which uses approximate computing techniques as substitutes for the exact computational units. The comparator and adder-subtractor blocks in the merged processing unit are replaced by approximate units, and an approximate two-bit processing unit is designed for the last stage of the decoder to reduce hardware complexity and delay with negligible performance degradation. The proposed ASCD is implemented targeting the Xilinx Virtex-6 FPGA platform. With the proposed approximate counterparts, the ASCD achieves an average throughput improvement of 68% compared with earlier decoders. In addition, overall hardware resource usage is reduced by 41%, lowering the processing complexity. The proposed decoder is beneficial for error-resilient applications in 5G wireless communications.
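For context on where a comparator and an adder-subtractor appear in SC decoding, the following is a minimal sketch of a textbook recursive SC decoder using the common min-sum form of the check-node update. It uses exact arithmetic, so it shows the baseline units that an approximate design would replace, not the ASCD itself. The function names, the natural (non-bit-reversed) bit ordering and the example frozen set are assumptions chosen for illustration.

```python
import math


def f_minsum(a: float, b: float) -> float:
    """Check-node update: sign(a) * sign(b) * min(|a|, |b|) (uses a comparator)."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))


def g(a: float, b: float, u: int) -> float:
    """Variable-node update: b + a if u == 0, b - a if u == 1 (adder-subtractor)."""
    return b + (1 - 2 * u) * a


def polar_encode(u: list) -> list:
    """Encode in natural order: x = [enc(left) XOR enc(right), enc(right)]."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    left = polar_encode(u[:half])
    right = polar_encode(u[half:])
    return [l ^ r for l, r in zip(left, right)] + right


def sc_decode(llr: list, frozen: list) -> tuple:
    """Return (u_hat, x_hat); frozen[i] is True where u_i is fixed to 0."""
    n = len(llr)
    if n == 1:
        u = 0 if frozen[0] or llr[0] >= 0 else 1
        return [u], [u]
    half = n // 2
    # Left half: combine LLR pairs with f, then decode recursively.
    left_llr = [f_minsum(llr[i], llr[i + half]) for i in range(half)]
    u_left, x_left = sc_decode(left_llr, frozen[:half])
    # Right half: use the left partial sums in g, then decode recursively.
    right_llr = [g(llr[i], llr[i + half], x_left[i]) for i in range(half)]
    u_right, x_right = sc_decode(right_llr, frozen[half:])
    x_hat = [x_left[i] ^ x_right[i] for i in range(half)] + x_right
    return u_left + u_right, x_hat


if __name__ == "__main__":
    # Hypothetical (8, 4) code: information bits at indices 3, 5, 6, 7.
    frozen = [True, True, True, False, True, False, False, False]
    message = [0, 0, 0, 1, 0, 1, 0, 1]
    codeword = polar_encode(message)
    llr = [4.0 if bit == 0 else -4.0 for bit in codeword]  # noiseless channel
    decoded, _ = sc_decode(llr, frozen)
    print(decoded == message)
```

The right half cannot start until the left half has produced its partial sums, which is the sequential dependency that limits SC decoding speed.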
Approximate computing techniques (ACTs) exploit the error resilience of computing applications to trade off output precision against area, power, and performance. ACTs can lead to significant gains at affordable costs when efficiently implemented on Field Programmable Gate Array (FPGA) based accelerators. Although several novel ACT works have been proposed for FPGA accelerators, their applicability to high-assurance systems has not been explored as much. ACTs are becoming necessary in many critical Edge computing systems, such as self-driving cars and Earth observation satellites, to increase computational efficiency. However, two important questions arise when targeting critical systems: does ACT optimization negatively affect the reliability of the system, and how can one find optimal design architectures that blend classic mitigation techniques such as Triple Modular Redundancy with approximate and precise arithmetic hardware units to achieve the best possible computational efficiency without compromising dependability? This work aims to solve this research problem by introducing a Design Space Exploration (DSE) methodology that employs ACTs in the arithmetic units of the design and identifies Pareto-optimal microarchitectures that balance all relevant metrics, such as area, speed, power, failure rate, and precision, by inserting the correct amount of approximation in the design. In a nutshell, our DSE methodology formulates the DSE as a Multi-Objective Optimization Problem (MOP). Each Pareto-optimal solution found by our tool specifies which arithmetic units of the design to implement with precise and approximate circuits and which units to selectively triplicate to remove single points of failure that would reduce system reliability below acceptable thresholds. We also suggest another formulation of the DSE as a Single-Objective constraint Optimization Problem (ScOP) producing a single optimal point, which the user may request as a less time-consuming
As modern applications demand an unprecedented level of computational resources, traditional computing system design paradigms are no longer adequate to guarantee significant performance enhancement at an affordable cost. Approximate computing (AxC) has been introduced as a potential candidate to achieve better computational performance by relaxing non-critical functional system specifications. In this article, we propose a systematic, high-abstraction-level approach that allows the automatic generation of near Pareto-optimal approximate configurations for a Discrete Cosine Transform (DCT) hardware accelerator. We obtain the approximate variants by using approximate operations with a configurable approximation degree in place of fully precise ones. We use a genetic search algorithm to find the appropriate tuning of the approximation degree, leading to optimal tradeoffs between accuracy and gains. Finally, to evaluate the actual hardware gains, we synthesize non-dominated approximate DCT variants for two different target technologies, namely Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs). Experimental results show that the proposed approach allows a meaningful exploration of the design space to find the best tradeoffs in a reasonable time. Indeed, compared with the state-of-the-art work on approximate DCT, the proposed approach achieves an 18% average energy improvement while also improving image quality.
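The notion of non-dominated variants used above can be made concrete with a small sketch. It assumes every objective is minimised and that each candidate is summarised as a tuple of objective values; the function names and the sample (error, energy) figures are hypothetical and are not taken from any of the works listed here.

```python
def dominates(p: tuple, q: tuple) -> bool:
    """True if p is no worse than q in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points: list) -> list:
    """Keep only the points that no other point dominates (all objectives minimised)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (error, energy) pairs for five approximate configurations.
    candidates = [(0.0, 10.0), (0.1, 7.0), (0.2, 8.0), (0.5, 3.0), (0.5, 4.0)]
    print(pareto_front(candidates))
```

This exhaustive filter compares every pair of candidates, which is fine for a handful of synthesised variants. A genetic search such as the one described above is what makes the exploration tractable when the space of configurations is too large to enumerate.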