Convolutional Neural Networks (CNNs) are used in many real-world applications because of their high accuracy. The rapid growth of modern applications based on learning algorithms has made the efficient implementation of CNNs increasingly important. Array-type architectures are a well-known platform for implementing CNN models efficiently, because they exploit parallel computation and data reuse. However, accelerators have limited hardware resources, whereas CNNs impose heavy communication and computation loads. Furthermore, because accelerators execute a CNN layer by layer, the differing shapes and sizes of the layers lead to suboptimal resource utilization, which prevents the accelerator from reaching its peak performance. The increasing scale and complexity of deep learning applications exacerbate this problem. The performance of a CNN model therefore depends on the hardware's ability to adapt to the shape of each layer so that its resources stay busy. This work proposes a reconfigurable accelerator that can efficiently execute a wide range of CNNs. Its flexible, low-cost reconfigurable interconnect units allow the array to execute CNNs faster than fixed-size implementations (by 45.9% for ResNet-18 compared with the baseline). The proposed architecture also reduces the on-chip memory access rate by 36.5% without compromising accuracy.
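To make the utilization argument concrete, the Python sketch below assumes a hypothetical 16×16 processing-element (PE) array that maps a layer's input channels onto rows and its output channels onto columns. The mapping, the array size, and the names `ConvLayer` and `pe_utilization` are illustrative assumptions, not the accelerator described above. The channel counts of the first two layers follow ResNet-18; the third layer is invented to show an awkward shape.

```python
import math
from dataclasses import dataclass


@dataclass
class ConvLayer:
    name: str
    in_channels: int   # input feature-map channels (mapped to array rows)
    out_channels: int  # number of filters (mapped to array columns)


def pe_utilization(layer: ConvLayer, rows: int, cols: int) -> float:
    """Fraction of PE slots doing useful work under an ASSUMED
    (hypothetical) mapping: input channels -> rows, output channels -> cols.
    A layer larger than the array is split into tiles; the last tile in
    each dimension may be only partly filled."""
    row_tiles = math.ceil(layer.in_channels / rows)
    col_tiles = math.ceil(layer.out_channels / cols)
    useful = layer.in_channels * layer.out_channels
    allocated = row_tiles * rows * col_tiles * cols
    return useful / allocated


if __name__ == "__main__":
    layers = [
        ConvLayer("conv1 (ResNet-18)", 3, 64),
        ConvLayer("layer1 conv (ResNet-18)", 64, 64),
        ConvLayer("hypothetical odd-shaped layer", 24, 40),
    ]
    for layer in layers:
        print(f"{layer.name}: {pe_utilization(layer, 16, 16):.1%}")
```

Under this assumed mapping, the first ResNet-18 layer (3 input channels) leaves most rows idle, while the 64-channel layer fills the array exactly; a fixed-size array cannot recover the idle capacity, which is the kind of gap a reconfigurable design targets.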
Software Defined Radio (SDR) is an innovative approach that is becoming an increasingly promising technology for future mobile handsets. Universities and industry have introduced several embedded-systems proposals to support SDR applications. This article presents an overview of current platforms and analyzes their architectural choices, the current issues in SDR, and potential future trends.
Matrix multiplication is used as an example to illustrate a method of transforming the specification of a problem into an algorithm suitable for execution on synchronous machines. The transformations are influenced both by the architectures of the target machines and by their available high-level languages. Three different synchronous machines are considered as target hardware: a conceptual MCC (mesh-connected computer), the Cray-1, and the ICL DAP.
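As one illustration of mapping matrix multiplication onto a synchronous mesh such as the MCC, the sketch below simulates Cannon's algorithm, a standard mesh algorithm that is not necessarily the transformation derived in the paper. Each cell (i, j) stands for one processor; after an initial skew, every processor multiplies and accumulates in lockstep, then passes its A element one step left and its B element one step up. The function name `cannon_matmul` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def cannon_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Simulate Cannon's algorithm on an n x n mesh with wraparound links.
    Cell (i, j) holds one element of A, one of B, and one accumulator."""
    n = len(a)
    # Initial skew: row i of A rotated left by i, column j of B rotated up by j,
    # so cell (i, j) starts with a[i][k] and b[k][j] for the same k.
    a_loc = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b_loc = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0.0] * n for _ in range(n)]
    for _ in range(n):
        # Compute phase: every processor multiplies and accumulates in lockstep.
        for i in range(n):
            for j in range(n):
                c[i][j] += a_loc[i][j] * b_loc[i][j]
        # Communication phase: synchronous nearest-neighbour shifts
        # (A one step left, B one step up, both with wraparound).
        a_loc = [[a_loc[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_loc = [[b_loc[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

For n×n operands the loop runs n synchronous steps, each using only nearest-neighbour communication, which is what makes the scheme a natural fit for mesh-connected hardware.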
This paper describes, in an informal manner, the programming language ACTUS, which was designed to facilitate programming array-processing and vector-processing ‘supercomputers’. ACTUS extends the program-structuring and data-structuring facilities of Pascal to the synchronous parallel environment represented by array and vector processor architectures. A knowledge of Pascal is assumed, and only the parallel features of ACTUS are described.
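The abstract does not show ACTUS syntax, so the Python sketch below is not ACTUS code. Under assumed lockstep semantics, it illustrates why a language for synchronous array hardware must define whole-array statements differently from a sequential Pascal loop: in the loop, each iteration sees the writes of earlier iterations, whereas in a lockstep update every element reads the old values before any element is written. The function names are hypothetical.

```python
from typing import List


def sequential_shift(a: List[int]) -> List[int]:
    """Pascal-style loop `for i := 2 to n do a[i] := a[i-1]` (0-based here).
    Each iteration sees the value written by the previous one,
    so a[0] is copied into every element."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1]
    return a


def synchronous_shift(a: List[int]) -> List[int]:
    """ASSUMED lockstep semantics: every element reads the OLD value of its
    left neighbour, then all elements are written at once (a[0] unchanged)."""
    old = list(a)
    if not old:
        return []
    return [old[0]] + [old[i - 1] for i in range(1, len(old))]


if __name__ == "__main__":
    print(sequential_shift([1, 2, 3, 4]))
    print(synchronous_shift([1, 2, 3, 4]))
```

The same source-level statement thus yields a broadcast under sequential semantics but a one-place shift under lockstep semantics, which is the distinction a parallel extension of Pascal has to make explicit.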