A fast Fourier transform (FFT) algorithm is mapped onto a suggested processing element topology to demonstrate the utility of the systolic data flow machine (SDFM) approach. The SDFM is based on partitioning dataflow programs (graphs) into subgraphs small enough to be loaded into programmable systolic arrays, called processing elements. Mappability and performance criteria are suggested, such as the number of primitive processors allocated in a systolic array and the number of primitive processors and systolic arrays that are active at any one time. Conclusions about system attributes, such as the ratio of local to global communication, granularity, instruction execution and communication time, parallelism, and processor utilization, are also presented.
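To make the idea of splitting an FFT dataflow graph into loadable subgraphs concrete, here is a minimal Python sketch. It assumes one simple partition, in which each butterfly stage of an iterative radix-2 FFT is one subgraph; this is an illustration only, not the mapping used in the paper. The function names `fft_by_stages` and `bit_reverse_permute` are hypothetical.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder the input so the in-place butterfly stages yield natural order."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[rev] = complex(x[i])
    return out


def fft_by_stages(x):
    """Radix-2 decimation-in-time FFT, processed one stage (subgraph) at a time.

    Returns the spectrum and the number of butterfly nodes in each stage,
    a rough stand-in for "primitive processors needed per subgraph".
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = bit_reverse_permute(x)
    butterflies_per_stage = []
    size = 2
    while size <= n:
        half = size // 2
        count = 0
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                a = data[start + k]
                b = data[start + k + half] * w
                data[start + k] = a + b
                data[start + k + half] = a - b
                count += 1
        butterflies_per_stage.append(count)
        size *= 2
    return data, butterflies_per_stage
```

For an input of length N, every stage contains N/2 butterflies, so under this assumed partition each subgraph has the same size, and the only communication between subgraphs is the N intermediate values passed from one stage to the next.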
ISBN (digital): 9798350368208
ISBN (print): 9798350368215
Volume rendering is one of the most important visualization methods for unstructured grid data. However, existing serial volume rendering algorithms for unstructured meshes are not efficient enough for large-scale data visualization. To remove this performance bottleneck, this paper proposes a parallel volume rendering algorithm. First, a parallel KD-tree algorithm splits the volume data and tracks the transparency relationship. Second, each process uses an independent visualization pipeline to compute its volume rendering image. Finally, a tree synthesis strategy combines the partial images into the final image. Experimental results show that the algorithm can be applied efficiently to the visualization of large-scale unstructured grid data.
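The final tree synthesis step can be sketched as a pairwise merge of per-process images. The sketch below assumes premultiplied RGBA pixels, partial images already sorted front to back, and the standard "over" operator. The abstract does not specify these details, so the names `over`, `composite_images`, and `tree_composite` are hypothetical.

```python
def over(front, back):
    """Composite one premultiplied RGBA pixel over another."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    t = 1.0 - fa  # fraction of the back pixel that shows through
    return (fr + t * br, fg + t * bg, fb + t * bb, fa + t * ba)


def composite_images(front, back):
    """Composite two images (flat lists of RGBA pixels) pixel by pixel."""
    return [over(f, b) for f, b in zip(front, back)]


def tree_composite(images):
    """Merge front-to-back ordered images in rounds of neighboring pairs.

    Each round halves the number of images, so P partial images need
    about log2(P) rounds instead of P - 1 sequential merges.
    """
    if not images:
        raise ValueError("need at least one image")
    level = list(images)
    while len(level) > 1:
        merged = []
        for i in range(0, len(level) - 1, 2):
            merged.append(composite_images(level[i], level[i + 1]))
        if len(level) % 2:  # an unpaired image keeps its place at the back
            merged.append(level[-1])
        level = merged
    return level[0]
```

Because "over" is associative, merging neighbors in a tree gives the same result (up to floating-point rounding) as compositing the images one after another, provided the depth order is preserved. In a parallel setting the merges within one round are independent and can run on different processes.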
ISBN (digital): 9798350352030
ISBN (print): 9798350352047
High-level synthesis (HLS) is a popular method that lets designers describe functionality at the behavioral level and automatically generates efficient register-transfer level (RTL) descriptions. In HLS, dataflow is the key micro-architecture for achieving high parallelism. However, strict conditions, such as sequential access on the potential channels, often limit streaming dataflow. To address this issue, this paper proposes an efficient array partitioning method for streaming dataflow inference. The key is to explore the potential array partitioning mode that matches the sequential access requirements of the streaming channels. An experimental case study on convolutional neural network (CNN) inference indicates that the proposed method achieves about a 28.6% performance improvement over the default dataflow, at the cost of a 7.2% increase in power.
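The link between array partitioning and sequential channel access can be illustrated with a toy Python model. It assumes cyclic partitioning and a loop that reads `factor` consecutive elements per iteration. The paper's actual partitioning search is not described in the abstract, and the helper names below are hypothetical.

```python
def cyclic_partition(array, factor):
    """Split an array into `factor` banks; element i goes to bank i % factor."""
    return [array[b::factor] for b in range(factor)]


def bank_access_trace(n, factor):
    """Record the offsets each bank sees for a loop reading `factor`
    consecutive elements per iteration over an array of length n."""
    trace = {b: [] for b in range(factor)}
    for i in range(0, n, factor):
        for j in range(i, min(i + factor, n)):
            trace[j % factor].append(j // factor)
    return trace


def is_sequential(offsets):
    """True if a bank is read at offsets 0, 1, 2, ... with no gaps or repeats."""
    return offsets == list(range(len(offsets)))
```

In this model every bank is read exactly once per iteration and always at the next offset, which is the in-order access pattern a FIFO-style streaming channel requires. A single unpartitioned array would instead need `factor` reads per iteration from one memory.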
ISBN (digital): 9798350352030
ISBN (print): 9798350352047
In VLSI design, logic synthesis (LS) converts a high-level description of a circuit into a gate-level netlist, generally using a unified heuristic algorithm to optimize different combinational circuits. LS relies on a series of optimization commands, but the complexity of the synthesis optimization flow increases exponentially as more commands are used. Machine learning methods are widely used in LS. In particular, reinforcement learning (RL) is a very efficient method for exploring the customized circuit design space. For rapid LS, we propose an evolutionarily scheduled reinforcement learning (ERL) framework, which is compatible with the various agents adopted in prior RL-based LS works. Owing to parallel execution on a multi-core processor, it can significantly improve exploration efficiency without losing solution quality. Our experiments show that, on the EPFL benchmark with 4 cores, our framework with the RL agent from DRiLLS generally reaches the global optimum 3.37 times faster than the corresponding work. Our code is available at: ***/Intelligent-Computing-Research-GroupIERL-LS.