With the rapid growth of artificial intelligence (AI), the Internet of Things (IoT) and big data, emerging applications that cross stacks with different techniques bring new challenges to parallel computing systems. These cross-stack functionalities require a single system to possess multiple characteristics, such as the ability to process data at high throughput and low latency, the ability to carry out iterative and incremental computation, transparent fault tolerance, and the ability to perform heterogeneous tasks that evolve dynamically. However, high-performance computing (HPC) and big data computing, as two categories of parallel computing architecture, are each incapable of meeting all these requirements. Therefore, by performing a comparative analysis of HPC and big data computing from the perspective of the parallel programming model layer, middleware layer, and infrastructure layer, we explore the design principles of the two architectures and discuss a converged architecture to address the above-mentioned challenges.
This paper proposes an unconventional architecture and algorithm for implementing reservoir computing on FPGA. An architecture-oriented algorithm with improved throughput and an architecture designed to reduce memory and hardware resource requirements are presented. The proposed architecture exhibits good performance on benchmarks for reservoir computing. A prediction accelerator for reservoir computing that operates on 55.45 mW at 450 K fps with <3000 LEs is realized by implementing the architecture on FPGA. The proposed approach presents a novel FPGA implementation of reservoir computing focussing on both algorithms and architecture that may serve as a basis for applications of AI at the network edge.
2D clustering aims at solving problems concerning bi-dimensional datasets in several application fields, such as medical imaging, image retrieval and computer vision. A novel approach for 2D hierarchical fuzzy clustering is proposed, which relies on the use of kernel-based membership functions. This new metric makes it possible to obtain unconstrained structures for data modelling. The tests performed show that the proposed approach can outperform well-known hierarchical clustering algorithms on different benchmarks, and that it can also be deployed on parallel computing architectures.
In recent years, computing has been shifting from 'central processing' on the CPU to 'co-processing' on the CPU and GPU. This computing paradigm shift is due to the development of CUDA (Compute Unified Dev...
The study describes the results of research carried out into the design of a parallel and resource-efficient solution to the real-data polyphase discrete Fourier transform (DFT), or PDFT. The solution is able to exploit both the real-valued nature of the data and the parallel processing capabilities of the computing technology, assumed to be a field-programmable gate array, to yield a solution with a low size, weight and power requirement. A parallel computing architecture has been devised, based upon batch processing, whereby pipelined operation of the polyphase filter bank (PFB) is achieved using shared resources and pipelined operation of the real-data DFT is achieved using the resource-efficient regularised fast Hartley transform (RFHT). The PFB outputs are appropriately re-ordered for input to the RFHT by means of a suitably defined finite state machine. The resulting design, which includes a flexible up-sampling capability (with rational over-sampling factor) to address the problem of adjacent channel interference, trades off time complexity against space complexity in order to satisfy the associated timing constraints. The solution is also scalable in terms of the number of channels, so that it might be easily adapted, for new or multiple applications, with minimal re-design effort and cost.
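The abstract above computes a real-data DFT by way of a Hartley transform. As a rough illustration of why that works, the sketch below recovers the DFT of a real sequence from its discrete Hartley transform (DHT) using the standard even/odd identity. It is only a minimal sketch: the DHT here is a naive O(N^2) sum, not the regularised fast Hartley transform (RFHT) of the study, and the function names `dht`, `dft_from_dht` and `dft_direct` are hypothetical, chosen for this example.

```python
import cmath
import math


def dht(x):
    """Naive O(N^2) discrete Hartley transform of a real sequence.

    H[k] = sum_t x[t] * (cos(2*pi*t*k/N) + sin(2*pi*t*k/N)).
    This is a stand-in for a fast Hartley transform, not the RFHT itself.
    """
    n = len(x)
    return [
        sum(
            x[t] * (math.cos(2 * math.pi * t * k / n) + math.sin(2 * math.pi * t * k / n))
            for t in range(n)
        )
        for k in range(n)
    ]


def dft_from_dht(h):
    """Recover the complex DFT of the real input from its DHT.

    The even part of H gives the cosine sum (real part of the DFT) and the
    odd part gives the sine sum (negated imaginary part of the DFT).
    """
    n = len(h)
    out = []
    for k in range(n):
        h_k = h[k]
        h_neg = h[(n - k) % n]
        even = 0.5 * (h_k + h_neg)  # cosine component
        odd = 0.5 * (h_k - h_neg)   # sine component
        out.append(complex(even, -odd))
    return out


def dft_direct(x):
    """Reference O(N^2) DFT, used only to check the identity."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    samples = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 4.0]
    via_hartley = dft_from_dht(dht(samples))
    reference = dft_direct(samples)
    print(max(abs(a - b) for a, b in zip(via_hartley, reference)))
```

Because the DHT of real data is itself real, all arithmetic up to the final recombination stays real-valued, which is one reason a Hartley-based route can suit real-data hardware designs.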
To be realistic, an urban model must include appropriate numbers of pedestrians, vehicles, and other dynamic entities. Using a parallel-computing architecture, researchers simulated a marathon with more than a million participants. To simulate participant behavior, they used fuzzy logic on a GPU to perform millions of inferences in real time.