ISBN: 9798400708688 (print)
Despite decades of criticism, jump statements such as goto, break, continue, and return remain prevalent in imperative programming languages, including C++, Java, and Python. The academic community has yet to reach a consensus on whether refactoring source code in these languages to eliminate such statements actually improves readability. Nevertheless, automated program analysis clearly benefits from this refactoring, because structured code is easier to analyze than code with arbitrary changes in control flow. While algorithms for this refactoring have been proposed for some imperative languages, we introduce a comparable algorithm designed for a dataflow programming language. Although dataflow languages lack jump statements, they may include jump objects (in object-oriented contexts) or jump functions (in functional paradigms). Our algorithm is implemented as a command-line tool for refactoring EO, an object-oriented dataflow language. Preliminary tests with several EO programs have validated the tool's efficacy. Using φ-calculus, we provide a formal proof of the validity of every transformation in our algorithm.
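As a rough illustration of what eliminating a jump statement looks like, the sketch below removes a `break` from a Python loop with the classic flag-based rewrite. It is not the paper's algorithm, which targets EO jump objects; the function names are hypothetical and chosen only for this example.

```python
def find_first_negative_with_break(values):
    """Original form: uses `break` to leave the loop early."""
    found = None
    for v in values:
        if v < 0:
            found = v
            break
    return found


def find_first_negative_structured(values):
    """Refactored form: no `break`; a flag in the loop condition ends the loop."""
    found = None
    done = False
    i = 0
    while i < len(values) and not done:
        if values[i] < 0:
            found = values[i]
            done = True
        i += 1
    return found
```

Both functions return the same result for any list; the structured version has a single loop exit, which is the property that makes control-flow analysis simpler.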
This work explores the placement and routing of machine learning applications' dataflow graphs on different heterogeneous coarse-grained reconfigurable architectures (CGRAs). We analyze three types of processing element (PE) heterogeneity: the first concerns the interconnection pattern, the second the kinds of operations a single PE can execute, and the third the PE buffer resources. This analysis aims to propose a fair reduction in overall cost compared with a homogeneous CGRA architecture. We compare our results with the homogeneous case and with one of the state-of-the-art tools for placement and routing (P&R). Our algorithm executed, on average, 52% faster than VPR 8.1 (Versatile Place and Route), an open-source academic tool designed for the FPGA placement and routing phases, reaching a better mapping in 66% of cases and achieving the same results in 26% of cases. Furthermore, considering multiplier heterogeneity, a heterogeneous architecture reduces the cost without losing performance in 76% of the cases. We propose a novel heterogeneous buffer architecture that reduces the buffer resources by 56.3% for K-means dataflow patterns. We also show that a heterogeneous border chess architecture outperforms a homogeneous one. In addition, our mapping reaches optimal instances of single tree dataflows compared to classical Lee/Choi and H-trees.
ISBN: 9781538634097 (print)
Computational finance is a challenging application domain with ever-increasing performance requirements. Driven by competition between companies, computational finance pushes high-performance computing (HPC) technology to its limits. In this paper, we consider Asian options, which are financial derivatives whose payoff is determined by the average price of the underlying asset at predetermined observation points rather than by its single value at expiration. Because of this path dependency, their pricing is computationally expensive and is therefore a suitable candidate for dataflow acceleration. This paper introduces an application for Asian option pricing based on Curran's approximation method that follows a dataflow-oriented development approach, employing dedicated optimisations and replacing conventional floating-point with fixed-point formats wherever possible. The implementation targets a Maxeler server-class HPC system consisting of a CPU server node and Maxeler dataflow engines encapsulating Altera Stratix V FPGAs. The application has been evaluated on two different data sets and achieves speed-ups of 111x and 278.3x compared to a single-threaded software implementation, and of 4x and 9.2x compared to a multi-threaded software implementation running on a dual-socket CPU server with 12-core Intel Xeon E5-2697 v2 CPUs and up to 48 hyper-threads in total.
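The path dependency described above can be made concrete with a small sketch. It computes the payoff of an Asian call for one price path and contrasts it with a European call on the same path. The arithmetic average and the call (rather than put) payoff are assumptions made for illustration, and the function names are hypothetical; this is not Curran's approximation, only the payoff definition it prices.

```python
def asian_call_payoff(observed_prices, strike):
    """Payoff of an Asian call: depends on the average of all observations.

    Assumes an arithmetic average over the observation points.
    """
    average = sum(observed_prices) / len(observed_prices)
    return max(average - strike, 0.0)


def european_call_payoff(observed_prices, strike):
    """Payoff of a European call: depends only on the final price."""
    return max(observed_prices[-1] - strike, 0.0)


# One hypothetical price path with four observation points.
path = [100.0, 110.0, 120.0, 90.0]
print(asian_call_payoff(path, strike=100.0))     # average is 105.0
print(european_call_payoff(path, strike=100.0))  # final price is below the strike
```

Because every observation enters the average, the whole path has to be tracked during pricing, which is what makes the computation expensive.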
Cloud FPGAs provide new energy-efficient opportunities to design dataflow accelerators. Nevertheless, FPGAs still face barriers to widespread use, such as programmability, compilation time (minutes to hours), and the hardware knowledge required, which make them highly challenging for beginners to learn and use. The READY tool recently reduced compilation time to the range of microseconds by using a CGRA overlay, and it offers a friendly, high-level C++ interface for the Intel/Altera HARPv2 FPGA cloud platform. However, HARPv2 is not available on any commercial cloud platform. This work extends READY by creating the fast flow cloud framework (FFC). First, FFC offers a simple browser-based graphical interface for less experienced FPGA users. Second, we improve the portability of the CGRA overlay to include Xilinx FPGAs and provide a transparent design flow for deployment on the widely used commercial Amazon AWS F1 cloud. Third, we improve the CGRA reconfiguration engine. We also compare the overlay performance of HARPv2 and AWS F1 with an eight-thread Xeon processor. Finally, the framework is open-source for collaborative development and has clearly defined application programming interfaces for future extensions.
ISBN: 9781479975921 (print)
Dataflow programming has received increasing attention in the age of multicore computing. Modular and concurrent dataflow program descriptions enable highly automated approaches for design space exploration, optimization and deployment of applications. A great advance in dataflow programming has been the recent introduction of the RVC-CAL language. Having been standardized by the ISO, the RVC-CAL dataflow language provides a solid basis for the development of tools, design methodologies and design flows. This paper proposes a novel design flow for mapping RVC-CAL dataflow programs to highly parallel execution platforms. Through the proposed design flow, the programmer can describe an application in the RVC-CAL language and map it to multi- and many-core platforms for efficient execution. The functionality and efficiency of the proposed approach are demonstrated by a parallel implementation of a video processing application and a run-time reconfigurable filter for telecommunications. Experiments are performed on a multicore platform with up to 16 cores, and the results show that for high-performance applications the proposed design flow provides up to 4x higher throughput than the state-of-the-art approach in multicore execution of RVC-CAL programs.
ISBN: 9781479914180 (print)
High-speed computing and growing amounts of data are driving the quest for ever faster sorting algorithms. Sorting networks, which sort in parallel, and the dataflow computational paradigm are offered as a possible solution. In the presented experiments, the bitonic mergesort algorithm is implemented on an entry-level model of the Maxeler dataflow supercomputing system. Our results show that sorting small arrays on Maxeler achieves a speedup factor of 16 compared with the fastest sorting algorithm on a CPU. Using more advanced Maxeler systems, we expect to be able to sort larger arrays and achieve greater speedups.
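For readers unfamiliar with bitonic mergesort, the following is a minimal sequential sketch of its compare-exchange schedule in Python. It is not the authors' Maxeler implementation; the function name is hypothetical, and the restriction to power-of-two lengths is an assumption of this simple version. Within each stage (fixed `k` and `j`) the compare-exchange operations touch disjoint pairs, which is what allows a sorting network to perform them in parallel.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using a bitonic sorting network.

    Assumes len(values) is a power of two (or zero).
    """
    n = len(values)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate between ascending and descending order.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


print(bitonic_sort([7, 3, 8, 1, 6, 2, 5, 4]))
```

The sequence of comparisons depends only on the array length and not on the data, so the same fixed schedule can be laid out as a static pipeline.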