Following the re-invention of the FFT algorithm by Cooley and Tukey in 1965, a lot of effort has been invested into optimization of this algorithm and all its variations. In this paper, we discuss its use and optimiza...
详细信息
ISBN:
(纸本)9781728199245
Following the re-invention of the FFT algorithm by Cooley and Tukey in 1965, a lot of effort has been invested into optimization of this algorithm and all its variations. In this paper, we discuss its use and optimization for current and future radar applications, and give a brief survey on implementations that have claimed relatively high advantages in terms of performance over existing solutions. Correspondingly, we present an in-depth analysis of state-of-the-art solutions and our own implementation that will allow the reader to evaluate the performance improvements on a fair basis. therefore, we discuss the development of a high-performance Fast Fourier Transform (FFT) using an enhanced Radix-4 decimation in frequency (DIF) algorithm, compare it against the Fastest Fourier Transform in the West (FFTW) autotuned library as well as other solutions and frameworks.
Convolutional neural networks (CNN) is playing an important role in many fields. Many applications are able to run the inference process of CNN with pre-trained models on mobile devices in these days. Improving perfor...
详细信息
External sorting-the process of sorting a file that is too large to fit into the computer's internal memory and must be stored externally on disks-is a fundamental subroutine in database systems [G], [IBM]. Of pri...
详细信息
External sorting-the process of sorting a file that is too large to fit into the computer's internal memory and must be stored externally on disks-is a fundamental subroutine in database systems [G], [IBM]. Of prime importance are techniques that use multiple disks in parallel in order to speed up the performance of external sorting. the simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. [BGV] is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth [K, Section 5.4.9] recently identified SRM (which he calls "randomized striping") as the method of choice for sorting withparallel disks. In this paper we present an efficient implementation of SRM, based upon novel and elegant data structures. We give a new implementation for SRM's lookahead forecasting technique for parallel prefetching and its forecast and flush technique for buffer management. Our techniques amount to a significant improvement in the way SRM carries out the parallel, independent disk accesses necessary to read blocks of input runs efficiently during external merging. Our implementation is based on synchronous parallel I/O primitives provided by the TPIE programming environment [TPI], whenever our program issues an I/O read (write) operation, one block of data is synchronously read from (written to) each disk in parallel. We compare the performance of SRM over a wide range of input sizes withthat of disk-striped mergesort (DSM), which is widely used in practice. DSM consists of a standard mergesort in conjunction with striped I/O for parallel disk access. SRM merges together significantly more runs at a time compared with DSM, and thus it requires fewer merge passes. We demonstrate in practical scenarios that even though the streaming speeds for merging with DSM are a little higher than those for SRM (since DSM merges fewer runs at a time), sorting using SRM is often significantly faster than with DSM
In this paper, we propose a new family of interconnection networks, called cyclic networks (CNs), in which an intercluster connection is defined on a set of nodes whose addresses are cyclic shifts of one another. the ...
详细信息
In this paper, we propose a new family of interconnection networks, called cyclic networks (CNs), in which an intercluster connection is defined on a set of nodes whose addresses are cyclic shifts of one another. the node degrees of basic CNs are independent of system size, but can vary from a small constant (e.g., 3) to as large as required, thus providing flexibility and effective tradeoff between cost and performance. the diameters of suitably constructed CNs can be asymptotically optimal within their lower bounds, given the degrees. We show that packet routing and ascend/descend algorithms can be performed in /spl theta/(log/sub d/ N) communication steps on some CNs with N nodes of degree /spl theta/(d). Moreover CNs can also efficiently emulate homogeneous product networks (e.g., hypercubes and high dimensional meshes). As a consequence, we obtain a variety of efficient algorithms on such networks, thus proving the versatility of CNs.
Clustering is one of the important tasks of machine learning. Gene Expression programming (GEP) is used to solve clustering problems because of its strong global searching ability. In order to solve the limitation of ...
详细信息
暂无评论