ISBN (print): 9781479938445
In recent years, multi-core digital signal processors (DSPs) have been widely used to improve execution efficiency in a variety of applications. To fully exploit the parallel processing capacity of DSPs, programmers need a well-designed parallel programming model. In this paper, a parallel programming model (PPMA) for a self-designed multi-core audio DSP (MAD) is proposed, based on both shared-memory and message-passing communication mechanisms. PPMA provides a set of application program interfaces (APIs) that realize inter-core data transmission and synchronization control with high efficiency. To evaluate the performance improvement of audio applications using PPMA, a low bit-rate speech codec application is ported to the MAD. With the help of PPMA, task scheduling for the speech codec can be implemented conveniently. Experimental results also show that the overhead of inter-core communication in MAD is negligible compared to the parallel speedup achieved by PPMA.
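The combination of shared memory and message passing described above can be illustrated with a minimal Python sketch. It does not use the PPMA APIs, which the abstract does not list; the function `run_pipeline`, the `shared` buffer, and the `mailbox` queue are hypothetical stand-ins, and two threads play the role of two DSP cores.

```python
import queue
import threading


def run_pipeline(frames):
    """Pass frames from a producer 'core' to a consumer 'core'.

    Bulk data travels through a shared buffer; only small index
    messages travel through the queue. All names are illustrative.
    """
    shared = [None] * len(frames)   # stands in for shared memory
    mailbox = queue.Queue()         # stands in for an inter-core message channel
    results = []

    def producer():
        for idx, frame in enumerate(frames):
            shared[idx] = frame     # write the payload into shared memory
            mailbox.put(idx)        # notify the consumer with a small message
        mailbox.put(None)           # end-of-stream marker

    def consumer():
        while True:
            idx = mailbox.get()     # blocks until a message arrives (synchronization)
            if idx is None:
                break
            results.append(sum(shared[idx]))  # placeholder for real processing

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The design point the sketch tries to show is that the message carries only an index, so the cost of a message does not grow with the size of the audio frame.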
ISBN (print): 9781479938445
Three-dimensional wave propagation models of the parabolic approximation type are widely used in ocean exploration. For this application, FOR3D is one of the most widely used models because it takes azimuthal coupling into account and therefore yields better precision. When the precision requirement is high, the computation becomes large-scale and cannot be completed on a single computer within a constrained time. In this paper, we propose a parallel method that decomposes the original computation into small pieces. Each processor then receives one piece and runs FOR3D independently. Our method was implemented on Windows Azure, and the results show an almost linear speed-up.
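The decomposition idea can be sketched in Python under loud assumptions: the abstract does not say along which dimension the FOR3D computation is split, so the sketch splits a plain index range, and `run_piece` is a hypothetical stand-in for one independent model run. A thread pool is used only to keep the example self-contained; separate processes or cloud workers would take its place in practice.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, pieces):
    """Split [start, stop) into `pieces` contiguous chunks of near-equal size."""
    base, extra = divmod(stop - start, pieces)
    chunks = []
    lo = start
    for i in range(pieces):
        hi = lo + base + (1 if i < extra else 0)  # first `extra` chunks get one more item
        chunks.append((lo, hi))
        lo = hi
    return chunks


def run_piece(chunk):
    """Hypothetical stand-in for one independent run over a chunk."""
    lo, hi = chunk
    return sum(i * i for i in range(lo, hi))


def run_decomposed(start, stop, pieces, workers=4):
    """Run every chunk independently and return the partial results in order."""
    chunks = split_range(start, stop, pieces)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_piece, chunks))
```

Because the pieces share no state, the only serial parts are splitting the work and collecting the results, which is the condition under which near-linear speed-up is plausible.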
ISBN (print): 9781509004621
Summary form only given. Since before the birth of computers, we have strived to make intelligent machines that share some of the properties of our own brains. We have tried to make devices that quickly solve problems that we find time-consuming, that adapt to our needs, and that learn and derive new information. In more recent years we have tried to add new capabilities to our devices: self-adaptation, fault tolerance, self-repair, even self-programming or self-building. In pursuing these challenging goals we push the boundaries of computer and software architectures. We invent new parallel processing approaches or we exploit hardware in new ways. For the last decade Peter Bentley and his group have made their own journey in this area. To overcome the observed incompatibilities between conventional architectures and biological processes, Bentley created the Systemic Computer [1] -- a computing paradigm and architecture designed to process information in a way more similar to natural systems. The computer uses a systemic world-view. Instead of the traditional centralized view of computation, here all computation is distributed. There is no separation of data and code/functionality into memory, ALU, and I/O. Everything in systemic computation is composed of systems, which may not be destroyed, but may transform each other through their interactions, akin to collision-based computing. Two systems interact in the context of a third system, which defines the result of their interaction. All interactions may be separated and embedded within scopes, which are also systems, enabling embedded hierarchies. Systemic computation makes the following assertions: · Everything is a system. · Systems can be transformed but never destroyed or created from nothing. · Systems may comprise or share other nested systems. · Systems interact, and interaction between systems may cause transformation of those systems, where the nature of that transformation is determined by a contex
ISBN (print): 9781467382779
Network-on-Chip (NoC) is an interesting communication fabric for multi-processing-element architectures that benefit from the parallelism of algorithms. We present a method that uses a symbolic execution technique to extract the parallelism of an application to be mapped on FPGAs, using the flexibility of a NoC communication infrastructure and the properties of a high-level programming language. Application-specific hardware is then generated using a High Level Synthesis flow. We provide a dedicated mechanism for data-path reconfiguration that allows different applications to run on the same set of processing elements. Thus, the output design is programmable and has processor-less distributed control. This approach of using NoCs enables us to automatically design generic architectures that can be used on FPGA servers for High Performance Reconfigurable Computing. We validate our method on binomial tree applications used for option pricing on FPGAs.
ISBN (print): 9781479959198
An exponential increase in the speed of DNA sequencing over the past decade has driven demand for fast, space-efficient algorithms to process the resultant data. The first step in processing is alignment of many short DNA sequences, or reads, against a large reference sequence. This work presents WOODSTOCC, an implementation of short-read alignment designed for Graphics Processing Unit (GPU) architectures. WOODSTOCC translates a novel CPU implementation of gapped short-read alignment, which has guaranteed optimal and complete results, to the GPU. Our implementation combines an irregular trie search with dynamic programming to expose regularly structured parallelism. We first describe this implementation, then discuss its port to the GPU. WOODSTOCC's GPU port exploits three generally useful techniques for extracting regular parallelism from irregular computations: dynamic thread mapping with a worklist, kernel stage decoupling, and kernel slicing. We discuss the performance impact of these techniques and suggest further opportunities for improvement.
ISBN (print): 9780769552071
Distributed applications often require high-performance networks with strict connectivity guarantees. For instance, many cloud applications suffer from today's variations in intra-cloud bandwidth, which lead to poor and unpredictable application performance. Accordingly, we witness a trend towards virtual networks (VNets), which can provide resource isolation. Interestingly, while the problem of where to embed a VNet is fairly well understood today, much less is known about when to optimally allocate a VNet. This, however, is important, as the requirements specified for a VNet do not have to be static, but can vary over time and even include certain temporal flexibilities. This paper initiates the study of the temporal VNet embedding problem (TVNEP). We propose a continuous-time mathematical programming approach to solve the TVNEP, and present and compare different algorithms. Based on these insights, we present the cΣ-Model, which incorporates both symmetry and state-space reductions to significantly speed up the process of computing exact solutions to the TVNEP. Based on the cΣ-Model, we derive a greedy algorithm, cΣ^G_A, to compute fast approximate solutions. In an extensive computational evaluation, we show that despite the hardness of the TVNEP, the cΣ-Model is sufficiently powerful to solve moderately sized instances to optimality within one hour, and under different objective functions (such as maximizing the number of embeddable VNets). We also show that the greedy algorithm exploits flexibilities well and yields good solutions. More generally, our results suggest that even small time flexibilities can improve the overall system performance significantly.
ISBN (print): 9781479961238
Biomedical imaging, digital communications, acoustic signal processing, and many other digital signal processing (DSP) applications are increasingly present in the digital world. They process growing volumes of data represented with increasing accuracy, and use complex algorithms subject to time constraints. Consequently, they require high computing power. To meet this need, it is now unavoidable to use parallel and heterogeneous architectures to speed up processing; the best examples are today's supercomputers such as "Tianhe-2" and "Titan" in the Top500 ranking. These architectures, with their multi-core nodes supported by many-core accelerators, respond well to this problem. However, they remain hard to program for performance for many reasons: expressing parallelism, task synchronization, memory management, handling hardware specifics, load balancing, and so on. In the present work, we characterize DSP applications and propose a programming model based on their distinctive features in order to implement them easily and efficiently on heterogeneous clusters.
In this paper, we present a new approach towards programming coarse-grained reconfigurable arrays (CGRAs) in an intuitive, dataflow inspired way. Based on the observation that available CGRAs are usually programmed us...
Based on GPU parallel technology, this paper proposes a parallel SRM feature extraction algorithm to accelerate the extraction of SRM features for steganalysis of HUGO images. Using the OpenCL parallel programming framework for GPUs, we parallelize a serial algorithm and apply several optimization techniques to accelerate the extraction process. The techniques include convolution unrolling, coalesced memory access, and avoidance of bank conflicts. The experimental results show that, for images of different sizes, the proposed parallel extraction algorithm is 25 to 55 times faster than the original serial algorithm, and 2 to 4.2 times faster than running the parallel method on a quad-core CPU.
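Of the techniques listed, convolution unrolling is the easiest to show in isolation. The sketch below is plain Python rather than OpenCL and is not the paper's kernel; the function names and the 3-tap kernel are illustrative assumptions. It contrasts a generic inner loop over kernel taps with the same computation written out by hand for a fixed kernel size.

```python
def convolve_loop(signal, kernel):
    """Valid-mode 1-D sliding dot product with a generic loop over kernel taps."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        acc = 0
        for j in range(k):          # inner loop that unrolling removes
            acc += signal[i + j] * kernel[j]
        out.append(acc)
    return out


def convolve_unrolled3(signal, kernel):
    """Same result for a 3-tap kernel, with the inner loop written out by hand."""
    k0, k1, k2 = kernel
    return [
        signal[i] * k0 + signal[i + 1] * k1 + signal[i + 2] * k2
        for i in range(len(signal) - 2)
    ]
```

On a GPU, fixing the kernel size in this way removes the loop counter and branch from each work-item, which is the usual motivation for unrolling small fixed-size filters.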
We present new parallel algorithms for testing pattern involvement for all length-4 permutations. Our algorithms have the complexity of O(log n) time with n/log n processors on the CREW PRAM model, and O(log log log n) time with n/log log log n processors or constant time with n log^3 n processors on a CRCW PRAM model. Parallel algorithms had not previously been designed for some of these patterns, and for other patterns the previous best algorithms require O(log n) time and n processors on the CREW PRAM model.
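As a reading aid, the bounds above can be compared through their processor-time product (total work). The products below are derived here from the stated bounds and are not claims made in the abstract.

```latex
\begin{align*}
\text{CREW PRAM:}\quad & \frac{n}{\log n} \cdot O(\log n) = O(n) \\
\text{CRCW PRAM:}\quad & \frac{n}{\log\log\log n} \cdot O(\log\log\log n) = O(n) \\
\text{CRCW PRAM, constant time:}\quad & n \log^{3} n \cdot O(1) = O(n \log^{3} n) \\
\text{Previous best (CREW PRAM):}\quad & n \cdot O(\log n) = O(n \log n)
\end{align*}
```

Under this measure the first two variants perform linear work, while the constant-time variant trades extra processors for speed.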