ISBN (print): 9781479908837; 9781479908820
Calculation of the mean, variance, and standard deviation is often required for segmentation or feature extraction. In image processing, an integer approximation is often adequate. Conventional methods require division and square root operations, which are expensive to realize in hardware in terms of both the amount of required resources and latency. A new class of iterative algorithms is developed based on integer arithmetic. An implementation of the algorithms as a hardware architecture for a Field-Programmable Gate Array (FPGA) is compared with architectures using the conventional approach; the comparison shows significantly reduced latency while using fewer hardware resources.
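To make the cost of division and square roots concrete, the following is a minimal Python sketch of integer-only statistics. It is not the iterative algorithm of the paper, which the abstract does not specify. It assumes the pixel count is a power of two, so that division reduces to a right shift, and it uses the classic shift-and-subtract integer square root. The names `isqrt_shift` and `integer_stats` are hypothetical.

```python
def isqrt_shift(n: int) -> int:
    """Floor of sqrt(n) using only shifts, additions, and comparisons."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Find the largest power of four that does not exceed n.
    bit = 1
    while (bit << 2) <= n:
        bit <<= 2
    root = 0
    while bit:
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root


def integer_stats(pixels, log2_count):
    """Integer mean, variance, and standard deviation of 2**log2_count pixels.

    Assumption (not from the source): the count is a power of two, so every
    division becomes a right shift. Results are floored integer approximations.
    """
    count = 1 << log2_count
    if len(pixels) != count:
        raise ValueError("expected exactly 2**log2_count pixels")
    total = sum(pixels)
    total_sq = sum(p * p for p in pixels)
    mean = total >> log2_count
    # variance = E[x^2] - E[x]^2 = (N * sum(x^2) - sum(x)^2) / N^2
    variance = (count * total_sq - total * total) >> (2 * log2_count)
    return mean, variance, isqrt_shift(variance)


print(integer_stats([2, 4, 4, 4, 5, 5, 7, 9], 3))
```

For the eight sample values above, the sum is 40 and the sum of squares is 232, so the sketch yields a mean of 5, a variance of 4, and a standard deviation of 2.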
ISBN (print): 9789791421195
In bioinformatics, one of the gold-standard algorithms for computing the optimal similarity score between sequences in sequence database searches is the Smith-Waterman algorithm, which uses dynamic programming. This algorithm has quadratic time complexity, which leads to long computation times for large data sets. For this reason, parallel computing is essential in sequence database searches to reduce the running time and increase performance. In this paper, we discuss a parallel implementation of the Smith-Waterman algorithm on a GPU using the CUDA C programming language with the NVCC compiler in a Linux environment. We also analyze performance under three parallelization models: inter-task parallelization, intra-task parallelization, and a combination of both. Based on the simulation results, the combined model performs better than the other two, achieving an average speed-up of 313X and an average efficiency of 0.93.
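For reference, the quadratic dynamic-programming recurrence can be sketched sequentially as below. This is a plain Python illustration, not the CUDA implementation described in the abstract. The function name `smith_waterman_score`, the linear gap penalty, and the default scoring values are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty).

    Keeps only two rows of the DP matrix, so memory is O(len(b)) while
    time stays O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))
```

Each cell depends only on its left, upper, and upper-left neighbors, so cells on the same anti-diagonal are mutually independent. Separate query/database pairs are also independent of one another. These are the two sources of parallelism that intra-task and inter-task schemes, respectively, typically exploit.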
ISBN (print): 9781479931637
cDNA microarrays are a useful tool for studying the expression levels of genes. Nevertheless, microarray image gridding remains a challenging and complex task. Most microarray image analysis tools require human intervention, which leads to variations in the gene expression results. Automatic methods have also been proposed, but they have high computational complexity. This work presents the performance enhancement, via GPU computing techniques, of a fully automatic gridding method previously proposed by the authors' research group. The NVIDIA CUDA architecture was used to parallelize the complex steps of the algorithm. Experimental results showed that the proposed approach reduces computation time while achieving higher utilization of the available computational resources.
Because traditional single-processor systems cannot meet the requirements of modern array signal systems, a new high-speed parallel system based on DSPs and FPGA is designed. The system complies with VITA 46 standard...
ISBN (print): 9781467358149; 9781467358125
In the aircraft industry, structural components, referred to as part numbers (PN), must undergo a heat treatment in capacitated burn-in furnaces for a pre-defined period (exposure time) in order to give them specific physical and chemical properties (e.g. hardness, corrosion resistance, conductivity). Two or more part numbers can be grouped in a batch and treated simultaneously in the same furnace if a common exposure time can be identified. To minimize the total completion time (makespan) of the process, it is necessary to determine the appropriate grouping of the part numbers into batches (batching problem) to be processed by each furnace (scheduling problem). The problem can be modeled as a batch scheduling problem on parallel machines in which the batching and scheduling problems are considered at the same time. Starting from a real case study, we present an original integer linear programming formulation for the case of two capacitated parallel machines, and we provide the results obtained on two real instances from the aircraft industry.
ISBN (print): 9781479904945; 9781479904938
Stochastic encoding represents a value using the probability of ones in a random bit stream. Computation based on this encoding has good fault tolerance and low hardware cost. However, one of its major issues is long processing time: a bit stream must be long enough that random fluctuations introduce only small errors into the final computation results. For example, for most digital image processing algorithms, a 512-bit stream is needed to represent an 8-bit pixel value stochastically to guarantee that the final computation error is less than 5%. To solve this issue, this paper proposes sharing bits between adjacent bit streams that represent adjacent deterministic values. For example, in image processing applications, the bit stream that represents the current pixel value can share some of the bits of the bit stream that represents the previous pixel value. We use an image contrast stretching algorithm to evaluate this method. Our experimental results show that the proposed method can improve performance by 90%.
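The baseline encoding, and why stream length matters, can be illustrated with a short Python sketch. It shows only conventional stochastic encoding and AND-based multiplication of two independent streams. It does not implement the bit-sharing scheme proposed in the paper, whose details the abstract does not give. The helper names `encode`, `decode`, and `multiply` are hypothetical.

```python
import random


def encode(value, length, rng):
    """Return `length` bits, each 1 with probability `value` (0 <= value <= 1)."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def decode(bits):
    """Estimate the encoded value as the fraction of ones."""
    return sum(bits) / len(bits)


def multiply(x_bits, y_bits):
    """Bitwise AND of two independent streams encodes the product of their values."""
    return [x & y for x, y in zip(x_bits, y_bits)]


rng = random.Random(0)  # fixed seed so the run is repeatable
for length in (64, 512, 4096):
    a = encode(0.5, length, rng)
    b = encode(0.5, length, rng)
    # The estimate of 0.5 * 0.5 = 0.25 tends to tighten as the stream grows.
    print(length, decode(multiply(a, b)))
```

The standard deviation of the decoded estimate shrinks roughly with the inverse square root of the stream length, which is why short streams give noisy results and long streams cost processing time.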
ISBN (print): 9783642368035
Several highly optimized implementations of Finite Difference schemes are discussed. The combination of vectorization and an interleaved data layout, spatial and temporal loop tiling algorithms, loop unrolling, and parameter tuning leads to efficient computational kernels in one to three spatial dimensions, with truncation errors of order two to twelve, and with isotropic and compact anisotropic stencils. The kernels are implemented on and tuned for several processor architectures: recent Intel Sandy Bridge, Ivy Bridge, and AMD Bulldozer CPU cores, all with AVX vector instructions; Nvidia Kepler and Fermi and AMD Southern and Northern Islands GPU architectures; and some older architectures for comparison. The kernels are based either on a cache-aware spatial loop or on time-slicing to compute several time steps at once. Furthermore, vector components can be independent, grouped in short vectors of SSE, AVX, or GPU warp size, or grouped in larger virtual vectors with explicit synchronization. The optimal choice of the algorithm and its parameters depends on both the Finite Difference stencil and the processor architecture.
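As a minimal illustration of a stencil kernel and of spatial loop tiling, the Python sketch below applies the second-order three-point stencil for the second derivative in one dimension, first in a single loop and then tile by tile. It only shows the loop structure. Pure Python gains nothing from tiling, and the kernels in the paper are vectorized and architecture-specific. The function names and the `tile` parameter are hypothetical.

```python
def laplacian_1d(u, h):
    """Second-order approximation of u'' at interior points; boundaries stay 0."""
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h)
    return out


def laplacian_1d_tiled(u, h, tile):
    """Same stencil, with the interior loop split into blocks of `tile` points."""
    out = [0.0] * len(u)
    last = len(u) - 1
    for start in range(1, last, tile):
        stop = min(start + tile, last)
        # On real hardware a tile is sized to fit in cache or vector registers.
        for i in range(start, stop):
            out[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h)
    return out


u = [float(x * x) for x in range(10)]  # samples of u(x) = x^2, so u'' = 2
print(laplacian_1d(u, 1.0))
print(laplacian_1d_tiled(u, 1.0, 3) == laplacian_1d(u, 1.0))
```

Tiling changes only the order in which points are visited, so both functions produce the same values.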
ISBN (print): 9781467358255
Recent advances in FPGA technology have permitted the hardware implementation of selected software functions to enhance program performance. Most of the work done so far has been concerned only with integer operations; little effort has addressed floating-point operations. In this paper we propose a dataflow implementation of the LU decomposition on an FPGA. A modified Kernighan-Lin based task partitioning and assignment algorithm is also presented. The algorithm showed acceptable improvement over existing techniques.
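For readers unfamiliar with the underlying computation, here is a minimal sequential sketch of LU decomposition (Doolittle form, without pivoting) in Python. It is not the dataflow FPGA design or the partitioning algorithm of the paper. The function name `lu_decompose` and the choice to omit pivoting are assumptions made to keep the example short.

```python
def lu_decompose(a):
    """Factor a square matrix as a = lower @ upper (Doolittle, no pivoting).

    `lower` has a unit diagonal. Raises ZeroDivisionError on a zero pivot,
    because this sketch does not reorder rows.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(x) for x in row] for row in a]
    for k in range(n):
        if upper[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            # Eliminate column k from row i.
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


lower, upper = lu_decompose([[4, 3], [6, 3]])
print(lower)  # [[1.0, 0.0], [1.5, 1.0]]
print(upper)  # [[4.0, 3.0], [0.0, -1.5]]
```

For a fixed pivot column `k`, the updates of different rows `i` do not depend on one another, which is the kind of independence a dataflow formulation can expose as concurrent tasks.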
In this paper, we address the problem of defining semantic indexing techniques based on RDF triples. In particular, we define algorithms for: i) defining clustering techniques of semantically similar RDF triples; ii...
ISBN (print): 9783642415326; 9783642415333
The trend from single-processor to parallel computer architectures has increased the importance of parallel computing. To support parallel computing, it is important to map parallel algorithms to a computing platform that consists of multiple parallel processing nodes. In general, different alternative mappings can be defined that perform differently with respect to the quality requirements for power consumption, efficiency, and memory usage. The mapping process can be carried out manually for platforms with a limited number of processing nodes. However, for exascale computing, in which hundreds of thousands of processing nodes are applied, the mapping process soon becomes intractable. To assist the parallel computing engineer, we provide a model-driven approach to analyze, model, and select feasible mappings. We describe the developed toolset that implements the corresponding approach, together with the required metamodels and model transformations. We illustrate our approach for the well-known complete exchange algorithm in parallel computing.
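The complete exchange pattern itself (every node sends a distinct block to every other node) can be simulated in a few lines of Python. The sketch below uses a simple shifted schedule in which, at step s, node i sends to node (i + s) mod p. It illustrates only the communication pattern, not the model-driven mapping approach or toolset of the paper. The function name `complete_exchange` and the schedule choice are assumptions.

```python
def complete_exchange(send):
    """Simulate an all-to-all personalized exchange among p nodes.

    send[i][j] is the block node i holds for node j. Returns (recv, schedule),
    where recv[j][i] is the block node j received from node i and schedule
    lists the (source, destination) pairs used in each of the p - 1 steps.
    """
    p = len(send)
    recv = [[None] * p for _ in range(p)]
    for i in range(p):
        recv[i][i] = send[i][i]  # a node's own block needs no communication
    schedule = []
    for step in range(1, p):
        pairs = []
        for src in range(p):
            dst = (src + step) % p
            recv[dst][src] = send[src][dst]
            pairs.append((src, dst))
        schedule.append(pairs)
    return recv, schedule


send = [[(i, j) for j in range(4)] for i in range(4)]
recv, schedule = complete_exchange(send)
print(recv[2])         # blocks node 2 received from nodes 0..3
print(len(schedule))   # number of communication steps
```

In every step each node sends exactly one block and receives exactly one block. How those logical nodes are placed on physical processing nodes is the mapping question the abstract is concerned with.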