ISBN: 9780769533025 (print)
DPCM (Differential Pulse Code Modulation) coding is widely used in many applications, including lossless JPEG compression. DPCM decoding is inherently a 1-indexed or 2-indexed recurrence relation. Thus, although it is hard to parallelize efficiently, some (N log N) or (log(2) N) algorithms have been studied for an N x N image with N x N or N processors. Commodity microprocessors now have multiple cores, and SMP architectures are used in some PCs, but the available degree of parallelism is not large (up to 80). It is therefore unrealistic to parallelize the processing of an N x N image over N x N or N processors. In this paper we implement two parallel DPCM algorithms for an N x N image on P processors (P << N): Fat-pipeline and P-scheme. Our experimental results show that both approaches achieve a parallel speedup of about 3.2 with 6 processing cores.
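To make the recurrence concrete, here is a minimal Python sketch of DPCM decoding with a previous-sample predictor, where each decoded value is the previous decoded value plus the transmitted difference. The chunked variant is a generic three-phase block prefix sum on P chunks; it is not the paper's Fat-pipeline or P-scheme, whose details the abstract does not give. The function names and the one-dimensional predictor are assumptions made for illustration.

```python
from itertools import accumulate


def dpcm_decode(diffs):
    """Sequential decoding: x[i] = x[i-1] + d[i], with x[-1] taken as 0."""
    out = []
    prev = 0
    for d in diffs:
        prev += d
        out.append(prev)
    return out


def dpcm_decode_chunked(diffs, p):
    """Decode with the work split into at most p chunks (hypothetical helper).

    Phases 1 and 3 touch each chunk independently, so they could run on
    p workers; phase 2 is a short sequential pass over at most p values.
    """
    n = len(diffs)
    size = max(1, -(-n // p))  # ceiling division, at least 1
    chunks = [diffs[i:i + size] for i in range(0, n, size)]

    # Phase 1: local prefix sums inside every chunk (independent per chunk).
    local = [list(accumulate(c)) for c in chunks]

    # Phase 2: carry the last value of each chunk into the next one.
    offsets = []
    carry = 0
    for sums in local:
        offsets.append(carry)
        carry += sums[-1]

    # Phase 3: add each chunk's offset to its local sums (independent per chunk).
    return [v + off for sums, off in zip(local, offsets) for v in sums]
```

The chunked version returns the same samples as the sequential one; only phase 2 stays serial, and its length depends on P rather than on the image size.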
ISBN: 9781424414567 (print)
This paper proposes a software-based parallel CRC (Cyclic Redundancy Check) algorithm called 'N-byte RCC (Repetition of Computation and Combination)'. The algorithm iterates two steps: message computation by 'slicing-by-4' and combination through 'zero block lookup tables'. It can parallelize the CRC calculation with any number of processors. To verify the performance of our algorithm, we employ two different communication architectures: the single bus architecture and the 1-star topology NoC (Network on Chip) architecture. For both architectures, we explore our parallel algorithm using a TLM (Transaction Level Model). The simulation results show that the proposed parallel CRC algorithm with the bus and NoC architectures reduces the processing time by 28 percent and 38 percent, respectively, compared to 'slicing-by-8', which is the fastest of the other software-based algorithms. Furthermore, the 1-star NoC architecture of the parallel CRC shows higher performance than the single bus architecture regardless of the number of processors.
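The compute-then-combine idea can be sketched in Python with the standard `zlib` module, assuming the CRC-32 polynomial that `zlib.crc32` implements. Each chunk's CRC is computed independently, and two CRCs are merged by advancing the first through as many zero bytes as the second chunk is long, then XORing. This sketch recomputes the zero-byte advance directly instead of using precomputed zero block lookup tables, and it does not reproduce the paper's slicing-by-4 kernel; `advance_through_zeros`, `combine` and `chunked_crc32` are hypothetical names.

```python
import zlib

MASK = 0xFFFFFFFF


def advance_through_zeros(crc, n):
    """Run the raw CRC-32 register (no pre/post inversion) over n zero bytes.

    zlib.crc32 inverts the register on entry and exit, so both inversions
    are cancelled here to get the plain shift-register step.
    """
    return zlib.crc32(bytes(n), crc ^ MASK) ^ MASK


def combine(crc_a, crc_b, len_b):
    """CRC of A||B from crc(A), crc(B) and len(B), using CRC linearity."""
    return advance_through_zeros(crc_a, len_b) ^ crc_b


def chunked_crc32(message, parts):
    """Compute CRC-32 from independently computed per-chunk CRCs."""
    size = max(1, -(-len(message) // parts))  # ceiling division
    chunks = [message[i:i + size] for i in range(0, len(message), size)]

    # Computation step: every chunk CRC is independent of the others.
    partial = [(zlib.crc32(c), len(c)) for c in chunks]

    # Combination step: fold the partial CRCs together left to right.
    total = 0  # CRC-32 of the empty message
    for crc, length in partial:
        total = combine(total, crc, length)
    return total
```

A lookup table indexed by register bytes could replace `advance_through_zeros` when every chunk has the same fixed length, which is presumably the role of the zero block tables mentioned in the abstract.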
ISBN: 9781424423576 (print)
As emerging multimedia applications demand constant bit rate improvements, it is becoming clear that H.264/AVC technology will not be able to meet these demands in spite of the 40-50% gain in bitrate over H.26X. Recently, a novel video coding scheme based on generalized finite automata (GFA) modeling of video sequences in the bitplane wavelet domain has been proposed to address this problem. Unfortunately, this scheme requires a computing workload that is difficult to support with software implementations capable of meeting the performance requirements of target applications. This paper applies transformation techniques to the GFA algorithm to map it to high-performance architectures. These techniques are used to derive and implement an optimal 2D architecture based on specific performance parameters. Implementation experiments show that a single row of this architecture can match 1,536 to 11,627,906 quadrants per second, depending on the size of the matched quadrant.
ISBN: 9783540693833 (print)
Due to their increasing computational power, modern graphics processing architectures are becoming more and more popular for general-purpose applications with high performance demands. This is the case for quantum computer simulation, a problem with high computational requirements in both memory and processing power. For such simulations, multiprocessor architectures are an almost obligatory tool. In this paper we explore the use of the new graphics processor architecture NVIDIA CUDA in the simulation of some basic quantum computing operations. This new architecture is oriented towards a more general exploitation of the graphics platform, allowing it to be used as a parallel SIMD multiprocessor. In this direction, some implementation strategies are proposed, showing that the effectiveness of the codes depends on proper exploitation of the underlying memory hierarchy.
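As an illustration of what a basic quantum computing operation looks like on a state vector, the following Python sketch applies a Hadamard gate to one qubit. The abstract does not say which gates were simulated, so the choice of gate, the function name and the list-of-amplitudes layout are assumptions. The code is sequential, but every amplitude pair is updated independently of the others, which is the property a SIMD-style platform could exploit.

```python
import math


def apply_hadamard(state, target):
    """Apply a Hadamard gate to qubit `target` of a state vector.

    `state` holds 2**n amplitudes; index bit `target` selects the qubit.
    Each pair (i, i | stride) is read and written independently of every
    other pair, so the loop body could run as one thread per pair.
    """
    stride = 1 << target
    h = 1.0 / math.sqrt(2.0)
    out = list(state)
    for i in range(len(state)):
        if (i & stride) == 0:
            a = state[i]
            b = state[i | stride]
            out[i] = h * (a + b)
            out[i | stride] = h * (a - b)
    return out
```

The two amplitudes of a pair sit `2**target` elements apart, so the memory access pattern changes with the target qubit; this is one way the memory hierarchy can come to dominate performance in such simulations.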
ISBN: 9780769533520 (print)
Clusters built from single-core systems are cost-effective in terms of performance improvement and availability. However, hardware constraints limit the performance of single-core systems. Hence, it is difficult to meet the increasingly high performance requirements of diversified applications at different levels for general-purpose computing. A promising and feasible solution is the novel multi-core systems, which extend parallelism to the CPU level by integrating multiple processing units on a single die. This paper uses the Finite-Difference Time-Domain (FDTD) algorithm as a case study, designing suitable parallel FDTD algorithms for three architectures: distributed-memory machines with single-core processors, shared-memory machines with dual-core processors, and the Cell Broadband Engine (Cell/B.E.) processor with nine heterogeneous cores. The experimental results show that the Cell/B.E. processor using 8 SPEs achieves significant speedups of 7.05 over the AMD single-core Opteron processor and 3.37 over the AMD dual-core Opteron processor at the processor level.
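For readers unfamiliar with FDTD, a minimal one-dimensional sketch in Python shows the shape of the computation. It assumes free space, normalized field units, a Courant factor of 0.5 and a Gaussian source; none of these settings come from the paper, and the function and parameter names are hypothetical. Each cell update reads only its nearest neighbours, which is why the grid can be split across processors that exchange only boundary values.

```python
import math


def fdtd_1d(cells, steps, source_pos):
    """Leapfrog update of the electric (ez) and magnetic (hy) fields in 1D.

    Assumed setup: free space, normalized units, Courant factor 0.5,
    fixed (zero) field at the left edge of ez and right edge of hy.
    """
    ez = [0.0] * cells
    hy = [0.0] * cells
    for t in range(steps):
        # Electric field update: each cell needs hy from itself and its left neighbour.
        for k in range(1, cells):
            ez[k] += 0.5 * (hy[k - 1] - hy[k])
        # Additive Gaussian pulse source (assumed shape, centred at step 30).
        ez[source_pos] += math.exp(-0.5 * ((t - 30) / 10.0) ** 2)
        # Magnetic field update: each cell needs ez from itself and its right neighbour.
        for k in range(cells - 1):
            hy[k] += 0.5 * (ez[k] - ez[k + 1])
    return ez, hy
```

Because a disturbance spreads by at most one cell per time step, a subdomain only needs one layer of neighbour values per step, which keeps communication small relative to computation on distributed-memory and multi-core targets.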
ISBN: 9783540858560 (print)
This paper describes the application of the parallel Grammatical Evolution (PGE) algorithm to combinatorial logic circuit generation. The grammar and algorithms used are described. To increase the efficiency of Grammatical Evolution (GE), the backward processing algorithm was used. Different approaches to creating multiobjective fitness functions are described and tested. Specifically, the fitness functions are defined as a set of rules incorporating different comparison methods in each stage of the computation. The algorithm is internally parallel and consists of three different interconnected populations.
ISBN: 9780769532875 (print)
A novel fast scheme for the Discrete Wavelet Transform (DWT) was recently introduced under the name of the lifting scheme [4, 10]. This new scheme presents many advantages over the convolution-based approach [10, 11]. For instance, it is very suitable for parallelization. In this paper we present two new FPGA-based parallel implementations of the lifting-based DWT scheme. The first implementation uses pipelining, parallel processing and data reuse to increase the speedup of the algorithm. In the second architecture, a controller is introduced to dynamically deploy a suitable number of clones according to the available hardware resources on a targeted environment. These two architectures are capable of processing large incoming images or multi-frame images in real time. Simulations run on a Xilinx Virtex-5 FPGA environment have demonstrated the practical efficiency of our contribution. The first architecture reached an operating frequency of 289 MHz, and the second architecture demonstrated the controller's ability to determine the true available resources needed for a successful deployment of independent clones over a targeted FPGA environment and to process the task in parallel.
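The lifting idea itself can be shown with a short Python sketch of one transform level: split the signal into even and odd samples, predict the odd samples from the even ones, then update the even samples with the prediction error. The integer Haar wavelet is used here only because it is the shortest example; the abstract does not state which wavelet filters the FPGA designs implement, and the function names are hypothetical. The input length is assumed to be even.

```python
def haar_lift_forward(x):
    """One level of the integer Haar transform via lifting (even-length input)."""
    even, odd = x[0::2], x[1::2]
    # Predict step: the detail is the error of predicting each odd sample
    # from its even neighbour.
    detail = [o - e for e, o in zip(even, odd)]
    # Update step: adjust the even samples so they carry the pairwise average.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_lift_inverse(approx, detail):
    """Undo the lifting steps in reverse order with the signs flipped."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    x = [0] * (2 * len(even))
    x[0::2] = even
    x[1::2] = odd
    return x
```

Every output pair depends on a single input pair, so the steps are independent across the signal and reconstruction is exact in integer arithmetic; both properties make lifting attractive for pipelined or replicated hardware units.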
ISBN: 9783540858560 (print)
In the quest to design extremely fault-tolerant computing systems, drawing inspiration from nature is one avenue worth exploring. Embryonics (embryonic electronics) is a research project that attempts to implement features otherwise available in the world of biology to design robust, massively parallel arrays of processors. This paper elaborates on some of the design approaches undertaken to ensure a high level of fault tolerance, as well as on how to partition the array to make optimal use of spare resources.
ISBN: 9780769534435 (print)
The development of numerical simulation software tools for the solution of real-world problems usually calls for domain experts in modeling. The GraPA framework, as an abstraction layer on top of hardware characteristics, supports modelers in two respects: one is the built-in support for co-processing of multiple models, and the other is the generically delivered high performance achieved by implementing concurrency features of multicore and distributed-memory architectures. Technically, GraPA is designed as a C++ template framework, where the modeler's data structures and algorithms instantiate the framework. Using this approach, we handle parallel processing of lock-free data structures and message passing transparently to the modelers. In this paper, we report on the status of the implementation of GraPA and on its performance characteristics.
ISBN: 9783540681052 (print)
Sequence alignment is one of the most important techniques in Bioinformatics. Although efficient dynamic programming algorithms exist for this problem, the alignment of very long DNA sequences still requires significant time on traditional computer architectures. In this paper, we present a scalable and efficient mapping of DNA sequence alignment onto the Cell BE multi-core architecture. Our mapping uses two types of parallelization techniques: (i) SIMD vectorization within a processor and (ii) wavefront parallelization between processors.
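The wavefront order mentioned above can be sketched in Python for a global alignment score. Cells of the dynamic programming table that lie on the same anti-diagonal depend only on the two previous anti-diagonals, so they can be computed independently of each other. The scoring values (match +1, mismatch -1, gap -1), the global-alignment variant and the function name are assumptions for illustration; the abstract does not specify them, and this sequential code only mirrors the dependency order, not the Cell BE mapping.

```python
def align_score_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score, filling the table one anti-diagonal at a time."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap

    # Anti-diagonal d contains all interior cells with i + j == d.
    for d in range(2, n + m + 1):
        # Every cell in this inner loop is independent of the others,
        # so the loop could be distributed over processors or SIMD lanes.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            step = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,       # gap in b
                table[i][j - 1] + gap,       # gap in a
            )
    return table[n][m]
```

The number of independent cells grows and then shrinks as the wavefront crosses the table, so processor utilization is lowest near the two corners.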