ISBN: (print) 9781467351652; 9780769549149
Executing parallel applications on a computing grid requires an environment in which they can be executed, managed, scheduled and monitored. The execution environment must provide a processing model, consisting of programming and execution models, with the objective of appropriately exploiting grid computing characteristics. This paper proposes a parallel processing model, based on shared variables for grid computing, consisting of an execution model that is appropriate for the grid and a CPAR parallel language programming model. The environment is designed to execute parallel applications in grid computing, where all the characteristics present in grid computing are transparent to users. The results show that this environment is an efficient solution for the execution of parallel applications.
ISBN: (print) 9781467308595
This paper presents a parallel architecture for analog-to-information converters. The converter's architecture shifts most of the computational burden to the digital domain to take full advantage of deep-submicron IC fabrication technologies. The analog part of the converter is composed of a single multibit first-order Delta-Sigma modulator. The proposed architecture is validated through numerical simulations. The results show that signal recovery improves significantly when 2-bit and 3-bit Delta-Sigma modulators are employed.
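As a rough illustration of the analog front end named in this abstract, a textbook first-order Delta-Sigma modulator with a multibit quantizer can be simulated in a few lines. This is a generic sketch, not the converter architecture from the paper: the function names, the uniform quantizer over [-1, 1], and the one-sample feedback delay are all assumptions made for illustration.

```python
def quantize(value, bits):
    """Uniform quantizer over [-1, 1] with 2**bits output levels (assumed)."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = int((value + 1.0) // step)
    index = min(max(index, 0), levels - 1)  # clip to the available levels
    return -1.0 + (index + 0.5) * step


def delta_sigma_first_order(samples, bits):
    """First-order loop: integrate the error between input and fed-back output."""
    integrator = 0.0
    feedback = 0.0
    output = []
    for x in samples:
        integrator += x - feedback          # accumulate the tracking error
        feedback = quantize(integrator, bits)
        output.append(feedback)
    return output


# A constant input of 0.3 is encoded as a stream of coarse 2-bit levels
# whose running average approaches the input value.
stream = delta_sigma_first_order([0.3] * 1000, bits=2)
average = sum(stream) / len(stream)
```

Averaging the coarse output stream recovers the slowly varying input, which is why a low-resolution quantizer can suffice when more of the processing is moved to the digital side.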
ISBN: (print) 9781467351225; 9781467351201
Computer vision is a field of computer science that has recently come increasingly to the fore. Specialized processors that work on the accelerator principle can be used to speed up image-processing computation. The designed arithmetic-logic unit is a processor module that executes image processing based on the selected instruction. A parallel design of the arithmetic-logic unit can also accelerate image processing.
ISBN: (print) 9781467323703; 9781467323727
An efficient parallel priority queue is at the core of the effort to parallelize important non-numeric irregular computations such as discrete event simulation scheduling and branch-and-bound algorithms. GPGPUs can provide a powerful computing platform for such non-numeric computations if an efficient parallel priority queue implementation is available. In this paper, aiming at fine-grained applications, we develop an efficient parallel heap system employing CUDA. To our knowledge, this is the first parallel priority queue implementation on many-core architectures, and it thus represents a breakthrough. By allowing wide heap nodes to enable thousands of simultaneous deletions of highest-priority items and insertions of new items, and by taking full advantage of CUDA's data-parallel SIMT architecture, we demonstrate up to 30-fold absolute speedup for relatively fine-grained compute loads compared to an optimized sequential priority queue implementation on fast multicores. By comparison, our optimized multicore parallelization of the parallel heap yields only a 2-3 fold speedup for such fine-grained loads. This parallelization of a tree-based data structure on GPGPUs provides a roadmap for future parallelizations of other such data structures.
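The wide-node idea can be sketched sequentially: each heap node holds a sorted block of `width` keys, and the heap order requires every key in a node to be no larger than any key in its children, so the root always holds the `width` smallest keys. The Python sketch below covers batch deletion only and runs on one thread. The class name `WideHeap`, the padding with infinity, and the merge-based sift-down are assumptions chosen for illustration; the paper's CUDA implementation, its insertion path, and its pipelining across heap levels are not reproduced here.

```python
import math


class WideHeap:
    """Min-heap whose nodes each hold `width` sorted keys (hypothetical sketch)."""

    def __init__(self, keys, width):
        self.width = width
        ordered = sorted(keys)
        # Pad so every node is full; infinity sorts last and is filtered on output.
        ordered += [math.inf] * (-len(ordered) % width)
        # Consecutive blocks of a sorted list already satisfy the heap order.
        self.nodes = [ordered[i:i + width] for i in range(0, len(ordered), width)]

    def delete_batch(self):
        """Remove and return the `width` smallest keys (fewer when nearly empty)."""
        if not self.nodes:
            return []
        batch = self.nodes[0]
        last = self.nodes.pop()
        if self.nodes:
            self.nodes[0] = last          # move the last block to the root
            self._sift_down(0)
        return [k for k in batch if k != math.inf]

    def _sift_down(self, i):
        w = self.width
        while True:
            kids = [c for c in (2 * i + 1, 2 * i + 2) if c < len(self.nodes)]
            if not kids:
                return
            if self.nodes[i][-1] <= min(self.nodes[c][0] for c in kids):
                return                    # heap order already holds at this node
            # Order children so the one with the larger maximum comes first.
            kids.sort(key=lambda c: self.nodes[c][-1], reverse=True)
            merged = sorted(self.nodes[i] + [k for c in kids for k in self.nodes[c]])
            self.nodes[i] = merged[:w]    # smallest block stays in the parent
            if len(kids) == 2:
                safe, moving = kids
                # The middle block cannot exceed the old maximum of `safe`,
                # so the subtree below `safe` needs no further repair.
                self.nodes[safe] = merged[w:2 * w]
                self.nodes[moving] = merged[2 * w:]
            else:
                moving = kids[0]
                self.nodes[moving] = merged[w:]
            i = moving                    # continue where the largest block went
```

On a GPU, the merge at each node is the data-parallel step, and one batch operation replaces `width` individual heap operations; the sketch only shows why the batch result is correct.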
The "one-architecture-fits-all" design philosophy is inadequate for catering to the diverse characteristics of applications running on manyCore architectures. After evaluating various configurations of manyC...
ISBN: (print) 9781467323703; 9781467323727
Next Generation Sequencing (NGS) platforms typically produce short reads of 50-150 base pairs (bp). The number of such short reads can be up to 6 billion per run. Aligning these short reads to a large genome is a computationally challenging problem. In this paper, we address this problem by considering the design and optimization of parallel sequence alignment on GPU-based hybrid architectures. Even though the sequence alignment algorithm is inherently data-parallel, issues such as (a) space-time trade-offs in the indexing schema, (b) the need for fast candidate alignment location (CAL) search on the GPU, and (c) maintaining low divergence along with low space for the dynamic-programming-based local alignment make this a very challenging problem. We present the design of our novel parallel algorithm, Graphics processor Accelerated BFAST (GrABFAST), for large-scale read alignment, which overcomes these challenges and demonstrates superior performance compared to Intel multicore architectures. Using 5 large genomes, including those of Humans, Maize, Horse, Dog and Bacteria, we demonstrate a speedup of around 6x using Fermi Tesla C2070 GPUs versus the BFAST algorithm on a 16-core Intel Xeon 5570 architecture.
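The index-then-search step can be pictured with a plain k-mer hash index: every length-k substring of the reference is mapped to its positions, and each k-mer of a read votes for the reference offset at which the whole read would start. This is a simplified CPU sketch under stated assumptions, not the BFAST or GrABFAST index: the function names, the contiguous k-mers, and the `min_hits` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map each length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_locations(read, index, k, min_hits=2):
    """Return reference offsets supported by at least `min_hits` read k-mers."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1      # implied start of the read in the reference
    return sorted(start for start, hits in votes.items() if hits >= min_hits)


reference = "ACGTTGCAAGGCTTACGGATCCA"
index = build_kmer_index(reference, k=4)
candidates = candidate_locations("AGGCTTAG", index, k=4)  # read with one mismatch
```

Only the surviving candidate offsets are then passed to the more expensive dynamic-programming local alignment, which is where the space-time trade-off in the index shows up: a larger k gives fewer spurious candidates but tolerates fewer mismatches.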
ISBN: (print) 9781479914920
LDPC codes have been used intensively in various wireless communication applications because of their improved BER performance. The present paper summarizes state-of-the-art applications of short-length LDPC codes and proposes FPGA-based, application-specific hardware architectures for short-length LDPC decoders. Both the belief propagation and the min-sum decoding algorithms are considered for implementation. Because of its better BER performance, the belief propagation algorithm is implemented using the parallel computation capabilities offered by FPGA technology. In spite of the iterative nature and increased computational complexity of the LDPC decoding algorithm, the proposed architecture achieves the high throughput that is mandatory in real-time applications and data transmission. The architecture for the belief-propagation-based LDPC decoder relies on an approximation of the inverse hyperbolic tangent function for the check node update.
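The two check node updates mentioned here differ only in how the magnitude of the outgoing message is computed. The sketch below uses the standard textbook formulas on log-likelihood ratios (LLRs) for a single check node; the function names are hypothetical, and the exact inverse hyperbolic tangent is used where the paper describes a hardware approximation whose details are not given in the abstract.

```python
import math


def min_sum_check_update(incoming):
    """Min-sum: sign product and minimum magnitude of the other incoming LLRs."""
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(sign * min(abs(value) for value in others))
    return outgoing


def bp_check_update(incoming):
    """Belief propagation: 2 * atanh of the product of tanh(LLR / 2) of the others."""
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        product = 1.0
        for value in others:
            product *= math.tanh(value / 2.0)
        outgoing.append(2.0 * math.atanh(product))
    return outgoing


llrs = [2.0, -0.5, 1.5]                   # messages arriving at one check node
min_sum = min_sum_check_update(llrs)      # [-0.5, 1.5, -0.5]
exact = bp_check_update(llrs)
```

Both updates agree on the sign of each outgoing message, and the min-sum magnitude is never smaller than the belief propagation one. That overestimate is the usual explanation for the BER gap between the two, and it is why a decoder may accept the cost of evaluating or approximating the inverse hyperbolic tangent.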
Real-valued black-box optimization of badly behaved and not well understood functions is a wide topic in many scientific areas. Possible applications range from maximizing portfolio profits in financial mathematics ov...
ISBN: (print) 9781479909735
Sequence alignment is widely used in computational biology. To obtain optimal alignment results, many algorithms adopt the dynamic programming method. The Smith-Waterman algorithm is the best known of these sequence alignment approaches. However, such dynamic programming algorithms are computationally expensive, which makes it impractical to use them to compare a query sequence with a sequence database such as GenBank or PDB. Recently, GPU computing has been applied in many sequence alignment algorithms to enhance performance. In this paper, we propose a GPU-based Smith-Waterman algorithm that combines CPU and GPU computing capabilities to accelerate alignments against a sequence database. In the proposed algorithm, a filtration mechanism using frequency distance is used to decrease the number of compared sequences. We implemented the Smith-Waterman alignments in CUDA on the NVIDIA Tesla C2050. The experimental results show that the highest speedup ratio is about 80 to 90 times over a CPU-based Smith-Waterman algorithm.
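The two ingredients named in this abstract can be sketched on the CPU in Python. The scoring function is the standard Smith-Waterman recurrence with a linear gap penalty. The filter uses one common definition of frequency distance (the larger of the total surplus and total deficit in per-letter counts), which is a lower bound on edit distance because each insertion, deletion, or substitution changes a count by at most one in each direction. Whether the paper uses this exact definition, and all scoring parameters and function names, are assumptions.

```python
from collections import Counter


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, keeping only the previous row in memory."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def frequency_distance(a, b):
    """Lower bound on edit distance computed from per-letter counts (assumed form)."""
    count_a, count_b = Counter(a), Counter(b)
    surplus = sum(max(count_a[c] - count_b[c], 0) for c in count_a)
    deficit = sum(max(count_b[c] - count_a[c], 0) for c in count_b)
    return max(surplus, deficit)


def filtered_scores(query, database, max_distance):
    """Run the costly alignment only on sequences that pass the cheap filter."""
    return {
        name: smith_waterman_score(query, seq)
        for name, seq in database.items()
        if frequency_distance(query, seq) <= max_distance
    }
```

For example, `filtered_scores("ACGT", {"near": "ACGA", "far": "TTTTTTTT"}, max_distance=2)` aligns only the `near` entry, since the frequency distance to `far` is 7. The filter costs time linear in the sequence length, against the quadratic cost of the alignment it avoids.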