OpenMP is a widely used parallel programming model on traditional multi-core processors. Generally, OpenMP is used to exploit fine-grained parallelism through a multithreaded model. The stream programming model is a new kind of parallel programming model for stream architectures. OpenMP bears a resemblance to the stream programming model at some level. The transformation between the two models has attracted much attention from the research community, since it is the foundation of porting programs between the two architectures. Most related research focuses on the efficiency of porting existing parallel programs to new architectures such as GPUs. Very few of these studies, however, address the portability problem systematically, namely, what kinds of parallel programs can or should be transformed into stream programs and mapped onto stream processors. In this paper, we study how the parallel mechanisms in OpenMP map to the stream programming model, and point out those parallel mechanisms in OpenMP that are infeasible or undesirable for stream programs. By analyzing two typical benchmarks, we conclude that a majority of scientific applications are suitable for mapping to the stream programming model. Our conclusion validates the idea of accelerating scientific applications with stream processors.
ISBN: 9783642120251 (print)
Gradual patterns highlight complex order correlations of the form "the more/less X, the more/less Y". Only recently have algorithms appeared that mine gradual rules efficiently. However, due to the complexity of mining gradual rules, these algorithms cannot yet scale to huge real-world datasets. In this paper, we propose to exploit parallelism in order to enhance the performance of the fastest existing one (GRITE). Through a detailed experimental study, we show that our parallel algorithm scales very well with the number of cores available.
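As an illustration of what a gradual pattern measures, the sketch below computes a support value for "the more X, the more Y" as the length of the longest chain of records in which both attributes strictly increase, divided by the number of records. This is one common definition and is assumed here; the abstract does not state the definition GRITE uses, and the function name `gradual_support` and the sample data are hypothetical. It is a sequential toy, not the parallel algorithm of the paper.

```python
def gradual_support(rows):
    """Support of 'the more X, the more Y' over (x, y) records.

    Assumed definition: length of the longest chain of records in which
    both x and y strictly increase, divided by the number of records.
    """
    if not rows:
        return 0.0
    ordered = sorted(rows)            # sort by x, then by y
    longest = [1] * len(ordered)      # longest[i]: best chain ending at record i
    for i, (xi, yi) in enumerate(ordered):
        for j in range(i):
            xj, yj = ordered[j]
            if xj < xi and yj < yi:
                longest[i] = max(longest[i], longest[j] + 1)
    return max(longest) / len(ordered)


# Hypothetical records: (age, salary)
records = [(1, 10), (2, 20), (3, 15), (4, 30)]
print(gradual_support(records))
```

The quadratic inner loop is the part that grows expensive on large datasets, which gives a sense of why a multi-core implementation is attractive.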
Large-scale scientific simulations routinely produce data of increasing resolution. Analyzing this data is key to scientific discovery. A critical bottleneck facing data analysis is the I/O time to access the data due...
In this paper we develop a parallel numerical algorithm for modelling the electromagnetic properties of thin conductive layers. The explicit finite difference scheme is obtained after approximation of the system of d...
We investigate the impact of Irrecoverable Read Errors (IREs) on Mean Time To Data Loss (MTTDL) of declustered-parity RAID 6 systems. By extending the analytic model to study the reliability of RAID 5 systems from [1]...
Pattern matching over event streams is well developed. However, with the increasing demand for measurement accuracy, confidence of more complex events sourced from original, continuously arriving events generated from ...
ISBN: 9780769541341 (print)
The proceedings contain 57 papers. The topics discussed include: label-based DV-Hop localization against wormhole attacks in wireless sensor networks; a simple group key management approach for mobile ad hoc networks; a fine-grained data reconstruction algorithm for solid-state disks; a distributed approach for hidden wormhole detection with neighborhood information; a high effective indexing and retrieval method providing block-level timely recovery to any point-in-time; characterizing the dependability of distributed storage systems using a two-layer hidden Markov model-based approach; fault tolerant data collection in heterogeneous intelligent monitoring networks; a probabilistic routing protocol for heterogeneous sensor networks; time-bounded essential localization for wireless sensor networks; stabilizing path modification of power-aware on/off interconnection networks; and fast and memory-efficient traffic classification with deep packet inspection in CMP architecture.
The wavelet transform is an important mathematical tool with strong applications in signal processing. From the mathematical point of view, many different wavelet transforms have been developed, of which the orthogonal wavelet transforms are the most important. Various algorithms for computing them have also been developed. In this paper we present a parallel implementation of the orthogonal wavelet transform known as the Daubechies D4 transform, which uses four-tap filters. First, a short mathematical introduction to the Daubechies D4 transform is presented. After that, the sequential and parallel algorithm implementations are analyzed. A parallel implementation for a PC cluster environment is developed. We have analyzed two approaches to parallel implementation, one with row mapping and the other with block mapping (mesh architecture of the processing elements). For the practical implementation, the C++ programming language and MPI (for the parallel environment) are used. Finally, the mathematical complexity of these implementations is presented. In particular, we have analyzed the complexity of computation and of communication separately. It is also shown that the complexity is between O(log_2 n) and O(n^2/p). At the end, a comparison of the sequential and parallel implementations is presented, based on results obtained by testing the practical implementations.
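For readers unfamiliar with the D4 transform, the following is a minimal sequential sketch of one decomposition level using the standard four-tap Daubechies coefficients. Periodic wrap-around at the signal boundary is an assumption made here, and the function name `d4_step` is hypothetical; the abstract describes a C++/MPI implementation, which this Python sketch does not attempt to reproduce.

```python
import math

SQRT3 = math.sqrt(3.0)
DENOM = 4.0 * math.sqrt(2.0)

# Standard Daubechies D4 scaling (low-pass) filter taps.
H = [(1 + SQRT3) / DENOM, (3 + SQRT3) / DENOM,
     (3 - SQRT3) / DENOM, (1 - SQRT3) / DENOM]
# Matching wavelet (high-pass) filter taps.
G = [H[3], -H[2], H[1], -H[0]]


def d4_step(signal):
    """One level of the D4 transform with periodic boundary handling.

    Returns (smooth, detail), each half the length of the input.
    """
    n = len(signal)
    if n < 4 or n % 2 != 0:
        raise ValueError("signal length must be even and at least 4")
    half = n // 2
    smooth = [0.0] * half
    detail = [0.0] * half
    for i in range(half):
        for k in range(4):
            x = signal[(2 * i + k) % n]   # wrap around at the end
            smooth[i] += H[k] * x
            detail[i] += G[k] * x
    return smooth, detail


smooth, detail = d4_step([1.0, 4.0, -2.0, 3.5, 0.0, 7.0, 2.0, -1.0])
print(smooth, detail)
```

Each output pair depends on only four neighbouring input samples, so in a row-mapped parallel version of a 2D transform each process could apply `d4_step` to its own rows independently; that reading of "row mapping" is an interpretation of the abstract, not a detail it spells out.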
Multi-Processor Systems on Chip (MPSoCs) have been proposed as a promising solution to the increasing demand for computational power required by recent applications. Parallelization through SIMD (single instruction/multiple data) architectures is a proven way to speed up the processing of recent applications that exhibit massive amounts of data parallelism. The level of parallelism affects the performance of a SIMD architecture and is closely related to the design of the processing element (PE). In this context, this paper presents a new methodology for designing processing elements for SIMD architectures. The aim of this work is to reduce the number of pipeline stages of the soft-core processor in order to reduce the size of the PEs and thereby build a highly parallel architecture.
ISBN: 9783642144028 (print)
Numerical homogenization is used for up-scaling of a linear elasticity tensor of strongly heterogeneous micro-structures. The approach assumes the presence of a periodic micro-structure and thus periodic boundary conditions. Rotated trilinear Rannacher-Turek finite elements are used for the discretization, while a parallel PCG method is used to solve the arising large-scale systems with sparse, symmetric, positive semidefinite matrices. The applied preconditioner is based on the modified incomplete Cholesky factorization MIC(0). The test problem represents trabecular bone tissue and takes into account only the elastic response of the solid phase. The voxel micro-structure of the bone is extracted from a high-resolution computed tomography image. Numerical tests performed on parallel computers demonstrate the efficiency of the developed algorithm.
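The structure of a preconditioned conjugate gradient (PCG) iteration can be sketched as follows. This is a small dense, sequential illustration under stated assumptions: it uses a simple Jacobi (diagonal) preconditioner as a stand-in for MIC(0), assumes a symmetric positive definite matrix rather than the semidefinite case mentioned in the abstract, and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def pcg(A, b, apply_precond, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A by PCG.

    apply_precond(r) should return an approximation of A^{-1} r.
    """
    x = [0.0] * len(b)
    r = list(b)                       # residual b - A x for x = 0
    z = apply_precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def jacobi(A):
    """Jacobi preconditioner (assumed stand-in for MIC(0))."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]
print(pcg(A, b, jacobi(A)))
```

Only the `apply_precond` callback would change when a stronger preconditioner such as MIC(0) is substituted; the matrix-vector product and the dot products are the operations that would need to be distributed in a parallel setting.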