ISBN: 9783642143892 (print)
OpenMP and Intel Threading Building Blocks (TBB) are two parallel programming paradigms for multicore processors. They have a lot in common but were designed with different parallel execution models in mind. The relative performance gain of these two paradigms depends to a great extent on the parallelization overheads of their parallel mechanisms. Parallel overheads are inevitable, and therefore understanding their potential costs can help developers design more scalable applications. This paper presents a comparative study of OpenMP and TBB parallelization overheads. The study was conducted on a dual-core machine with two different compilers, the Intel compiler and Microsoft Visual Studio C++ 2008, and shows that the Intel compiler outperforms the Microsoft compiler. Nevertheless, the relative performance of TBB versus OpenMP mainly depends on how a specific compiler implements the parallel constructs.
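One way to make "parallelization overhead" concrete is to time a parallel region that does no useful work, so that everything measured is the cost of forking and joining. The sketch below does this in Python, with a reused `concurrent.futures.ThreadPoolExecutor` standing in for the thread team behind an OpenMP `parallel` region or a TBB `parallel_for`. It is an illustrative assumption, not the benchmark used in the paper, and `fork_join_overhead` is a hypothetical helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def empty_task(_):
    # Does no work, so any measured time is pure scheduling/synchronization overhead.
    return None


def fork_join_overhead(pool, workers=2, repetitions=1000):
    """Average wall-clock seconds per fork-join of `workers` empty tasks."""
    start = time.perf_counter()
    for _ in range(repetitions):
        # "Fork": hand one empty task to each worker; "join": wait for all results.
        list(pool.map(empty_task, range(workers)))
    elapsed = time.perf_counter() - start
    return elapsed / repetitions


if __name__ == "__main__":
    # The pool is created once and reused, as OpenMP and TBB runtimes reuse their threads.
    with ThreadPoolExecutor(max_workers=2) as pool:
        per_region = fork_join_overhead(pool, workers=2, repetitions=1000)
    print(f"average fork-join overhead: {per_region * 1e6:.1f} us")
```

Absolute numbers from Python are far larger than native OpenMP or TBB overheads; only the measurement pattern (many empty regions, averaged) carries over to a C/C++ benchmark.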
ISBN: 9783642143892 (print)
We propose a parallel implementation of the multidimensional scaling (MDS) method, which can be used to visualize large datasets of multidimensional data. Unlike traditional approaches, which employ classical minimization methods to find the global optimum of the "stress function", we use a heuristic based on particle dynamics. This method makes it possible to avoid local minima and converges to the global one. However, due to its O(N^2) complexity, applying this method to data mining problems involving large datasets requires efficient parallel codes. We show that, by employing both an optimized Taylor's algorithm and a hybrid model of parallel computation, our solver is efficient enough to visualize multidimensional datasets consisting of 10^4 feature vectors within minutes.
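The stress function and the particle-dynamics idea can be illustrated with a small sequential sketch. It assumes raw stress E = Σ_{i<j} (d_ij - D_ij)^2, where D_ij is the distance between feature vectors i and j in the original space and d_ij the distance between their images in the low-dimensional embedding. Each point is moved as a damped particle under pairwise spring-like forces equal to the negative gradient of E/2. The force model, step size `dt`, `damping` and the helper names are assumptions, not the authors' heuristic; the doubly nested force loop is the O(N^2) part a parallel solver would distribute.

```python
import math
import random


def stress(X, D):
    """Raw stress: sum over pairs i < j of (d_ij - D_ij)^2, with d_ij measured in X."""
    n = len(X)
    return sum(
        (math.dist(X[i], X[j]) - D[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def particle_step(X, V, D, dt=0.05, damping=0.9):
    """Advance positions X and velocities V (both updated in place) by one damped step."""
    n, dim = len(X), len(X[0])
    forces = [[0.0] * dim for _ in range(n)]
    # O(N^2) force computation: the part a parallel solver would split across processors.
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(X[i], X[j]) or 1e-12  # guard against coincident points
            # k > 0 pushes i away from j (too close); k < 0 pulls it closer (too far).
            k = (D[i][j] - d) / d
            for a in range(dim):
                forces[i][a] += k * (X[i][a] - X[j][a])
    for i in range(n):
        for a in range(dim):
            V[i][a] = damping * V[i][a] + dt * forces[i][a]
            X[i][a] += dt * V[i][a]


def embed(points, dim=2, steps=500, seed=0):
    """Embed `points` into `dim` dimensions, starting from a seeded random layout."""
    n = len(points)
    D = [[math.dist(p, q) for q in points] for p in points]
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(dim)] for _ in range(n)]
    V = [[0.0] * dim for _ in range(n)]
    for _ in range(steps):
        particle_step(X, V, D)
    return X, D
```

Calling `embed(points, steps=0)` returns the initial random layout, which is convenient for comparing `stress` before and after the dynamics run.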
ISBN: 9783642143892 (print)
State-of-the-art graphics processors provide high processing power, and the high programmability of GPUs offered by frameworks like CUDA increases their usability as high-performance co-processors for general-purpose computing. Sorting is well investigated in computer science in general, but this new field of application for GPUs creates a demand for high-performance parallel sorting algorithms that fit the characteristics of modern GPU architectures. We present a high-performance in-place implementation of Batcher's bitonic sorting networks for CUDA-enabled GPUs. We adapted bitonic sort for arbitrary input lengths and assigned compare/exchange operations to threads in a way that reduces slow global-memory accesses and thereby greatly increases the performance of the implementation.
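The abstract does not say how the network was generalized to arbitrary input lengths. One known way to do this, shown here as a sequential Python sketch rather than the authors' CUDA kernel, splits each merge at the greatest power of two below n; this behaves as if the input were padded with sentinel values without ever allocating them. All function names are illustrative.

```python
def greatest_power_of_two_less_than(n):
    """Largest power of two strictly smaller than n (for n >= 2)."""
    k = 1
    while k < n:
        k <<= 1
    return k >> 1


def compare_exchange(a, i, j, ascending):
    # One comparator of the network: put a[i], a[j] in the requested order.
    if (a[i] > a[j]) == ascending:
        a[i], a[j] = a[j], a[i]


def bitonic_merge(a, lo, n, ascending):
    if n > 1:
        m = greatest_power_of_two_less_than(n)
        # These n - m compare/exchanges are independent: on a GPU, one per thread.
        for i in range(lo, lo + n - m):
            compare_exchange(a, i, i + m, ascending)
        bitonic_merge(a, lo, m, ascending)
        bitonic_merge(a, lo + m, n - m, ascending)


def _bitonic_sort(a, lo, n, ascending):
    if n > 1:
        m = n // 2
        # Sort the halves in opposite directions to form a bitonic sequence, then merge.
        _bitonic_sort(a, lo, m, not ascending)
        _bitonic_sort(a, lo + m, n - m, ascending)
        bitonic_merge(a, lo, n, ascending)


def bitonic_sort(a, ascending=True):
    """Sort list `a` in place with a bitonic network of any length; returns `a`."""
    _bitonic_sort(a, 0, len(a), ascending)
    return a
```

Because every compare/exchange within one merge stage touches a distinct pair of indices, the stage maps naturally onto GPU threads; the memory-access optimizations described in the abstract concern how those stages are laid out, which this sketch does not model.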
ISBN: 9783642143892 (print)
In recent years, the evolution of software architectures has led to the rising prominence of the Service Oriented Architecture (SOA) concept. This architectural paradigm facilitates building flexible service systems. The services can be deployed in distributed environments, executed on different hardware and software platforms, reused, and composed into complex services. In this paper, an SOA request analysis and distribution architecture for hybrid environments is proposed. Requests are examined according to the SOA request description model. The functional and non-functional requirements of a request, together with monitored performance data for service execution and communication links, are used to distribute requests and allocate services and resources.
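The distribution step can be pictured as filtering service instances by the request's functional requirement and its non-functional requirement, then ranking the remaining instances by monitored performance. The sketch below assumes a single non-functional requirement (a response-time bound) and an expected cost equal to monitored execution time plus link latency. The data model and the names `ServiceInstance`, `Request` and `distribute` are hypothetical and are not the paper's request description model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceInstance:
    name: str
    function: str           # functional capability offered (hypothetical label)
    host: str
    exec_time_ms: float     # monitored average execution time
    link_latency_ms: float  # monitored latency of the communication link to `host`


@dataclass
class Request:
    function: str            # functional requirement
    max_response_ms: float   # non-functional requirement (assumed: response-time bound)


def expected_response_ms(s: ServiceInstance) -> float:
    # Assumed cost model: execution time plus communication latency.
    return s.exec_time_ms + s.link_latency_ms


def distribute(request: Request, instances: List[ServiceInstance]) -> Optional[ServiceInstance]:
    """Choose the instance meeting both requirements with the lowest expected response time."""
    candidates = [
        s for s in instances
        if s.function == request.function
        and expected_response_ms(s) <= request.max_response_ms
    ]
    if not candidates:
        return None  # no instance currently satisfies the request
    return min(candidates, key=expected_response_ms)
```

Because the monitoring data change over time, the same request may be routed to different instances on different calls; a real system would also reserve resources once an instance is chosen.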
ISBN: 9783642143892 (print)
In this paper we take a look at what the new Quad-Core Intel Xeon processor, code-named Nehalem, brings to high-performance computing. We compare a system based on the Intel Xeon 5400 series with a server using its successor, the new Intel Xeon X5560. We compare both CPU generations on a dual-socket platform using a number of HPC benchmarks. The results clearly show that the new Intel Xeon 5500 processor family provides a significant performance advantage on typical HPC workloads and proves to be the right choice for many HPC installations.
ISBN: 9783642144028 (print)
The availability of real parallelism in multi-core architectures has revived interest in concurrent computing in general, and parallel computing in particular. New languages and libraries have recently been proposed to increase productivity in the context of these architectures. In this paper we present a novel approach that uses the service abstraction to annotate parallelism.
ISBN: 9783642143892 (print)
Real-time syntactic pattern recognition imposes strict computing-time constraints on newly developed techniques. Recently, a method for analyzing hand postures of the Polish Sign Language based on ETPL(k) graph grammars (Flasinski: Patt. Recogn. 26 (1993), 1-16; Theor. Comp. Sci. 201 (1998), 189-231) has been constructed. To make the implemented system more practical for its users, we investigated the parallelization of the pattern recognition process. Several task-distribution techniques were tested, which allowed us to define an optimal parallelization strategy. The results are presented in the paper.
ISBN: 9783642143892 (print)
In this paper we propose a strategy of partial data replication for efficient parallel computation of the Discrete Wavelet Transform (DWT) in a distributed-memory environment. The key is to avoid the communication needed between the computation of different wavelet levels by replicating part of the data and part of the computations, which eliminates communication entirely (except possibly during the setup phase). A similar idea was proposed by Chaver et al.; however, they proposed replicating the data completely, which can require too much memory on each processor. In this work we determine exactly how many data items each processor needs in order to compute the DWT without extra communication, using only the memory that is strictly necessary.
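How much data must be replicated can be worked out level by level. The sketch below rests on assumptions that are ours, not necessarily the paper's: a 1D transform, analysis filters of length F applied as y[k] = Σ_t h[t]·x[2k+t], each processor owning a contiguous block aligned to 2^L, and only the trailing edge of the block extended. Under these assumptions, computing one extra approximation coefficient at level j needs two extra samples at level j-1 (because of downsampling), and the filter overlap at the block edge adds F-2 more. The extra samples therefore obey h_{j-1} = 2·h_j + F - 2 with h_L = 0, which gives h_0 = (2^L - 1)(F - 2). The helper names are hypothetical.

```python
def extra_samples(filter_length, levels):
    """Per-level halo h[j]: samples a processor must hold beyond its own block
    at level j to compute `levels` DWT levels without communication."""
    h = [0] * (levels + 1)  # h[levels] = 0: no extra output coefficients are needed
    for j in range(levels, 0, -1):
        # Each extra level-j coefficient needs 2 more level-(j-1) samples (downsampling),
        # plus F - 2 samples of filter overlap at the block edge.
        h[j - 1] = 2 * h[j] + filter_length - 2
    return h


def local_memory(block_size, filter_length, levels):
    """Input samples stored per processor: its own block plus the replicated halo."""
    return block_size + extra_samples(filter_length, levels)[0]


if __name__ == "__main__":
    # Daubechies-4 (F = 4) over 3 levels: (2^3 - 1) * (4 - 2) = 14 extra input samples.
    print(extra_samples(4, 3))       # [14, 6, 2, 0]
    print(local_memory(1024, 4, 3))  # 1038
```

With Haar filters (F = 2) the halo is zero, matching the intuition that aligned Haar blocks are independent. For longer filters the halo grows roughly as 2^L, which for typical block sizes is still much smaller than replicating the whole signal on every processor.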
ISBN: 1424411351 (print)
In vector predictive coding of digital signal series, the vector signal series obtained by grouping adjacent samples of the source signal series can be approximated by a vector autoregressive series with stable covariance. Applying the orthogonal projection principle in Hilbert space, this paper attempts to formulate a vector predictive coding strategy that is highly suited to parallel processing, and to derive from this strategy an adaptive parallel-processing algorithm which, compared with traditional lattice algorithms, offers remarkable improvements in computational complexity and storage space.
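As a minimal illustration of the orthogonal projection principle (not the paper's adaptive parallel algorithm, and not a lattice algorithm), the sketch groups adjacent samples into d-dimensional vectors and fits a first-order vector linear predictor x_t ≈ A·x_{t-1} by least squares. The optimal A makes the prediction error e_t = x_t - A·x_{t-1} orthogonal to x_{t-1}, i.e. Σ_t e_t·x_{t-1}^T = 0, which yields the normal equations A·R00 = R10. The first-order model and the names `A`, `R00`, `R10` and the helper functions are assumptions.

```python
def group_into_vectors(samples, d):
    """Group adjacent scalar samples into non-overlapping d-dimensional vectors."""
    return [list(samples[i:i + d]) for i in range(0, len(samples) - d + 1, d)]


def solve_linear(M, B):
    """Solve M Y = B (M is d x d, B is d x r) by Gauss-Jordan elimination with partial pivoting."""
    d = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(d)]  # augmented matrix [M | B]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def fit_vector_predictor(vectors):
    """Least-squares A with x_t ~ A x_{t-1}; its error is orthogonal to x_{t-1}."""
    d = len(vectors[0])
    R00 = [[0.0] * d for _ in range(d)]  # sum of x_{t-1} x_{t-1}^T
    R10 = [[0.0] * d for _ in range(d)]  # sum of x_t x_{t-1}^T
    for prev, cur in zip(vectors, vectors[1:]):
        for i in range(d):
            for j in range(d):
                R00[i][j] += prev[i] * prev[j]
                R10[i][j] += cur[i] * prev[j]
    # R00 is symmetric, so A R00 = R10 is equivalent to R00 A^T = R10^T.
    R10_T = [list(col) for col in zip(*R10)]
    A_T = solve_linear(R00, R10_T)
    return [list(col) for col in zip(*A_T)]


def predict(A, prev):
    """Predicted next vector A @ prev."""
    return [sum(A[i][j] * prev[j] for j in range(len(prev))) for i in range(len(A))]
```

For example, after `A = fit_vector_predictor(group_into_vectors(signal, 2))`, `predict(A, last_vector)` gives the predicted next vector; in predictive coding, only the (smaller) residual between the actual and predicted vectors would be encoded.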
ISBN: 9783642143892 (print)
Available GPUs provide ever more processing power, especially for multimedia and digital signal processing. Despite the tremendous progress in hardware, and thus in processing power, there are and always will be applications whose computationally intensive processing algorithms require multiple GPUs, either inside the same machine or distributed across the network. Existing solutions for developing GPU applications still require a lot of hand-optimization when using multiple GPUs inside the same machine, and they generally provide no support for using remote GPUs distributed in the network. In this paper we address this problem and show that an open distributed multimedia middleware, such as the Network-Integrated Multimedia Middleware (NMM), is able (1) to seamlessly integrate processing components using GPUs while completely hiding GPU-specific issues from the application developer, (2) to transparently combine processing components using GPUs or CPUs, and (3) to transparently use local and remote GPUs for distributed processing.