ISBN: 9783642143892 (print)
OpenMP and Intel Threading Building Blocks (TBB) are two parallel programming paradigms for multicore processors. They have a lot in common but were designed with different parallel execution models in mind. A comparison of the performance gains of these two paradigms depends to a great extent on the parallelization overheads of their parallel mechanisms. Parallel overheads are inevitable, so understanding their potential costs can help developers design more scalable applications. This paper presents a comparative study of OpenMP and TBB parallelization overheads. The study was conducted on a dual-core machine with two different compilers, the Intel compiler and Microsoft Visual Studio C++ 2008, and shows that the Intel compiler outperforms the Microsoft compiler. Nevertheless, the relative performance of TBB versus OpenMP depends mainly on how a specific compiler implements the parallel constructs.
ISBN: 9783642143892 (print)
In recent years, the evolution of software architectures has led to the rising prominence of the Service Oriented Architecture (SOA) concept. This architectural paradigm facilitates building flexible service systems. The services can be deployed in distributed environments, executed on different hardware and software platforms, reused, and composed into complex services. This paper proposes a SOA request analysis and distribution architecture for hybrid environments. Requests are examined in accordance with the SOA request description model. The functional and non-functional requirements of a request, together with monitoring data on execution and communication-link performance, are used to distribute requests and to allocate services and resources.
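A minimal Python sketch of the kind of distribution step described above follows. A request is first matched functionally, then filtered by a non-functional deadline using monitored execution and link data, and finally allocated to the fastest feasible service instance. All names (`ServiceInstance`, `Request`, `dispatch`, `example_pool`) and the cost model (monitored execution time plus a request/response round trip) are hypothetical illustrations. They are not the paper's SOA request description model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceInstance:
    # Hypothetical record combining a service's functional capability
    # with monitoring data about its execution and communication link.
    name: str
    capability: str          # functional property offered by the service
    avg_exec_ms: float       # monitored mean execution time
    link_latency_ms: float   # monitored one-way latency of the link to the host


@dataclass
class Request:
    capability: str          # functional requirement
    max_response_ms: float   # non-functional requirement (deadline)


def estimated_response_ms(s: ServiceInstance) -> float:
    # Assumed cost model: execution time plus a request/response round trip.
    return s.avg_exec_ms + 2 * s.link_latency_ms


def dispatch(req: Request, instances: List[ServiceInstance]) -> Optional[ServiceInstance]:
    # 1. Functional filter: keep instances offering the requested capability.
    candidates = [s for s in instances if s.capability == req.capability]
    # 2. Non-functional filter: keep instances expected to meet the deadline.
    feasible = [s for s in candidates if estimated_response_ms(s) <= req.max_response_ms]
    # 3. Allocate the fastest feasible instance, or None if nothing qualifies.
    return min(feasible, key=estimated_response_ms, default=None)


# Example data with made-up monitoring values, for illustration only.
example_pool = [
    ServiceInstance("ocr-local", "ocr", avg_exec_ms=40.0, link_latency_ms=1.0),
    ServiceInstance("ocr-remote", "ocr", avg_exec_ms=15.0, link_latency_ms=20.0),
    ServiceInstance("resize-gpu", "resize", avg_exec_ms=5.0, link_latency_ms=1.0),
]
```

With these values, an `ocr` request with a 100 ms deadline goes to `ocr-local`. Its estimate is 42 ms, against 55 ms for `ocr-remote`. The slow link outweighs the remote host's faster execution.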
ISBN: 9783642143892 (print)
The Union-Find algorithm is used for maintaining a number of non-overlapping sets from a finite universe of elements. The algorithm has applications in a number of areas, including the computation of spanning trees, sparse linear algebra, and image processing. Although the algorithm is inherently sequential, there have been some previous efforts at constructing parallel implementations. These have mainly focused on shared-memory computers. In this paper we present the first scalable parallel implementation of the Union-Find algorithm suitable for distributed-memory computers. Our new parallel algorithm is based on an observation of how the Find part of the sequential algorithm can be executed more efficiently. We show the efficiency of our implementation through a series of tests that compute spanning forests of very large graphs.
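For reference, the sequential algorithm can be sketched in Python as a standard Union-Find with path compression and union by rank. Here it extracts a spanning forest from an edge list, which is the test problem mentioned above. This is only the textbook sequential baseline. It is not the authors' distributed-memory algorithm or their improved Find.

```python
from typing import List, Tuple


class UnionFind:
    """Disjoint sets over the elements 0..n-1."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on the height of each tree

    def find(self, x: int) -> int:
        # First pass: walk up to the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, pointing every visited node at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: int, b: int) -> bool:
        """Merge the sets of a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by rank: attach the shallower tree below the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def spanning_forest(n: int, edges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    # An edge belongs to the forest iff it joins two previously separate trees.
    uf = UnionFind(n)
    return [(u, v) for u, v in edges if uf.union(u, v)]
```

A spanning forest of a graph with n vertices and c connected components has n - c edges. For example, `spanning_forest(5, [(0, 1), (1, 2), (2, 0), (3, 4)])` keeps 3 edges and drops the cycle-closing edge `(2, 0)`.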
ISBN: 9783642143892 (print)
We present the concept of a parallel implementation of a novel 3-D model of tumor growth. The model is based on the dynamics of particles, which are the building blocks of normal, cancerous and vascular tissues. The dynamics of the system is also driven by processes at microscopic scales (e.g. the cell life cycle), by diffusive substances (nutrients and TAF, tumor angiogenic factors), and by blood flow. We show that the cell life cycle (particle production and annihilation), the existence of elongated particles, and the influence of continuum fields and of blood flow in capillaries make the model much harder to parallelize than standard MD (molecular dynamics) codes. We present preliminary timings of our parallel implementation and discuss the perspectives of our approach.
ISBN: 9783642143892 (print)
In this paper we take a look at what the new quad-core Intel Xeon processor, code-named Nehalem, brings to high performance computing. We compare an Intel Xeon 5400 series-based system with a server using its successor, the new Intel Xeon X5560. We compare both CPU generations on dual-socket platforms using a number of HPC benchmarks. The results clearly show that the new Intel Xeon processor 5500 family provides a significant performance advantage on typical HPC workloads and proves to be the right choice for many HPC installations.
ISBN: 9783642143892 (print)
Real-time syntactic pattern recognition imposes strict computing-time constraints on newly developed techniques. Recently, a method for analyzing hand postures of the Polish Sign Language based on ETPL(k) graph grammars (Flasinski: Patt. Recogn. 26 (1993), 1-16; Theor. Comp. Sci. 201 (1998), 189-231) has been constructed. To make the implemented system more practical for users, research into parallelizing the pattern recognition process has been carried out. Possible task distribution techniques have been tested, which has allowed us to define an optimal parallelization strategy. The results are presented in the paper.
ISBN: 9783642144028 (print)
The availability of real parallelism in multi-core architectures has resurrected interest in concurrent computing in general, and parallel computing in particular. New languages and libraries have recently been proposed to increase productivity in the context of these architectures. In this paper we present a novel approach that resorts to the service abstraction for annotating parallelism.
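The following Python sketch is only a rough illustration of what annotating parallelism through a service abstraction might look like. A decorator marks a function as a service. Each call is submitted to a thread pool and immediately returns a future, so the sequential function body stays unchanged. The `service` decorator, the pool size, and the `parallel_sum` example are hypothetical and are not the approach or API proposed in the paper.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor

# Shared pool that executes service invocations (the size is an arbitrary choice).
_pool = ThreadPoolExecutor(max_workers=4)


def service(func):
    """Hypothetical annotation: each call to func runs asynchronously as a service."""
    @functools.wraps(func)
    def invoke(*args, **kwargs) -> Future:
        return _pool.submit(func, *args, **kwargs)
    return invoke


@service
def partial_sum(data, lo, hi):
    # Ordinary sequential code; only the annotation makes calls concurrent.
    return sum(data[lo:hi])


def parallel_sum(data, chunks=4):
    step = max(1, -(-len(data) // chunks))  # ceiling division, at least 1
    futures = [partial_sum(data, i, min(i + step, len(data)))
               for i in range(0, len(data), step)]
    # Joining the futures is the only synchronization the caller writes.
    return sum(f.result() for f in futures)
```

Because of CPython's global interpreter lock, threads do not speed up CPU-bound Python code such as this. The sketch only shows the programming model, in which sequential code is annotated rather than restructured.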
ISBN: 9783642143892 (print)
We propose a parallel implementation of the multidimensional scaling (MDS) method, which can be used to visualize large datasets of multidimensional data. Unlike traditional approaches, which employ classical minimization methods to find the global optimum of the "stress function", we use a heuristic based on particle dynamics. This method avoids local minima and converges to the global one. However, due to its O(N^2) complexity, applying this method to data mining problems involving large datasets requires efficient parallel codes. We show that by employing both an optimized Taylor's algorithm and a hybrid model of parallel computation, our solver is efficient enough to visualize multidimensional datasets consisting of 10^4 feature vectors within minutes.
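The stress function and a particle-dynamics minimizer can be sketched in Python as follows. Here the stress is taken to be the raw stress, the sum over pairs of (D_ij - d_ij)^2. D_ij is the distance in the original feature space and d_ij the distance in the low-dimensional embedding. Each pair of particles is joined by a spring that pulls d_ij toward D_ij, and the particles move under damped dynamics. The parameters (`dt`, `damping`, `steps`, `seed`) are arbitrary assumptions. This plain damped scheme is not the authors' heuristic and offers no guarantee of escaping local minima. The double loop over pairs is the O(N^2) part that a parallel code would have to distribute.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def distance_matrix(points: Sequence[Sequence[float]]) -> Matrix:
    # D[i][j]: distance between feature vectors i and j in the original space.
    return [[math.dist(p, q) for q in points] for p in points]


def stress(D: Matrix, X: Matrix) -> float:
    # Raw stress: squared mismatch between original and embedded distances.
    n = len(X)
    return sum((D[i][j] - math.dist(X[i], X[j])) ** 2
               for i in range(n) for j in range(i + 1, n))


def mds_particles(D: Matrix, dim: int = 2, steps: int = 300,
                  dt: float = 0.05, damping: float = 0.8, seed: int = 0) -> Matrix:
    rng = random.Random(seed)
    n = len(D)
    # Random initial positions of the particles in the embedding space.
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    V = [[0.0] * dim for _ in range(n)]
    for _ in range(steps):
        F = [[0.0] * dim for _ in range(n)]
        # O(N^2) pair loop: spring force along the line joining particles i and j.
        for i in range(n):
            for j in range(i + 1, n):
                d = max(math.dist(X[i], X[j]), 1e-9)  # guard against coincident particles
                f = (D[i][j] - d) / d  # > 0 pushes apart, < 0 pulls together
                for k in range(dim):
                    fk = f * (X[i][k] - X[j][k])
                    F[i][k] += fk
                    F[j][k] -= fk  # equal and opposite force on j
        # Damped explicit integration of velocities, then positions.
        for i in range(n):
            for k in range(dim):
                V[i][k] = damping * V[i][k] + dt * F[i][k]
                X[i][k] += dt * V[i][k]
    return X
```

Calling `mds_particles(D, steps=0)` returns the random initial layout, which makes it easy to compare the stress before and after the dynamics for the same seed.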
ISBN: 9783642143892 (print)
State-of-the-art graphics processors provide high processing power; furthermore, the high programmability of GPUs offered by frameworks like CUDA increases their usability as high-performance co-processors for general-purpose computing. Sorting is well investigated in computer science in general, but (because of this new field of application for GPUs) there is a demand for high-performance parallel sorting algorithms that fit the characteristics of modern GPU architectures. We present a high-performance in-place implementation of Batcher's bitonic sorting networks for CUDA-enabled GPUs. We adapted bitonic sort for arbitrary input lengths and assigned compare/exchange operations to threads in a way that reduces slow global-memory accesses and thereby greatly increases the performance of the implementation.
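The compare/exchange schedule of a bitonic sorting network can be sketched sequentially in Python. Within each (k, j) stage, every index i is independent of the others, which is what lets a CUDA implementation map indices to threads. Here arbitrary input lengths are handled by padding to the next power of two with +infinity. That padding is an assumption for illustration. It is not in-place and is not necessarily how the authors adapted the network, and the sketch does not model their thread assignment or global-memory optimizations.

```python
import math
from typing import List


def bitonic_sort(values: List[float]) -> List[float]:
    n = len(values)
    size = 1
    while size < n:
        size *= 2  # smallest power of two >= n
    # Padding with +inf keeps the real values at the front after sorting.
    a = list(values) + [math.inf] * (size - n)
    k = 2
    while k <= size:          # length of the bitonic sequences being merged
        j = k // 2
        while j > 0:          # compare/exchange distance in this merge step
            # On a GPU, the iterations of this loop would run as parallel threads.
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0  # sort direction of this block
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a[:n]  # drop the padding
```

For example, `bitonic_sort([5, 1, 4, 2, 3])` pads the input to 8 elements, runs the 6 compare/exchange stages of an 8-input network, and returns `[1, 2, 3, 4, 5]`.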
ISBN: 9783642143892 (print)
Available GPUs provide ever more processing power, especially for multimedia and digital signal processing. Despite tremendous progress in hardware, and thus in processing power, there are, and always will be, applications whose computationally intensive processing algorithms require multiple GPUs, either inside the same machine or distributed across the network. Existing solutions for developing GPU applications still require a lot of hand-optimization when using multiple GPUs inside the same machine and generally provide no support for using remote GPUs distributed across the network. In this paper we address this problem and show that an open distributed multimedia middleware, such as the Network-Integrated Multimedia Middleware (NMM), is able (1) to seamlessly integrate processing components that use GPUs while completely hiding GPU-specific issues from the application developer, (2) to transparently combine processing components that use GPUs or CPUs, and (3) to transparently use local and remote GPUs for distributed processing.