ISBN: 9783642038686 (print)
We evaluate a parallel Schur preconditioner for large systems of equations arising from a finite element discretization of the Navier-Stokes equations with streamline diffusion. The performance of the method is assessed on a biomedical problem involving oscillatory flow in a human abdominal bifurcation. Fast access to flow conditions in this location might support physicians in quicker decision making concerning potential interventions. We demonstrate scaling to 8 processors with more than 50% efficiency as well as a significant relaxation of memory requirements. We found an acceleration by up to a factor of 9.5 compared to a direct sparse parallel solver at stopping criteria ensuring results similar to a validated reference solution.
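As a reading aid for the scaling figure quoted in this abstract, parallel efficiency is conventionally defined as speedup divided by processor count. The sketch below applies that textbook definition; the timings are hypothetical placeholders chosen for illustration and are not measurements from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p for a fixed problem size."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Efficiency E = S / p, where p is the number of processors."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_serial, t_parallel) / processors


# Hypothetical timings in seconds (assumptions, not data from the paper).
t1 = 120.0  # one processor
t8 = 25.0   # eight processors

s = speedup(t1, t8)
e = parallel_efficiency(t1, t8, 8)
print(f"speedup = {s:.2f}, efficiency = {e:.0%}")  # speedup = 4.80, efficiency = 60%
```

Under this definition, "more than 50% efficiency" on 8 processors corresponds to a speedup above 4.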
ISBN: 9783642400476 (print)
For a long period in the development of computers and computing, efficient applications were characterized only by computational and memory complexity or, in more practical terms, by elapsed computing time and required main memory capacity. The history of Euro-Par and its predecessor organizations stands for research on the development of ever more powerful computer architectures that shorten the compute time both by faster clocking and by parallel execution, as well as the development of algorithms that can exploit these parallel architectural features. The success of enhancing architectures and algorithms is best described by exponential curves regarding the peak computing power of architectures and the efficiency of algorithms. As microprocessor parts get more and more power hungry and electricity gets more and more expensive, "energy to solution" is a new optimization criterion for large applications. This calls for energy-aware solutions.
ISBN: 9783642038686 (print)
Point matching is crucial for many computer vision applications. Establishing the correspondence between a large number of data points is a computationally intensive process. Some point matching related applications, such as medical image registration, require real-time or near real-time performance if applied to critical clinical applications like image-assisted surgery. In this paper, we report a new multicore platform based parallel algorithm for fast point matching in the context of landmark based medical image registration. We introduced a non-regular data partition algorithm which utilizes the K-means clustering algorithm to group the landmarks based on the number of available processing cores, which optimizes the memory usage and data transfer. We have tested our method using the IBM Cell Broadband Engine (Cell/B.E.) platform. The results demonstrated a significant speedup over its sequential implementation. The proposed data partition and parallelization algorithm, though tested only on one multicore platform, is generic by its design. Therefore, the parallel algorithm can be extended to other computing platforms, as well as other point matching related applications.
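The idea of grouping landmarks into as many clusters as there are processing cores can be illustrated with a plain Lloyd's k-means loop. This is a minimal sketch under stated assumptions, not the authors' non-regular partition algorithm: the function name `kmeans_partition`, the seeded random initialization, and the toy landmark coordinates are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_partition(points: Sequence[Point], n_cores: int,
                     iters: int = 20, seed: int = 0) -> List[List[Point]]:
    """Group points into n_cores clusters with plain Lloyd's k-means.

    Hypothetical helper: one cluster per core, so that each core works
    on spatially close landmarks.
    """
    if not 1 <= n_cores <= len(points):
        raise ValueError("n_cores must be between 1 and len(points)")
    rng = random.Random(seed)
    centroids = rng.sample(list(points), n_cores)  # assumed initialization
    groups: List[List[Point]] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        groups = [[] for _ in range(n_cores)]
        for p in points:
            nearest = min(range(n_cores), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group;
        # an empty group keeps its previous centroid.
        new_centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return groups


# Toy 2D landmarks forming two well-separated clusters (illustrative only).
landmarks = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
for core_id, group in enumerate(kmeans_partition(landmarks, n_cores=2)):
    print(core_id, group)
```

Each returned group could then be handed to one worker, for example through a process pool. K-means does not balance group sizes, so the load per core is not guaranteed to be equal.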
ISBN: 9783031506833; 9783031506840 (print)
The integration of urgent computing is essential in order to adhere to the stringent time and quality constraints of emerging distributed applications, hence facilitating efficient decision-making processes in numerous fields. Adaptation of such applications to produce outcomes within the desired confidence range and defined time interval can be of great benefit, especially in distributed and heterogeneous execution contexts. This study provides a justification for the necessity of dynamic adaptation in applications that are time-sensitive. Furthermore, we present our viewpoint on time-sensitive applications and undertake a thorough analysis of the underlying principles and challenges that need to be resolved in order to accomplish this goal. This research aims to provide a comparative analysis of our suggested vision for adaptation in contrast to the existing literature. We provide a comprehensive explanation of the architectural framework that we plan to construct, and conclude by discussing some ongoing challenges.
ISBN: 9783642038686 (print)
This paper presents the XJava compiler for parallel programs. It exploits parallelism based on an object-oriented stream programming paradigm. XJava extends Java with new parallel constructs that do not expose programmers to low-level details of parallel programming on shared memory machines. Tasks define composable parallel activities, and new operators allow an easier expression of parallel patterns, such as pipelines, divide and conquer, or master/worker. We also present an automatic run-time mechanism that extends our previous work to automatically map tasks and parallel statements to threads. We conducted several case studies with an open source desktop search application and a suite of benchmark programs. The results show that XJava reduces the opportunities to introduce synchronization errors. Compared to threaded Java, the amount of code could be reduced by up to 39%. The run-time mechanism helped reduce effort for performance tuning and achieved speedups up to 31.5 on an eight-core machine.
ISBN: 9781467351652; 9780769549149 (print)
The execution of parallel applications using grid computing requires an environment that enables them to be executed, managed, scheduled and monitored. The execution environment must provide a processing model, consisting of programming and execution models, with the objective of appropriately exploiting grid computing characteristics. This paper proposes a parallel processing model, based on shared variables for grid computing, consisting of an execution model that is appropriate for the grid and a Cpar parallel language programming model. The environment is designed to execute parallel applications in grid computing, where all the characteristics present in grid computing are transparent to users. The results show that this environment is an efficient solution for the execution of parallel applications.
The proceedings contain 124 papers from Euro-Par 2005 Parallel Processing: 11th International Euro-Par Conference. The topics discussed include: the evolution of the Blue Gene/L supercomputer; soft computing approach to performance analysis of parallel and distributed programs; models for on-the-fly compensation of measurement overhead in parallel performance profiling; performance evaluation of MM5 on clusters with modern interconnects: scalability and impact; an efficient multi-level trace toolkit for multi-threaded applications; knowledge based automatic scalability analysis and extrapolation for MPI programs; an approach to performance prediction for parallel applications.
ISBN: 9783642141218 (print)
Current high performance clusters are equipped with high bandwidth/low latency networks, lots of processors and nodes, very fast storage systems, etc. However, due to economic and/or power related constraints, it is in general not feasible to provide an accelerating co-processor, such as a graphics processor (GPU), per node. To overcome this, in this paper we present a GPU virtualization middleware, which makes remote CUDA-compatible GPUs available to all the cluster nodes. The software is implemented on top of the sockets application programming interface, ensuring portability over commodity networks, but it can also be easily adapted to high performance networks.
ISBN: 9780769550886 (print)
Synthesizing new images from a given image pair and their corresponding depth maps is an essential function for many 3D video applications. Exemplar-based inpainting methods have been proposed in recent years to restore newly synthesized images by strategically filling the missing pixels that have no references due to occlusion. Because of the prioritized filling process, these inpainting methods usually have high computational complexity and can hardly reach real-time performance. In this paper, a parallel depth-aided inpainting method is proposed to address the efficiency issue of this kind of high-performance algorithm. To reduce the computation, the proposed method searches for background pixels in a restricted search range on the reference images for effective context filling. Then a partially parallel strategy is proposed to speed up the inpainting process while maintaining its high restoration accuracy. Finally, the method is implemented with CUDA on an NVIDIA GTS450 graphics card. The experimental results showed that the proposed method could produce results on par with the best and is suitable for real-time multi-view image synthesis.
ISBN: 9780769550886 (print)
The efficiency of parallel computation has always been a fundamental research topic in high performance computing. In this paper, a study was carried out by combining high performance parallel computing with numerical simulations of a copper flash smelting furnace, to investigate the influence of various factors on the computing speed, such as the mesh type, the number of grid cells and the number of CPU cores. It aims to provide some useful advice for improving the computing efficiency of numerical simulations carried out on a parallel computing platform.