ISBN: 0818626720 (print)
The Symposium materials contain 118 papers on new developments in parallel processing. Algorithms, architectures, mapping/scheduling, applications, special-purpose architectures, interconnection networks, software, and distributed systems are among the main topics covered.
ISBN: 0769522297 (print)
Irregular and dynamic memory reference patterns can cause performance variations for low-level algorithms in general and for parallel algorithms in particular. We present an adaptive algorithm selection framework which can collect and interpret the inputs of a particular instance of a parallel algorithm and select the best-performing one from an existing library. In this paper we present the dynamic selection of parallel reduction algorithms. First we introduce a set of high-level parameters that can characterize different parallel reduction algorithms. Then we describe an off-line, systematic process to generate predictive models which can be used for run-time algorithm selection. Our experiments show that our framework: (a) selects the most appropriate algorithms in 85% of the cases studied, (b) overall delivers 98% of the optimal performance, (c) adaptively selects the best algorithms for dynamic phases of a running program (resulting in performance improvements otherwise not possible), and (d) adapts to the underlying machine architecture (tested on IBM Regatta and HP V-Class systems).
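A minimal sketch of run-time selection between reduction algorithms may help make the idea concrete. It assumes a reduction of the form `out[indices[k]] += values[k]` and replaces the paper's off-line generated predictive models with a single hand-written rule. The two algorithms, the `density` parameter, and the 0.25 threshold are hypothetical illustrations, not taken from the paper, and the "chunks" only simulate per-thread private storage sequentially.

```python
from collections import defaultdict


def reduce_replicated(size, indices, values, num_chunks=4):
    """Each chunk accumulates into a full-size private array; arrays are merged at the end."""
    privates = [[0.0] * size for _ in range(num_chunks)]
    for k, (i, v) in enumerate(zip(indices, values)):
        privates[k % num_chunks][i] += v
    return [sum(column) for column in zip(*privates)]


def reduce_sparse(size, indices, values, num_chunks=4):
    """Each chunk accumulates into a private dict, storing only the entries it touches."""
    privates = [defaultdict(float) for _ in range(num_chunks)]
    for k, (i, v) in enumerate(zip(indices, values)):
        privates[k % num_chunks][i] += v
    out = [0.0] * size
    for private in privates:
        for i, v in private.items():
            out[i] += v
    return out


def select_reduction(size, indices, density_threshold=0.25):
    """Hypothetical rule: dense reference patterns favor replicated arrays."""
    density = len(set(indices)) / size  # fraction of output entries referenced
    return reduce_replicated if density >= density_threshold else reduce_sparse


def adaptive_reduce(size, indices, values):
    """Inspect the input instance, pick an algorithm, and run it."""
    algorithm = select_reduction(size, indices)
    return algorithm(size, indices, values)
```

In a real framework the rule in `select_reduction` would be the output of an off-line modeling step, and the measured parameters would be gathered cheaply at run time before each reduction phase.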
This work studies how to adapt the number of threads of a parallel Interval Branch and Bound algorithm to the available computational resources based on its current performance. Basically, a thread can create a new thread that will process part of the ancestor's workload. In this way, load balancing is inherent to the creation of threads. The applications in which we are interested use branch-and-bound algorithms, which are highly irregular and therefore difficult to predict. The proposed methods can be used for more predictable algorithms as well. This research complements, and does not substitute, other devices that improve the exploitation of the system, such as dynamic scheduling policies or work-stealing. Several approaches are presented. They differ in the metrics used and in whether the operating system (OS) has to be modified. The scenario for this research is a single multithreaded application running on a multicore architecture. Experimental results show that the appropriate number of running threads can be determined at run-time, avoiding the need to statically establish the number of threads of an application. Thread creation decisions have to be made frequently to obtain better results, but are time-consuming. One of the presented models uses the existence of an idle processor to carry out these decisions, obtaining the desired results.
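The thread-creation scheme described above, in which a thread hands part of its workload to a newly created thread, can be sketched as follows. This is only an illustration under simplifying assumptions: a fixed thread budget (`max_threads`) stands in for the paper's performance-based metrics, squaring integers stands in for the branch-and-bound work, and the names `process`, `run`, and `min_split` are hypothetical.

```python
import queue
import threading


def process(workload, results, budget, min_split=4):
    """Process work items, handing half to a new thread while the budget allows."""
    children = []
    # Decision point: split only if the workload is large enough and a slot is free.
    while len(workload) >= min_split and budget.acquire(blocking=False):
        half = len(workload) // 2
        workload, handed_off = workload[:half], workload[half:]
        child = threading.Thread(
            target=process, args=(handed_off, results, budget, min_split)
        )
        child.start()
        children.append(child)
    results.put(sum(item * item for item in workload))  # stand-in for real work
    for child in children:
        child.join()


def run(items, max_threads):
    """Run with at most max_threads threads, counting the initial one."""
    results = queue.SimpleQueue()  # thread-safe collection of partial results
    budget = threading.Semaphore(max_threads - 1)
    process(list(items), results, budget)
    total = 0
    while not results.empty():
        total += results.get()
    return total
```

Because each new thread receives part of its creator's workload, the split itself balances the load. A run-time policy such as the ones the abstract describes would replace the fixed semaphore with a decision based on measured performance.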
ISBN: 9781538649756 (print)
Driven by the development of new technologies such as personal assistants or autonomous cars, machine learning has rapidly become one of the most active fields in computer science. The algorithms at the core of machine learning are notoriously demanding in terms of resources. It is therefore of paramount importance to optimize their operation on modern processors. Several approaches have been proposed to accelerate machine learning on GPUs and massively parallel computers, as well as dedicated ASICs. In this paper, we focus on Intel's multi-core Xeon and many-core accelerator Xeon Phi Knights Landing, which can host several hundreds of threads on the same CPU. In such architectures, thread and data mapping are key to performance. We study the impact of mapping strategies, revealing that, with smart mapping policies, one can indeed significantly speed up machine learning applications on manycore architectures. Execution time was reduced by up to 25.2% and 18.5% on Intel Xeon and Xeon Phi KNL, respectively.
ISBN: 0769507166 (print)
In this paper we present an approach to determine scheduling functions suitable for the design of processor arrays. The considered scheduling functions support a subsequent LSGP partitioning of the processor array by allowing the tasks of those processors of the full-size array that are mapped onto one processor of the partitioned array to be executed in an arbitrary order. Several constraints are derived to ensure the causality of computations and to prevent access conflicts to both modules and registers. We propose an optimization problem generating the scheduling functions and outline its implementation as an integer linear program. The proposed methods are also applicable to the mapping of algorithms to parallel architectures. In this case, the scheduling function produces identical, independent small threads which can be combined to utilize the target architecture as much as possible.
Matching is an important part of a model-based object recognition system. Matching is a difficult task, for a number of reasons. First, in a number of recognition systems matching is formulated as a combinatorial prob...
ISBN: 9783642281457 (digital)
ISBN: 9783642281440 (print)
The two-volume set LNCS 7133 and LNCS 7134 constitutes the thoroughly refereed post-conference proceedings of the 10th International Conference on Applied Parallel and Scientific Computing, PARA 2010, held in Reykjavík, Iceland, in June 2010. These volumes contain three keynote lectures, 29 revised papers and 45 minisymposia presentations arranged on the following topics: cloud computing, HPC algorithms, HPC programming tools, HPC in meteorology, parallel numerical algorithms, parallel computing in physics, scientific computing tools, HPC software engineering, simulations of atomic scale systems, tools and environments for accelerator based computational biomedicine, GPU computing, high performance computing interval methods, real-time access and processing of large data sets, linear algebra algorithms and software for multicore and hybrid architectures in honor of Fred Gustavson on his 75th birthday, memory and multicore issues in scientific computing - theory and praxis, multicore algorithms and implementations for application problems, fast PDE solvers and a posteriori error estimates, and scalable tools for high performance computing.
ISBN: 9783319198002; 9783319197999 (print)
High-order finite-difference methods are commonly used in wave propagators for industrial subsurface imaging algorithms. Computational aspects of the reduced linear elastic vertical transversely isotropic propagator are considered. Thread-parallel algorithms suitable for implementing this propagator on multi-core and many-core processing devices are introduced. Portability is addressed through the use of the OCCA runtime programming interface. Finally, performance results are shown for various architectures on a representative synthetic test case.
Parallel disk systems are capable of fulfilling rapidly increasing demands on both large storage capacity and high I/O performance. However, it is challenging to significantly increase disk I/O bandwidth for data-intensive workloads due to (1) reliability and instant processing of data requests under dynamic workload conditions, and (2) the optimum tradeoff between system scalability and data reliability in data-intensive systems. To increase computing performance and reduce power consumption, Graphics Processing Units (GPUs) will be used. As the architectures and data processing algorithms for GPU-based parallel disk systems are still in their infancy, this research will develop novel hardware and software architectures that include parallel GPUs, flash disks, and disk arrays for data-intensive applications. (c) 2014 Published by Elsevier B.V.
ISBN: 9789537138127 (print)
An analysis of a parallel solution of the (N^2-1) Puzzle using clusters is presented. This problem is interesting due to its complexity and related applications, particularly in the field of robotics. A variation of classic heuristics for forecasting the work to be done in order to reach a solution is analyzed, and it is shown that its use significantly improves the time of the sequential A* algorithm. Then, a parallel solution on a distributed architecture is presented, and speedup is analyzed based on the number of processors, efficiency, and the possible superlinearity when scaling the problem.
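For orientation, here is a minimal sequential A* sketch for a sliding-tile puzzle on an n x n board, using the classic Manhattan-distance heuristic. It does not reproduce the heuristic variation or the distributed algorithm analyzed in the paper. The goal layout (tiles in order, blank last) and all function names are assumptions made for illustration. The helpers at the end use the standard definitions of speedup (sequential time divided by parallel time) and efficiency (speedup divided by processor count), where an efficiency above 1 indicates superlinear speedup.

```python
import heapq


def manhattan(state, n):
    """Sum of each tile's grid distance from its goal cell (the blank, 0, is ignored)."""
    total = 0
    for pos, tile in enumerate(state):
        if tile:
            goal = tile - 1
            total += abs(pos // n - goal // n) + abs(pos % n - goal % n)
    return total


def neighbors(state, n):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, n)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + dr, col + dc
        if 0 <= r < n and 0 <= c < n:
            other = r * n + c
            nxt = list(state)
            nxt[blank], nxt[other] = nxt[other], nxt[blank]
            yield tuple(nxt)


def astar(start, n):
    """Return the minimum number of moves to the goal, or None if it is unreachable."""
    goal = tuple(range(1, n * n)) + (0,)
    best = {start: 0}  # cheapest known cost to reach each state
    frontier = [(manhattan(start, n), 0, start)]  # entries are (f, g, state)
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == goal:
            return g
        if g > best[state]:
            continue  # stale entry: a cheaper path to this state was found later
        for nxt in neighbors(state, n):
            if g + 1 < best.get(nxt, float("inf")):
                best[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + manhattan(nxt, n), g + 1, nxt))
    return None


def speedup(t_seq, t_par):
    """Sequential time divided by parallel time."""
    return t_seq / t_par


def efficiency(t_seq, t_par, processors):
    """Speedup per processor; values above 1 indicate superlinear speedup."""
    return speedup(t_seq, t_par) / processors
```

For example, `astar((1, 2, 3, 4, 0, 6, 7, 5, 8), 3)` solves a 3 x 3 instance that is two moves from the goal. A parallel version would typically distribute the frontier across processes, which is where the speedup and efficiency measurements become relevant.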