Search Results - Inner Mongolia University Library

Proceedings of the 6th International Parallel Processing Symposium

The symposium materials contain 118 papers on new developments in parallel processing: algorithms, architectures, mapping/scheduling, applications, special-purpose architectures, interconnection networks, software, an...

ISBN: (print) 0818626720

Keywords: parallel processing systems

An adaptive algorithm selection framework

13th International Conference on Parallel Architecture and Compilation Techniques

Authors: Yu, H; Zhang, DM; Rauchwerger, L. Affiliations: IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598, USA

ISBN: (print) 0769522297

Irregular and dynamic memory reference patterns can cause performance variations for low level algorithms in general and for parallel algorithms in particular. We present an adaptive algorithm selection framework which can collect and interpret the inputs of a particular instance of a parallel algorithm and select the best performing one from an existing library. In this paper we present the dynamic selection of parallel reduction algorithms. First we introduce a set of high-level parameters that can characterize different parallel reduction algorithms. Then we describe an off-line, systematic process to generate predictive models which can be used for run-time algorithm selection. Our experiments show that our framework: (a) selects the most appropriate algorithms in 85% of the cases studied, (b) overall delivers 98% of the optimal performance, (c) adaptively selects the best algorithms for dynamic phases of a running program (resulting in performance improvements otherwise not possible), and (d) adapts to the underlying machine architecture (tested on IBM Regatta and HP V-Class systems).

Keywords: parallel processing systems

Adaptive parallel interval branch and bound algorithms based on their performance for multicore architectures

Journal of Supercomputing, 2011, Vol. 58, No. 3, pp. 376-384

Authors: Sanjuan-Estrada, J. F.; Casado, L. G.; Garcia, I. Affiliations: Univ Almeria, Dept Comp Architecture & Elect, Almeria, Spain; Univ Malaga, Dept Comp Architecture, E-29071 Malaga, Spain

This work studies how to adapt the number of threads of a parallel Interval Branch and Bound algorithm to the available computational resources based on its current performance. Basically, a thread can create a new thread that will process part of the ancestor workload. In this way, load balancing is inherent to the creation of threads. The applications in which we are interested use branch-and-bound algorithms, which are highly irregular and therefore difficult to predict. The proposed methods can be used for more predictable algorithms as well. This research complements, and does not substitute, other devices that improve the exploitation of the system, such as dynamic scheduling policies or work-stealing. Several approaches are presented. They differ in the metrics used and in whether or not the Operating System (O.S.) has to be modified. The scenario for this research is just one multithreaded application running in a multicore architecture. Experimental results show that the appropriate number of running threads can be determined at run-time, avoiding having to statically establish the number of threads of an application. Thread creation decisions have to be made frequently to obtain better results, but are time-consuming. One of the presented models uses the existence of an idle processor to carry out these decisions, obtaining the desired results.

Keywords: Multithreaded; Shared memory parallel processors; Performance analysis; Branch-and-bound; Global optimization; Irregularity

Optimizing Machine Learning Algorithms on Multi-core and Many-core Architectures using Thread and Data Mapping

26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)

Authors: Serpa, Matheus S.; Krause, Arthur M.; Cruz, Eduardo H. M.; Navaux, Philippe O. A.; Pasin, Marcelo; Felber, Pascal. Affiliations: Fed Univ Rio Grande Sul UFRGS, Inst Informat, Porto Alegre, RS, Brazil; Univ Neuchatel, Neuchatel, Switzerland

ISBN: (print) 9781538649756

Driven by the development of new technologies such as personal assistants or autonomous cars, machine learning has rapidly become one of the most active fields in computer science. The algorithms at the core of machine learning are notoriously demanding in terms of resources. It is therefore of paramount importance to optimize their operation on modern processors. Several approaches have been proposed to accelerate machine learning on GPUs and massively parallel computers, as well as dedicated ASICs. In this paper, we focus on Intel's multi-core Xeon and many-core accelerator Xeon Phi Knights Landing, which can host several hundreds of threads on the same CPU. In such architectures, thread and data mapping are key to performance. We study the impact of mapping strategies, revealing that, with smart mapping policies, one can indeed significantly speed up machine learning applications on many-core architectures. Execution time was reduced by up to 25.2% and 18.5% on Intel Xeon and Xeon Phi KNL, respectively.

Keywords: Machine learning; Thread mapping; Data mapping; Memory accesses

Generation of scheduling functions supporting LSGP-partitioning

12th IEEE International Conference on Application-Specific Systems, Architectures, and Processors

Author: Fimmel, D. Affiliation: Tech Univ Dresden, Dept Elect Engn, IEE, D-8027 Dresden, Germany

ISBN: (print) 0769507166

In this paper we present an approach to determine scheduling functions suitable for the design of processor arrays. The considered scheduling functions support a subsequent LSGP-partitioning of the processor array by allowing the tasks of those processors of the full-size array that are mapped into one processor of the partitioned processor array to be executed in an arbitrary order. Several constraints are derived to ensure the causality of computations and to prevent access conflicts to both modules and registers. We propose an optimization problem generating the scheduling functions and outline its implementation as an integer linear program. The proposed methods are also applicable to the mapping of algorithms to parallel architectures. In this case, the scheduling function produces identical, independent small threads which can be combined to utilize the target architecture as much as possible.

Keywords: parallel processing systems

SURVEY OF PARALLEL ALGORITHMS FOR STRUCTURAL PATTERN MATCHING

Conference C on Signal Processing and Conference D on Parallel Computing, at the 12th IAPR International Conference on Pattern Recognition

Author: GUERRA, C. Affiliation: UNIV PADUA, DIPARTIMENTO ELETTR & INFORMAT, I-35100 PADUA, ITALY

ISBN: (print) 0818662751

Matching is an important part of a model-based object recognition system. Matching is a difficult task, for a number of reasons. First, in a number of recognition systems matching is formulated as a combinatorial problem with exponential worst-case complexity. Thus, heuristics are needed to reduce the complexity by pruning the search space. Second, images do not present perfect data: noise and occlusion greatly complicate the task. Finally, even at moderate image resolutions the amount of data to be handled is such that this task cannot be done in real-time on supercomputers. Although no existing visual system can solve the general recognition problem, some existing approaches have obtained acceptable results for limited domains or simple scenes. Surveys of many sequential approaches to matching in recognition systems can be found in [2], [7], [18]. Comparatively, much less work has been done on parallel matching, despite the great need for speeding up the process. Parallel algorithms often have to be designed from scratch, and the recognition problem itself often requires reformulation since many of the proposed sequential algorithms do not lend themselves naturally to efficient parallel implementations. In this paper, we survey some of the existing parallel matching algorithms for 2D and 3D objects. Some of these algorithms have been implemented on SIMD architectures such as the Connection Machine or MasPar, or MIMD machines such as the Intel Touchstone Delta; other algorithms have been developed for the PRAM model of computation. © 1994 IEEE.

Keywords: parallel algorithms

Applied Parallel and Scientific Computing

Series: Lecture Notes in Computer Science

Year: 1000

ISBN: (digital) 9783642281457

ISBN: (print) 9783642281440

The two-volume set LNCS 7133 and LNCS 7134 constitutes the thoroughly refereed post-conference proceedings of the 10th International Conference on Applied Parallel and Scientific Computing, PARA 2010, held in Reykjavík, Iceland, in June 2010. These volumes contain three keynote lectures, 29 revised papers and 45 minisymposia presentations arranged on the following topics: cloud computing, HPC algorithms, HPC programming tools, HPC in meteorology, parallel numerical algorithms, parallel computing in physics, scientific computing tools, HPC software engineering, simulations of atomic scale systems, tools and environments for accelerator based computational biomedicine, GPU computing, high performance computing interval methods, real-time access and processing of large data sets, linear algebra algorithms and software for multicore and hybrid architectures in honor of Fred Gustavson on his 75th birthday, memory and multicore issues in scientific computing - theory and praxis, multicore algorithms and implementations for application problems, fast PDE solvers and a posteriori error estimates, and scalable tools for high performance computing.

Keywords: Mathematics of Computing; Software Engineering/Programming and Operating Systems; Algorithm Analysis and Problem Complexity; Complexity; Computational Mathematics and Numerical Analysis; Computer Communication Networks

High-Order Finite-Differences on Multi-threaded Architectures Using OCCA

10th International Conference on Spectral and High-Order Methods (ICOSAHOM)

Authors: Medina, David; St-Cyr, Amik; Warburton, Timothy. Affiliations: Rice Univ, Computat & Appl Math, Houston, TX 77005, USA; Royal Dutch Shell, Seism Applicat Team, Rijswijk, Netherlands

ISBN: (print) 9783319198002; 9783319197999

High-order finite-difference methods are commonly used in wave propagators for industrial subsurface imaging algorithms. Computational aspects of the reduced linear elastic vertical transversely isotropic propagator are considered. Thread-parallel algorithms suitable for implementing this propagator on multi-core and many-core processing devices are introduced. Portability is addressed through the use of the OCCA runtime programming interface. Finally, performance results are shown for various architectures on a representative synthetic test case.

Keywords: Finite difference method

Central and Distributed GPU Based Parallel Disk Systems for Data Intensive Applications

11th International Conference on Mobile Systems and Pervasive Computing (MobiSPC)

Authors: Nijim, Mais; Saha, Soumya; Nijim, Yousef. Affiliation: Texas A&M Univ, Kingsville, TX 78363, USA

Parallel disk systems are capable of fulfilling rapidly increasing demands on both large storage capacity and high I/O performance. However, it is challenging to significantly increase disk I/O bandwidth for data-intensive workloads due to (1) reliability and instant processing of data requests under dynamic workload conditions, and (2) the optimum tradeoff between system scalability and data reliability in data-intensive systems. To increase computing performance and reduce power consumption, Graphics Processing Units (GPUs) will be used. As the architectures and data processing algorithms for GPU-based parallel disk systems are still in their infancy, this research will develop novel hardware and software architectures that include parallel GPU, flash disks, and disk arrays for data-intensive applications. (c) 2014 Published by Elsevier B.V.

Keywords: GPU; Flash disks; Parallel disk systems

Parallel processing Puzzle N²-1 on cluster architectures performance analysis

30th International Conference on Information Technology Interfaces

Authors: Sanz, Victoria; de Giusti, Armando; Chichizola, Franco; Naiouf, Marcelo; De Giusti, Laura. Affiliation: Instituto de Investigación en Informática (III-LIDI), School of Computer Sciences, UNLP

ISBN: (print) 9789537138127

An analysis of a parallel solution of the N²-1 Puzzle using clusters is presented. This problem is interesting due to its complexity and related applications, particularly in the field of robotics. A variation of classic heuristics for forecasting the work to be done in order to reach a solution is analyzed, and it is shown that its use significantly improves the running time of the sequential A* algorithm. Then, a parallel solution on a distributed architecture is presented and speedup is analyzed based on the number of processors, efficiency, and the possible superlinearity when scaling the problem.

Keywords: parallel algorithms; distributed processing; speedup; superlinearity; efficiency; scalability
