ISBN (print): 9798350337662
Concurrent queue algorithms have been subject to extensive research. However, the target hardware and evaluation methodology on which the published results for any two given concurrent queue algorithms are based often share only minimal overlap. A meaningful comparison is thus exceedingly difficult. With the continuing trend towards more and more heterogeneous systems, it is becoming increasingly important not only to evaluate and compare novel and existing queue algorithms across a wider range of target architectures, but also to be able to continuously re-evaluate queue algorithms in light of novel architectures and capabilities. To address this need, we present AnyQ, an evaluation framework for concurrent queue algorithms. We design a set of programming abstractions that enable the mapping of concurrent queue algorithms and benchmarks to a wide variety of target architectures. We demonstrate the effectiveness of these abstractions by showing that a queue algorithm expressed in a portable, high-level manner can achieve performance comparable to handcrafted implementations. We design a system for testing and benchmarking queue algorithms. Using the developed framework, we investigate concurrent queue algorithm performance across a range of both CPU and GPU architectures. In hopes that it may serve the community as a starting point for building a common repository of concurrent queue algorithms as well as a base for future research, all code and data are made available as open source software at https://***/anyq.
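AnyQ's actual programming abstractions are not reproduced in the abstract. As a rough illustration of what a portable queue interface might look like, the following is a minimal sketch assuming a bounded queue with non-blocking try-operations and a mutex-based backend standing in for one of several interchangeable implementations; the names BoundedQueue, try_enqueue, and try_dequeue are hypothetical, not AnyQ's API.

    // Sketch of a portable bounded MPMC queue interface (hypothetical names; not
    // AnyQ's API). A mutex-based backend is shown; a lock-free or GPU-resident
    // backend could implement the same interface.
    #include <cstddef>
    #include <cstdio>
    #include <deque>
    #include <mutex>
    #include <optional>
    #include <utility>

    template <typename T>
    class BoundedQueue {
    public:
        explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

        // Non-blocking enqueue: fails when the queue is full.
        bool try_enqueue(T value) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (items_.size() >= capacity_) return false;
            items_.push_back(std::move(value));
            return true;
        }

        // Non-blocking dequeue: returns an empty optional when the queue is empty.
        std::optional<T> try_dequeue() {
            std::lock_guard<std::mutex> lock(mutex_);
            if (items_.empty()) return std::nullopt;
            T value = std::move(items_.front());
            items_.pop_front();
            return value;
        }

    private:
        std::size_t capacity_;
        std::deque<T> items_;
        std::mutex mutex_;
    };

    int main() {
        BoundedQueue<int> q(4);
        q.try_enqueue(42);
        if (auto v = q.try_dequeue()) std::printf("dequeued %d\n", *v);
    }

Keeping the interface independent of the backend is what makes it plausible to benchmark the same queue algorithm on both CPU and GPU targets, as the abstract describes.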
Computer vision requires the processing of large volumes of data and needs parallel architectures and algorithms to be useful in real-time, industrial applications. The INSIGHT dataflow language was designed to allow encoding of vision algorithms at all levels of the computer vision paradigm. INSIGHT programs, which are relational in nature, can be translated into a graph structure that represents an architecture for solving a particular vision problem or a configuration of a reconfigurable computational network. We consider here INSIGHT programs that produce a parallel net architecture for solving low-, mid-, and high-level vision tasks.
ISBN (print): 9781467391160
In this paper, we examine how to improve workload balancing on a computing cluster with a parallel loop self-scheduling scheme, using hybrid MPI and OpenMP parallel programming in the C language. Loops are block-partitioned according to the performance weighting of the compute nodes. This study implements parallel loop self-scheduling on the Xeon Phi, using its characteristics to improve workload balancing between heterogeneous nodes. The parallel loop self-scheduling is composed of a static and a dynamic allocation: a weighting algorithm is adopted in the static part, while the well-known loop self-scheduling scheme is adopted in the dynamic part. In recent years, Intel has promoted its Xeon Phi coprocessor, an x86-like coprocessor with about 60 cores; it can be regarded as a single computing node whose computing power cannot be ignored. In our experiments, we use multiple computing nodes and evaluate four applications: matrix multiplication, sparse matrix multiplication, Mandelbrot set computation, and the circuit satisfiability problem. Our results show how to perform the weight allocation and how to choose a scheduling scheme to achieve the best performance with parallel loop self-scheduling.
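The abstract does not give the weighting formula, so the sketch below only illustrates the hybrid idea under assumed parameters: a static block of iterations split in proportion to per-node performance weights, with the remainder handed out in shrinking chunks in the dynamic phase (guided-self-scheduling style). The function names, the example weights, and the half-static/half-dynamic split are assumptions for illustration, not the paper's scheme.

    // Sketch of hybrid static/dynamic loop partitioning (illustrative only).
    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct Chunk { long begin; long end; };  // half-open [begin, end)

    // Statically assign the first `static_part` iterations to the nodes in
    // proportion to their performance weights.
    std::vector<Chunk> static_partition(long static_part, const std::vector<double>& weights) {
        double total = 0.0;
        for (double w : weights) total += w;
        std::vector<Chunk> chunks;
        long next = 0;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            long share = static_cast<long>(static_part * weights[i] / total);
            if (i + 1 == weights.size()) share = static_part - next;  // absorb rounding
            chunks.push_back({next, next + share});
            next += share;
        }
        return chunks;
    }

    // Dynamic phase: hand out shrinking chunks of the remaining iterations,
    // one chunk per request from an idle node (guided-self-scheduling style).
    Chunk next_dynamic_chunk(long& cursor, long total_end, std::size_t num_nodes) {
        long remaining = total_end - cursor;
        long size = std::max<long>(1, remaining / (2 * static_cast<long>(num_nodes)));
        Chunk c{cursor, std::min(total_end, cursor + size)};
        cursor = c.end;
        return c;
    }

    int main() {
        const long N = 1000;                       // total loop iterations
        std::vector<double> weights = {1.0, 2.5};  // assumed: Phi node ~2.5x a host node
        long static_part = N / 2;                  // assumption: half static, half dynamic
        auto fixed = static_partition(static_part, weights);
        for (std::size_t i = 0; i < fixed.size(); ++i)
            std::printf("node %zu static: [%ld, %ld)\n", i, fixed[i].begin, fixed[i].end);
        long cursor = static_part;
        while (cursor < N) {
            Chunk c = next_dynamic_chunk(cursor, N, weights.size());
            std::printf("dynamic chunk: [%ld, %ld)\n", c.begin, c.end);
        }
    }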
ISBN (print): 0769521355
This paper addresses the problem of transcoding proxy placement for coordinated en-route web caching in tree networks. We model the problem by considering all the nodes in the network in a coordinated way and formulate it as an optimization problem. We implement our dynamic programming-based algorithm and evaluate our model on different performance metrics through extensive simulation experiments. The results show that our model outperforms the placement model for linear topologies.
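The paper's cost model is only summarized above. Purely to convey the flavor of a dynamic program over a tree, here is a toy sketch under simplifying assumptions that are not taken from the paper: the server sits at the root, each node's demand is served by the nearest proxy or the server on its path to the root, hops have unit cost, and each node has a proxy placement cost. A real implementation would also memoize dp on the pair (node, distance), which this toy omits.

    // Toy dynamic program for proxy placement on a tree (hypothetical model).
    #include <cstdio>
    #include <vector>

    struct Node {
        double demand = 0.0;          // requests originating at this node
        double placement_cost = 0.0;  // cost of installing a transcoding proxy here
        std::vector<int> children;
    };

    // dp(v, d): minimum cost of v's subtree when the nearest proxy or server on
    // the path above v is d hops away.
    double dp(const std::vector<Node>& tree, int v, int d) {
        const Node& n = tree[v];
        // Option 1: place a proxy at v; v's demand is served locally and every
        // child now sees a proxy one hop away.
        double place = n.placement_cost;
        for (int c : n.children) place += dp(tree, c, 1);
        // Option 2: no proxy at v; v's demand travels d hops, children see d + 1.
        double skip = n.demand * d;
        for (int c : n.children) skip += dp(tree, c, d + 1);
        return place < skip ? place : skip;
    }

    int main() {
        // Small example tree: 0 (server) -> 1 -> {2, 3}.
        std::vector<Node> tree(4);
        tree[0].children = {1};
        tree[1].demand = 1.0; tree[1].placement_cost = 5.0; tree[1].children = {2, 3};
        tree[2].demand = 8.0; tree[2].placement_cost = 3.0;
        tree[3].demand = 2.0; tree[3].placement_cost = 3.0;
        // The root hosts the origin server, so its subtrees start at distance 1.
        double cost = 0.0;
        for (int c : tree[0].children) cost += dp(tree, c, 1);
        std::printf("minimum total cost: %.1f\n", cost);
    }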
This paper introduces a new type of parallel computer based on N+1 programs (hereinafter, N+1 computer), as well as its features. A new concept of parallel computing architecture based on N+1 programs is also presente...
ISBN (print): 9781538694039
With the rapid development of the Internet and the continuous rise in the number of network users, network traffic in various regions is increasing rapidly. In high-speed, high-throughput network environments, traditional packet capture methods and processing capabilities cannot keep up, which results in severe packet loss. This paper focuses on a high-performance packet acquisition and distribution method to break through the performance bottleneck of commodity servers and network cards. It studies a packet capture method based on the DPDK platform and uses the hash value computed for RSS to improve the efficiency of packet distribution, enabling the pipeline from capture to efficient multi-core parallel processing. The method can effectively reduce packet loss and improve the packet processing rate, and it also reduces resource waste and network overhead for traffic capture and distribution. Preliminary experiments show that DPDK-based traffic processing has obvious advantages over PF-RING and Netmap in data processing speed.
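DPDK's actual API calls are not shown here; the sketch below only illustrates the distribution idea the abstract relies on, assuming a software stand-in for the NIC's RSS hash that maps each packet's 5-tuple to a fixed worker queue so all packets of a flow land on the same core. The FNV-1a mix is a placeholder for the Toeplitz hash RSS normally uses, and the struct and function names are assumptions for illustration.

    // Sketch of RSS-style flow distribution: hash each packet's 5-tuple and map
    // it to a per-core worker queue (illustrative; real RSS is computed by the
    // NIC, which delivers the hash along with the packet).
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct FiveTuple {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    };

    // FNV-1a-style mix over the tuple fields; a software stand-in for the hardware hash.
    uint32_t flow_hash(const FiveTuple& t) {
        auto mix = [](uint32_t h, uint32_t v) { return (h ^ v) * 16777619u; };
        uint32_t h = 2166136261u;
        h = mix(h, t.src_ip);
        h = mix(h, t.dst_ip);
        h = mix(h, (static_cast<uint32_t>(t.src_port) << 16) | t.dst_port);
        h = mix(h, t.proto);
        return h;
    }

    int main() {
        const unsigned num_workers = 4;  // assumption: one RX queue per worker core
        std::vector<unsigned> load(num_workers, 0);
        std::vector<FiveTuple> packets = {
            {0x0A000001, 0x0A000002, 1234, 80, 6},
            {0x0A000003, 0x0A000002, 5555, 443, 6},
            {0x0A000001, 0x0A000002, 1234, 80, 6},  // same flow -> same worker
        };
        for (const auto& p : packets)
            ++load[flow_hash(p) % num_workers];     // packets of a flow stay together
        for (unsigned i = 0; i < num_workers; ++i)
            std::printf("worker %u handled %u packets\n", i, load[i]);
    }

With hardware RSS the hash arrives precomputed with each packet, so the host side only pays for the modulo and the enqueue onto the chosen worker's queue.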
Threads provides a mechanism for simulating the execution of parallel algorithms on a simplified model of a shared-memory multiprocessor. The algorithms can be expressed in a high-level block-structured language, which supports multiple threads of execution within a common body of program code. Results show an ability to achieve good speedup for small problems using algorithms derived by simple modifications of sequential algorithms. In addition, a sibling thread synchronisation feature provides the basis for the synchronous execution of threads. k-parallel algorithms, tailored to the machine size and implemented as synchronously executing iterations, can provide near-linear speedup as the problem size is increased. The techniques described in this paper seem to promise an effective synchronous execution mode for shared-memory MIMD architectures.
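The Threads language itself is not shown in the abstract. As a minimal sketch of the synchronous execution mode it describes, the following standard C++20 snippet runs a k-thread sibling team in lockstep, with a barrier closing every iteration; the team size and the per-step work are placeholders, not anything from the paper.

    // Sketch of k-parallel synchronous execution: k sibling threads advance in
    // lockstep, meeting at a barrier after every iteration (C++20 std::barrier).
    #include <barrier>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        const int k = 4;            // team size, matched to the "machine size"
        const int iterations = 3;
        std::barrier sync(k);       // all k siblings meet here each step

        std::vector<std::thread> team;
        for (int id = 0; id < k; ++id) {
            team.emplace_back([&, id] {
                for (int step = 0; step < iterations; ++step) {
                    // ... this thread's share of the work for `step` would go here ...
                    std::printf("thread %d finished step %d\n", id, step);
                    sync.arrive_and_wait();  // no sibling starts step + 1 early
                }
            });
        }
        for (auto& t : team) t.join();
    }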
This paper presents the analysis of a parallel formulation of depth-first search. At the heart of this parallel formulation is a dynamic work-distribution scheme that divides the work between different processors. The effectiveness of the parallel formulation is strongly influenced by the work-distribution scheme and the target architecture. We introduce the concept of isoefficiency function to characterize the effectiveness of different architectures and work-distribution schemes. Many researchers considered the ring architecture to be quite suitable for parallel depth-first search. Our analytical and experimental results show that hypercube and shared-memory architectures are significantly better. The analysis of previously known work-distribution schemes motivated the design of substantially improved schemes for ring and shared-memory architectures. In particular, we present a work-distribution algorithm that guarantees close to optimal performance on a shared-memory/ω-network-with-message-combining architecture (e.g. RP3). Much of the analysis presented in this paper is applicable to other parallel algorithms in which work is dynamically shared between different processors (e.g., parallel divide-and-conquer algorithms). The concept of isoefficiency is useful in characterizing the scalability of a variety of parallel algorithms.
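For reference, the isoefficiency function mentioned above can be stated compactly; this is the standard definition rather than a derivation specific to this paper. With W the problem size (sequential work), p the number of processors, and T_o(W, p) the total overhead of the parallel system,

    E \;=\; \frac{T_s}{p\,T_p} \;=\; \frac{W}{W + T_o(W,p)} \;=\; \frac{1}{1 + T_o(W,p)/W},
    \qquad\text{so } E \text{ stays constant only if}\quad
    W \;=\; \frac{E}{1-E}\, T_o(W,p) \;=\; K\, T_o(W,p).

Solving this relation for W as a function of p yields the isoefficiency function: the slower W must grow to keep efficiency fixed, the more scalable the combination of algorithm, work-distribution scheme, and architecture.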
ISBN (digital): 9781665488020
ISBN (print): 9781665488020
Programming parallel architectures from a hierarchical point of view is becoming today's standard, as machines are structured by multiple layers of memory. To handle such architectures, we focus on the MULTI-BSP bridging model. This model extends BSP and proposes a structured way of programming multi-level architectures. In the context of parallel programming, we now need to manage new concerns such as memory coherency, deadlocks, and safe data communication. To do so, we propose a typing system for MULTI-ML, an ML-like programming language based on the MULTI-BSP model. This type system introduces data locality using type annotations and effects in order to detect incorrect uses of multi-level architectures. We thus ensure that "well-typed programs cannot go wrong" on hierarchical architectures.
With the evolution of High Performance Computing, multi-core and many-core systems are a common feature of new hardware architectures. The programming effort required by the introduction of these architectures is challenging due to the increasing number of cores. Parallel programming models based on the data flow model and the task programming paradigm aim to fix this issue. Iterative linear solvers are a key part of petroleum reservoir simulation, as they can represent up to 80% of the total computing time. In these algorithms, the standard preconditioning methods for large, sparse and unstructured matrices, such as Incomplete LU Factorization (ILU) or Algebraic Multigrid (AMG), fail to scale on shared-memory architectures with a large number of cores. Recently introduced multi-level domain decomposition (DDML) preconditioners seem to be both numerically robust and scalable on emerging architectures because of their parallel nature. This paper proposes a parallel implementation of these preconditioners using the task programming paradigm with a data flow model. The approach is validated on linear systems extracted from realistic petroleum reservoir simulations. This shows that, given an appropriate coarse operator in such preconditioners, the method has good convergence rates, while our implementation ensures good scalability on multi-core architectures. (C) 2019 Elsevier B.V. All rights reserved.
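The abstract does not detail the DDML operators, so the sketch below only illustrates the task-based execution pattern such a preconditioner builds on: independent subdomain solves and a coarse correction expressed as OpenMP tasks and combined additively. The local and coarse solves here are toy placeholders, not the authors' preconditioner, and the additive combination is an assumption chosen for simplicity.

    // Sketch of a task-parallel application of an additive two-level
    // preconditioner: independent subdomain solves run as tasks, a coarse solve
    // runs concurrently, and the results are summed (toy operators only).
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Placeholder local solve; a real DDML preconditioner would apply a factored
    // subdomain operator here.
    void subdomain_solve(const std::vector<double>& r, std::vector<double>& z,
                         std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) z[i] = 0.5 * r[i];
    }

    // Placeholder coarse correction; a real one would use a Galerkin coarse operator.
    void coarse_solve(const std::vector<double>& r, std::vector<double>& zc) {
        double avg = 0.0;
        for (double v : r) avg += v;
        avg /= static_cast<double>(r.size());
        for (double& v : zc) v = 0.1 * avg;
    }

    int main() {
        const std::size_t n = 1 << 20, num_domains = 8;
        std::vector<double> r(n, 1.0), z(n, 0.0), zc(n, 0.0);

        #pragma omp parallel
        #pragma omp single
        {
            for (std::size_t d = 0; d < num_domains; ++d) {
                std::size_t begin = d * n / num_domains, end = (d + 1) * n / num_domains;
                #pragma omp task firstprivate(begin, end) shared(r, z)
                subdomain_solve(r, z, begin, end);   // independent fine-level tasks
            }
            #pragma omp task shared(r, zc)
            coarse_solve(r, zc);                     // coarse correction in parallel
            #pragma omp taskwait
        }
        for (std::size_t i = 0; i < n; ++i) z[i] += zc[i];  // additive combination
        std::printf("z[0] = %.2f\n", z[0]);
    }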