Search results - Inner Mongolia University Library

International Symposium on Parallel Processing

Authors: S.S. Lumetta, D.E. Culler (Computer Science Division, University of California, Berkeley, USA)

Passing messages through shared memory plays an important role in symmetric multiprocessors and on Clumps. The management of concurrent access to message queues is an important aspect of design for shared-memory message-passing systems. Using both microbenchmarks and applications, the paper compares the performance of concurrent access algorithms for passing active messages on a Sun Enterprise 5000 server. The paper presents a new lock-free algorithm that provides many of the advantages of non-blocking algorithms while avoiding the overhead of true non-blocking behavior. The lock-free algorithm couples synchronization tightly to the data structure and demonstrates application performance superior to all others studied. The success of this algorithm implies that other practical problems might also benefit from a reexamination of the non-blocking literature.

Keywords: Memory management; Data structures; Sun; Parallel programming; Computer science; Access protocols; System testing; File servers; Operating systems; Delay effects

Tailoring a self-distributing architecture to a cluster computer environment

Euromicro Workshop on Parallel and Distributed Processing

Authors: R. Moore, B. Klauer, K. Waldschmidt (Technische Informatics, Johann Wolfgang Goethe-University of Frankfurt, Frankfurt, Germany; Goethe-Universitat Frankfurt am Main, Frankfurt am Main, Hessen, DE)

This paper analyzes the consequences of existing network structure for the design of a protocol for a radical COMA (Cache Only Memory Architecture). Parallel computing today faces two significant challenges: the difficulty of programming and the need to leverage existing "off-the-shelf" hardware. The difficulty of programming parallel computers can be split into two problems: distributing the data, and distributing the computation. Parallelizing compilers address both problems, but have limited application outside the domain of loop-intensive "scientific" code. Conventional COMAs provide an adaptive, self-distributing solution to data distribution, but do not address computation distribution. Our proposal leverages parallelizing compilers, and then extends COMA to provide adaptive self-distribution of both data and computation. The radical COMA protocols can be implemented in hardware, software, or a combination of both. When, however, the implementation is constrained to operate in a cluster computing environment (that is, to use only existing, already installed hardware), the protocols have to be reengineered to accommodate the deficiencies of the hardware. This paper identifies the critical quantities of various existing network structures, and discusses their repercussions for protocol design. A new protocol is presented in detail.

Keywords: Computer architecture; Protocols; Hardware; Concurrent computing; Distributed computing; Parallel programming; Memory architecture; Parallel processing; Application software; Proposals

Automatic Parallelization of Scripting Languages: Toward Transparent Desktop Parallel Computing

International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Xiaosong Ma, Jiangtian Li, Nagiza F. Samatova (Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Department of Computer Engineering, North Carolina State University, Raleigh, NC, USA)

Desktop computing remains indispensable in scientific exploration, largely because it provides people with devices for human interaction and environments for interactive job execution. However, with today's rapidly growing data volume and task complexity, it is increasingly hard for individual workstations to meet the demands of interactive scientific data processing. The increasing cost of such interactive processing is hindering the productivity of end-to-end scientific computing workflows. While existing distributed computing systems allow people to aggregate desktop workstation resources for parallel computing, the burden of explicit parallel programming and parallel job execution often prevents scientists from taking advantage of such platforms. In this paper, we discuss the need for transparent desktop parallel computing in scientific data processing. As an initial step toward this goal, we present our ongoing work on the automatic parallelization of the scripting language R, a popular tool for statistical computing. Our preliminary results suggest that a reasonable speedup can be achieved on real-world sequential R programs without requiring any code modification.

Keywords: Parallel processing; Workstations; Data processing; Humans; Costs; Productivity; Scientific computing; Distributed computing; Aggregates; Parallel programming

Executable Modelling for Highly Parallel Accelerators

ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion

Authors: Lorenzo Addazi, Federico Ciccozzi, Bjorn Lisper (School of Innovation, Design and Engineering, Malardalen University)

ISBN: (print) 9781728151267

High-performance embedded computing is developing rapidly since applications in most domains require a large and increasing amount of computing power. On the hardware side, this requirement is met by the introduction of heterogeneous systems, with highly parallel accelerators that are designed to take care of the computation-heavy parts of an application. There is today a plethora of accelerator architectures, including GPUs, many-cores, FPGAs, and domain-specific architectures such as AI accelerators. They all have their own programming models, which are typically complex, low-level, and involve explicit parallelism. This yields error-prone software that puts functional safety at risk, which is unacceptable for safety-critical embedded applications. In this position paper we argue that high-level executable modelling languages tailored for parallel computing can help in the software design for high-performance embedded applications. In particular, we consider the data-parallel model to be a suitable candidate, since it allows very abstract parallel algorithm specifications free from race conditions. Moreover, we promote the Action Language for fUML (and thereby fUML) as a suitable host language.

Keywords: Parallel programming; fUML; Alf; UML; Modelling languages; High-performance computing; Data-parallelism; Executable models; data parallel; computational power; Accelerators; GTF2A1L gene; Software design; Accelerator architectures; embedded application; Position Papers; HETEROGENEOUS SYSTEM

CUDA toolkit and libraries

IEEE Hot Chips Symposium (HCS)

Authors: Massimiliano Fatica (Time and Frequency Division, National Institute of Standards and Technology (NIST), Boulder, Colorado, U.S.A)

Presents a collection of slides covering the following: NVIDIA CUDA; CUDA toolkit; CUDA libraries; closely coupled CPU-GPU; CUDA many-core and multi-core support; nvcc CUDA compiler; CUBLAS; and CUFFT.

Keywords: Graphics processing units; Tutorials; Multicore processing; Libraries; Parallel programming; Data transfer; Parallel processing

Gauss elimination: a case study on parallel machines

IEEE Compcon

Authors: K.H. Warren, E.D. Brooks (Massively Parallel Computing Initiative, Lawrence Livermore National Laboratory, Livermore, CA, USA)

The authors report their experiences with the Gauss elimination algorithm on several parallel machines. Several different software designs are demonstrated, ranging from a simple shared memory implementation to the use of a message passing programming model. It is found that the efficient use of local memory is critical to obtaining good performance on scalable machines. Machines with large coherent caches appear to require the least software effort in order to obtain effective performance.

Keywords: Gaussian processes; Computer aided software engineering; Parallel machines; Message passing; Laboratories; Software performance; Algorithm design and analysis; Parallel processing; Software design; Parallel programming
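
For orientation, the sequential form of the algorithm this entry studies can be sketched as below. This is a minimal Python illustration and not the authors' code: the function name `gauss_solve` and the singularity threshold are assumptions made here. The comment in the elimination loop marks the row updates that a shared-memory or message-passing version would typically distribute.

```python
from typing import List


def gauss_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: choose the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # illustrative threshold (assumption)
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot. For a fixed column, each row
        # update is independent of the others, so this loop is the natural
        # unit of work to spread across processors.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x
```

In a parallel setting the pivot row must be visible to every processor before the row updates start, which is one reason the abstract stresses efficient use of local memory.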

Trends in compilable DSP architecture

IEEE Workshop on Signal Processing Systems (SIPS)

Authors: J. Glossner, J. Moreno, M. Moudgill, J. Derby, E. Hokenek, D. Meltzer, U. Shvadron, M. Ware (IBM Communications Research and Development Center, Yorktown Heights, NY, USA)

We review the evolution of DSP architectures and compiler technology, and describe how compiler techniques are being used to optimize emerging DSP architectures. Such new architectures are characterized by the exploitation of data and instruction level parallelism while being an amenable target for a compiler, thereby reducing or eliminating the need to rely on assembly language programming and/or architecture-specific compiler intrinsics to achieve highly efficient code. We also summarize our research results on an ultra low power compilable DSP architecture.

Keywords: Digital signal processing; Assembly; High level languages; Kernel; Program processors; Digital signal processors; Communication standards; Research and development; Optimizing compilers; Parallel programming

Experiments of a reconfigurable multiprocessor simulation on a distributed environment

International Phoenix Conference on Computers and Communications (IPCCC)

Authors: B.O. Apduhan, T. Sueyoshi, Y. Namiuchi, T. Tezuka, I. Arita (Department of Artificial Intelligence, Kyushu Institute of Technology, Japan)

The experiments and analysis of a reconfigurable multiprocessor simulation on a cluster of workstations connected by Ethernet are presented. The system model and simulation environment are described. The monitoring/debugging tool and the concept of SPP, a proposed parallel programming paradigm which can effectively reduce the synchronization operations, are described. The structure of the modules that make up the system software model is also described. The sequential and parallel versions of a computationally intensive program were executed on different network topologies, and their speedup ratios are analyzed and discussed. The crucial issues in realizing reconfigurable multiprocessor simulation on a distributed environment are considered.

Keywords: Computational modeling; Analytical models; Workstations; Ethernet networks; Monitoring; Debugging; Parallel programming; System software; Computer networks; Concurrent computing

Software Engineering for Parallel Processing

IEE Colloquium on High Performance Computing for Advanced Control

Authors: S.C. Winter, P. Kacsuk (Centre for Parallel Comput., Westminster Univ., London, UK; Centre for Parallel Computing, Budapest, Hungary)

Describes the architecture of a development environment for computer-aided parallel software engineering. The environment comprises tools for program design, simulation, run-time support and behaviour analysis. Tools are invariably interactive, depending in large part on graphical and visualisation support. SEPP (Software Engineering for Parallel Processing) is an EU-funded consortium of nine partners in Eastern and Western Europe, whose aim is to realise the architecture through the development of practical tools.

Keywords: Computer aided software engineering; Parallel programming; Programming environments; Software tools

Automatic runtime calculation of communications for data-parallel expressions with periodic conditions

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, vol. 31, no. 5

Authors: Moreton-Fernandez, Ana; Gonzalez-Escribano, Arturo (Univ Valladolid, Dept Informat, Valladolid, Spain)

Many real-world applications feature data accesses on periodic domains. Manually implementing the synchronizations and communications associated with the data dependences in each case is cumbersome and error-prone. It is increasingly interesting to support these applications in high-level parallel programming languages or parallelizing compilers. In this paper, we present a technique that, for distributed-memory systems, calculates the specific communications derived from data-parallel codes with or without periodic boundary conditions on affine access expressions. It makes the management of aggregated communications for the chosen data partition transparent to the programmer. Our technique moves to runtime part of the compile-time analysis typically used to generate the communication code for affine expressions, introducing a completely new technique that also supports periodic boundary conditions. We present an experimental study to evaluate our proposal using several case studies. Our experimental results show that our approach can automatically obtain communication codes as efficient as those found in MPI reference codes, reducing the development effort.

Keywords: communications; distributed memory; parallel programming; periodic boundary condition
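
To make concrete what "calculating the communications" for a periodic domain involves, here is a minimal Python sketch under strong simplifying assumptions: a 1D array, a balanced block partition, and a stencil given as constant index offsets. It is not the authors' technique, which handles general affine expressions at runtime. All names (`block_bounds`, `owner`, `receives`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def block_bounds(n: int, p: int, rank: int) -> Tuple[int, int]:
    """Half-open index range [lo, hi) owned by `rank` in a balanced block partition."""
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def owner(n: int, p: int, index: int) -> int:
    """Rank that owns global element `index`."""
    for rank in range(p):
        lo, hi = block_bounds(n, p, rank)
        if lo <= index < hi:
            return rank
    raise IndexError(index)


def receives(n: int, p: int, rank: int, offsets: Sequence[int],
             periodic: bool) -> Dict[int, List[int]]:
    """Map each source rank to the sorted global indices `rank` must receive."""
    lo, hi = block_bounds(n, p, rank)
    needed = defaultdict(set)
    for i in range(lo, hi):
        for off in offsets:
            j = i + off
            if periodic:
                j %= n           # wrap around the domain boundary
            elif not 0 <= j < n:
                continue         # out-of-domain access: nothing to fetch
            src = owner(n, p, j)
            if src != rank:
                needed[src].add(j)  # aggregate all requests per source rank
    return {src: sorted(idx) for src, idx in needed.items()}
```

For example, with 8 elements on 4 processes and offsets `[-1, 1]`, rank 0 needs element 2 from rank 1 in both cases. With `periodic=True` it additionally needs element 7 from rank 3, which is the wrap-around communication.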
