Search results - Inner Mongolia University Library

2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023

Authors: Kang, Seunghwa; Hastings, Chuck; Eaton, Joe; Rees, Brad (NVIDIA, United States)

ISBN: (print) 9798350311990

Software development of high-performance graph algorithms is difficult on modern parallel computers. To simplify this task, we have designed and implemented a collection of C++ graph primitives, basic building blocks, within cuGraph to assist graph analytics software developers on parallel computers, ranging from desktops to large clusters. This graph primitives API provides a vertex/edge-centric C++ Standard Template Library (STL)-like interface, allowing users to pick a primitive algorithm and to specify, through C++ functors, the desired operations on vertices and edges and how to reduce the output of those operations. The API implementation is responsible for executing these functors on the underlying hardware. In this case, the graph primitives are implemented to run on NVIDIA GPU systems, from a single GPU to multiple GPUs in a distributed cluster. RAPIDS cuGraph is NVIDIA's graph analytics solution for data scientists and software integrators. Algorithms in cuGraph are either implemented using the cuGraph C++ primitives API or being migrated to it. The Louvain and PageRank algorithms have been tested on clusters with over 1000 GPUs. © 2023 IEEE.
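The functor-driven interface described in this abstract can be illustrated with a small sequential sketch. It is written in Python rather than C++, and it is not the cuGraph primitives API: the function name `transform_reduce_in_edges`, its parameters, and the PageRank step built on it are hypothetical, chosen only to show how a caller supplies one functor for the per-edge operation and another for the reduction.

```python
from collections import defaultdict


def transform_reduce_in_edges(edges, vertex_values, edge_op, reduce_op, init):
    """Hypothetical primitive (not a cuGraph function).

    Applies edge_op to every (src, dst) edge and combines the results
    per destination vertex with reduce_op, starting from init.
    """
    result = {v: init for v in vertex_values}
    for src, dst in edges:
        contribution = edge_op(src, dst, vertex_values[src], vertex_values[dst])
        result[dst] = reduce_op(result[dst], contribution)
    return result


def pagerank_step(edges, ranks, damping=0.85):
    """One PageRank iteration expressed through the primitive above."""
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1
    n = len(ranks)
    sums = transform_reduce_in_edges(
        edges,
        ranks,
        # Edge functor: each source spreads its rank over its out-edges.
        edge_op=lambda src, dst, src_val, dst_val: src_val / out_degree[src],
        # Reduction functor: sum the contributions arriving at a vertex.
        reduce_op=lambda a, b: a + b,
        init=0.0,
    )
    return {v: (1 - damping) / n + damping * s for v, s in sums.items()}


# The same primitive with different functors computes in-degrees.
in_degree = transform_reduce_in_edges(
    [(0, 1), (0, 2), (1, 2)],
    {0: 0, 1: 0, 2: 0},
    edge_op=lambda src, dst, src_val, dst_val: 1,
    reduce_op=lambda a, b: a + b,
    init=0,
)
```

Swapping the two functors changes the algorithm without touching the traversal loop, which is the separation the abstract attributes to the primitives: the caller describes the operations, and the implementation decides how to execute them.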

Keywords: C++ (programming language)

RAPID: An end-system aware protocol for intelligent data transfer over lambda grids

20th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2006

Authors: Banerjee, Amitabha; Feng, Wu-Chun; Mukherjee, Biswanath; Ghosal, Dipak (University of California, Davis, Dept. of Computer Science, Davis, CA 95616, United States; Virginia Tech, Dept. of Computer Science, Blacksburg, VA 24061, United States)

ISBN: (print) 1424400546

Next-generation e-Science applications will require the ability to transfer information at high data rates between distributed computing centers and data repositories. To support such applications, lambda grid networks have been built to provide large, on-demand bandwidth between end-points that are interconnected via optical circuit-switched lambdas. It is extremely important to develop an efficient transport protocol over such high-capacity, dedicated circuits. Because lambdas provide dedicated bandwidth between endpoints, they obviate the need for network congestion control. Consequently, past research has demonstrated that rate-based transport protocols, such as RBUDP, are more effective than TCP in transferring data over lambdas. However, while lambdas eliminate congestion in the network, they ultimately push the congestion to the endpoints, and current rate-based transport protocols are ill-suited to handle that congestion. In this paper we introduce a "Rate-Adaptive Protocol for Intelligent Delivery (RAPID)" of data that is lightweight and end-system performance-aware, so as to maximize end-to-end throughput while minimizing packet loss. Based on self-monitoring of the dynamic task priority at the receiving end-system, our protocol enables the receiver to proactively deliver feedback to the sender, so that the sender may adapt its sending rate to avoid congestion at the receiving end-system. This avoids the large bursts of packet loss typically observed in current rate-based transport protocols. Over a 10-Gigabit link emulation of an optical circuit, RAPID reduces file-transfer time, and hence improves end-to-end throughput, by as much as 25%. © 2006 IEEE.
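The receiver-driven feedback idea in this abstract can be made concrete with a toy slot-based simulation. This is a sketch under stated assumptions, not the RAPID protocol itself: the function `simulate`, the per-slot `receiver_capacity` list (a stand-in for whatever the receiver's task-priority monitoring reports), and the rule "send at the capacity the receiver last observed" are all hypothetical simplifications.

```python
def simulate(link_rate, receiver_capacity, buffer_size, adaptive):
    """Return (delivered, lost) packet counts over the given time slots.

    receiver_capacity[t] is the number of packets the receiving host can
    process in slot t; it is a hypothetical stand-in for the effect of
    competing tasks on the receiving end-system.
    """
    queue = 0
    delivered = 0
    lost = 0
    send_rate = link_rate
    for capacity in receiver_capacity:
        # Packets sent in this slot arrive in the receive buffer.
        queue += send_rate
        overflow = max(0, queue - buffer_size)
        lost += overflow
        queue -= overflow
        # The receiving application drains what it can in this slot.
        processed = min(queue, capacity)
        queue -= processed
        delivered += processed
        if adaptive:
            # Feedback: the receiver reports its observed capacity and the
            # sender caps its rate at that value for the next slot.
            send_rate = min(link_rate, capacity)
    return delivered, lost


# The receiver slows down for three slots in the middle of the transfer.
capacity_trace = [10, 10, 4, 4, 4, 10, 10]
fixed = simulate(10, capacity_trace, buffer_size=10, adaptive=False)
adapted = simulate(10, capacity_trace, buffer_size=10, adaptive=True)
```

On this trace both senders deliver the same number of packets, but the fixed-rate sender overflows the receive buffer while the adaptive one does not. In a real transfer the dropped packets would have to be retransmitted, which is where a loss-avoiding sender would recover transfer time.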

Keywords: Distributed computer systems

High-level directives to drive the allocation of parallel object-oriented applications

2nd International Workshop on High-Level Parallel Programming Models and Supportive Environments / 11th International Parallel Processing Symposium

Authors: Corradi, A; Leonardi, L; Zambonelli, F (Università di Bologna, Bologna, Italy)

ISBN: (print) 0818678836

The paper presents the Abstract Configuration Language (ACL) implemented within the Parallel Objects object-oriented parallel programming environment. ACL defines a set of directives that allow users to specify the allocation needs of their application components without being aware of the architectural details. ACL directives drive the allocation decisions of the run-time support by adapting its general-purpose behaviour to the application's particular allocation needs. The effectiveness of the ACL approach in improving the performance of parallel applications is confirmed by a testbed application.

Keywords: Object oriented programming

Communicating While Computing [distributed mobile cloud computing over 5G heterogeneous networks]

IEEE Signal Processing Magazine, 2014, vol. 31, no. 6, pp. 45-55

Authors: Barbarossa, Sergio; Sardellitti, Stefania; Di Lorenzo, Paolo (Univ Roma La Sapienza, Rome, Italy; IEEE, New York, NY, USA)

Current estimates of mobile data traffic in the years to come foresee a 1,000-fold increase of mobile data traffic in 2020 with respect to 2010, or, equivalently, a doubling of mobile data traffic every year. This unprecedented growth demands a significant increase of wireless network capacity. Even if the current evolution of fourth-generation (4G) systems and, in particular, the advancements of the long-term evolution (LTE) standardization process foresee a significant capacity improvement with respect to third-generation (3G) systems, the European Telecommunications Standards Institute (ETSI) has established a roadmap toward the fifth-generation (5G) system, with the aim of deploying a commercial system by the year 2020 [1]. The European Project named "Mobile and Wireless Communications Enablers for the 2020 Information Society" (METIS), launched in 2012, represents one of the first international and large-scale research projects on fifth generation (5G) [2]. In parallel with this unparalleled growth of data traffic, our everyday life experience shows an increasing habit to run a plethora of applications specifically devised for mobile devices (smartphones, tablets, laptops) for entertainment, health care, business, social networking, traveling, news, etc. However, the spectacular growth in wireless traffic generated by this lifestyle is not matched with a parallel improvement on mobile handsets' batteries, whose lifetime is not improving at the same pace [3]. This determines a widening gap between the energy required to run sophisticated applications and the energy available on the mobile handset. A possible way to overcome this obstacle is to enable the mobile devices, whenever possible and convenient, to offload their most energy-consuming tasks to nearby fixed servers. This strategy has been studied for a long time and is reported in the literature under different names, such as cyberforaging [4] or computation offloading [5], [6]. In recent years, a strong impul

Keywords: Distribute Heterogeneous networks Cloud Computing Mobile COMMUNICATING

An on-line arithmetic-based reconfigurable neuroprocessor

13th International Parallel Processing Symposium, IPPS 1999, Held in Conjunction with the 10th Symposium on Parallel and Distributed Processing, SPDP 1999

Authors: Beuchat, Jean-Luc; Sanchez, Eduardo (Logic Systems Laboratory, Swiss Federal Institute of Technology, Lausanne CH-1015, Switzerland)

ISBN: (print) 3540658319

Artificial neural networks can solve complex problems such as time series prediction, handwritten pattern recognition or speech processing. Though software simulations are essential when one sets about to study a new algorithm, they cannot always fulfill real-time criteria required by some practical applications. Consequently, hardware implementations are of crucial import. The appearance of fast reconfigurable FPGA circuits brings about new paths for the design of neuroprocessors. All arithmetic operations are carried out with on-line operators. This short paper briefly describes reconfigurable FPGA-based neural networks and gives an introduction to on-line arithmetic. © Springer-Verlag Berlin Heidelberg 1999.

Keywords: Speech processing

Proceedings of 7th International Parallel Processing Symposium, IPPS 1993

7th International Parallel Processing Symposium, IPPS 1993

ISBN: (print) 0818634421

The proceedings contain 128 papers. The topics discussed include: C parallelizing compiler on local-network-based computer environment; OCCAM prototyping of massively parallel applications from colored Petri-nets; performance characteristics of the iPSC/860 and CM-2 I/O systems; automatic parallelization of LINPACK routines on distributed memory parallel processors; transformation of doacross loops on distributed memory systems; an efficient atomic multicast protocol for client-server models; a new horizon for sorting on mesh architectures; mapping of uniform dependence algorithm onto fixed size processor arrays; and towards understanding block partitioning for sparse Cholesky factorization.

Measuring RPC traffic in an OS/2 DCE environment

5th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems

Authors: Sun, Y; Bunt, R; Oster, G (Univ of Saskatchewan, Saskatoon, Canada)

ISBN: (print) 0818677589

Distributed computing involves systems that operate across networks transparently, using the resources of multiple machines. The Open Software Foundation's Distributed Computing Environment (DCE) has evolved to address the need for a vendor-neutral platform for which distributed applications can be developed, and upon which they can run. Central to the design philosophy of DCE is its reliance on the Remote Procedure Call (RPC) to facilitate communication among the entities in the distributed environment. Since it profoundly affects the performance of both the DCE environment and applications running on top of it, the performance of RPCs is very much a concern of both application developers and system managers in a DCE installation. This short paper reports some results from an ongoing empirical investigation of the OS/2 DCE RPC facility. Our interest in this project is the effect on end-to-end RPC performance of protocol processing, flow control mechanisms within DCE, other load on the network, and interoperation with multiple DCE platforms.

Keywords: Distributed computer systems

Parallel implementation of vertex component analysis for hyperspectral endmember extraction

IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

Authors: Rodriguez Alves, Jose M.; Nascimento, Jose M. P.; Bioucas-Dias, Jose M.; Silva, Vitor; Plaza, Antonio (Inst Telecomunicacoes, Lisbon, Portugal)

ISBN: (print) 9781467311595

Vertex component analysis (VCA) has become a very popular and useful tool to linearly unmix large hyperspectral datasets without the use of any a priori knowledge of the constituent spectra. Although VCA is a fast method, many hyperspectral imagery applications require a response in real time or near-real time. This paper proposes two different optimizations for accelerating the computational performance of VCA: the first focuses on a parallel implementation based on graphics processing units (GPUs) to alleviate the VCA computational burden; the second focuses on the development of a strategy to remove a large proportion of mixed pixels that have no effect on the VCA functioning. Experiments are conducted using simulated and real hyperspectral datasets. The results reveal considerable acceleration factors, which satisfy the real-time constraints given by the data acquisition rate.

Keywords: Hyperspectral Unmixing; Endmember Extraction; Vertex Component Analysis; Graphics Processing Unit; Parallel Methods

Parallel hypergraph partitioning for scientific computing

20th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2006

Authors: Devine, Karen D.; Boman, Erik G.; Heaphy, Robert T.; Bisseling, Rob H.; Catalyurek, Umit V. (Sandia National Laboratories, Dept. of Discrete Algorithms and Math., Albuquerque, NM 87185-1111, United States; Utrecht University, Dept. of Mathematics, 3508 TA Utrecht, Netherlands; Ohio State University, Dept. of Biomedical Informatics, Columbus, OH 43210, United States)

ISBN: (print) 1424400546

Graph partitioning is often used for load balancing in parallel computing, but it is known that hypergraph partitioning has several advantages. First, hypergraphs more accurately model communication volume, and second, they are more expressive and can better represent nonsymmetric problems. Hypergraph partitioning is particularly suited to parallel sparse matrix-vector multiplication, a common kernel in scientific computing. We present a parallel software package for hypergraph (and sparse matrix) partitioning developed at Sandia National Labs. The algorithm is a variation on multilevel partitioning. Our parallel implementation is novel in that it uses a two-dimensional data distribution among processors. We present empirical results that show our parallel implementation achieves good speedup on several large problems (up to 33 million nonzeros) with up to 64 processors on a Linux cluster. © 2006 IEEE.
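The claim that hypergraphs model communication volume accurately can be illustrated for sparse matrix-vector multiplication with the standard column-net model. The sketch below is not taken from the Sandia package: the function name `communication_volume` and its inputs are hypothetical. It assumes rows are distributed across parts, each column forms one net containing the rows that have a nonzero in that column, and each vector entry is owned by one of the parts that uses it. Under those assumptions the number of vector entries to be communicated is the sum over nets of the number of parts touched minus one.

```python
def communication_volume(nonzeros, row_part):
    """Connectivity-minus-one metric for a row-wise partition.

    nonzeros: iterable of (row, col) positions of nonzero matrix entries.
    row_part: dict mapping each row index to its part (processor) id.
    Each column is treated as a net (hyperedge) over the rows that
    contain a nonzero in that column.
    """
    parts_per_net = {}
    for row, col in nonzeros:
        parts_per_net.setdefault(col, set()).add(row_part[row])
    # A net spanning k parts needs its vector entry sent to k - 1 of them.
    return sum(len(parts) - 1 for parts in parts_per_net.values())


# 4x4 matrix with a dense first column and a diagonal.
pattern = [(0, 0), (1, 0), (2, 0), (3, 0), (1, 1), (2, 2), (3, 3)]
two_parts = communication_volume(pattern, {0: 0, 1: 0, 2: 1, 3: 1})
four_parts = communication_volume(pattern, {0: 0, 1: 1, 2: 2, 3: 3})
```

A partitioner that minimizes this metric minimizes the modeled communication directly, whereas a graph edge-cut can count the same vector entry more than once when several cut edges share an endpoint.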

Keywords: Parallel processing systems

Parallel solution to the cutting stock problem for a cluster of workstations

Proceedings of the 1996 5th IEEE International Symposium on High Performance Distributed Computing

Authors: Nicklas, Lisa D.; Atkins, Robert W.; Setia, Sanjeev K.; Wang, Pearl Y. (George Mason Univ, Fairfax, United States)

ISBN: (print) 0818675829

This paper describes the design and implementation of a solution to the constrained 2-D cutting stock problem on a cluster of workstations. The constrained 2-D cutting stock problem is an irregular problem with a dynamically modified global data set and irregular amounts and patterns of communication. A replicated data structure is used for the parallel solution since the ratio of reads to writes is known to be large. Mutual exclusion and consistency are maintained using a token-based lazy consistency mechanism, and a randomized protocol for dynamically balancing the distributed work queue is employed. Speedups are reported for three benchmark problems executed on a cluster of workstations interconnected by a 10 Mbps Ethernet.

Keywords: Parallel processing systems
