Search Results - Inner Mongolia University Library

An asynchronous proximal bundle method

MATHEMATICAL PROGRAMMING 2025, vol. 209, no. 1-2, pp. 825-857

Author: Fischer, Frank. Affiliation: Johannes Gutenberg Univ Mainz, Inst Comp Sci, Mainz, Germany

We develop a fully asynchronous proximal bundle method for solving non-smooth, convex optimization problems. The algorithm can be used as a drop-in replacement for classic bundle methods, i.e., the function must be given by a first-order oracle for computing function values and subgradients. The algorithm allows for an arbitrary number of master problem processes computing new candidate points and oracle processes evaluating functions at those candidate points. These processes share information by communication with a single supervisor process that resembles the main loop of a classic bundle method. All processes run in parallel and no explicit synchronization step is required. Instead, the asynchronous and possibly outdated results of the oracle computations can be seen as an inexact function oracle. Hence, we show the convergence of our method under weak assumptions very similar to inexact and incremental bundle methods. In particular, we show how the algorithm learns important structural properties of the functions to control the inaccuracy induced by the asynchronicity automatically such that overall convergence can be guaranteed.
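For orientation, the candidate-point computation that a master problem solves in a classic (synchronous) proximal bundle method can be written as below. This is the textbook formulation, not necessarily the asynchronous variant developed in the paper; the symbols $B_k$ (bundle index set), $\hat{x}_k$ (current stability center), $u_k > 0$ (proximal weight) and $m \in (0,1)$ (descent parameter) are notation assumed here, not taken from the abstract.

```latex
% Cutting-plane model built from first-order oracle answers (f(x_i), g_i), g_i \in \partial f(x_i)
\hat{f}_k(x) = \max_{i \in B_k} \bigl\{ f(x_i) + \langle g_i,\, x - x_i \rangle \bigr\}

% Master problem: next candidate point
x_{k+1} = \operatorname*{arg\,min}_{x} \; \hat{f}_k(x) + \frac{u_k}{2}\, \lVert x - \hat{x}_k \rVert^2

% Descent test: serious step (\hat{x}_{k+1} = x_{k+1}) if it holds, null step (\hat{x}_{k+1} = \hat{x}_k) otherwise
f(\hat{x}_k) - f(x_{k+1}) \;\ge\; m \,\bigl( f(\hat{x}_k) - \hat{f}_k(x_{k+1}) \bigr)
```

In the asynchronous setting described above, the oracle answers entering the model may refer to outdated candidate points, which is why the abstract treats them as results of an inexact oracle.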

Keywords: Proximal bundle methods; parallel programming; Convex optimization; Asynchronous algorithms

A General-purpose Parallel and Heterogeneous Task Programming System for VLSI CAD

39th IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Author: Huang, Tsung-Wei. Affiliation: Univ Utah, Dept ECE, Salt Lake City, UT 84112, USA

ISBN: (print) 9781665423243

This paper introduces Taskflow to address the critical question of "How can we make it easier to implement and deploy parallel computer-aided design (CAD) algorithms on large heterogeneous nodes with high performance and simultaneous high productivity?" Parallelizing CAD is an extremely challenging job. Modern CAD applications exhibit unique computational patterns and user requirements that need very strategic decomposition to benefit from parallelism. Taskflow helps researchers and developers manage the implementation complexity of parallel algorithms by introducing a new high-level programming model supported by an efficient run-time. By capitalizing on emerging parallelism comprising many-core central processing units (CPUs), graphics processing units (GPUs), and custom accelerators, Taskflow enables CAD to achieve new performance and productivity milestones that were previously out of reach.

Keywords: parallel programming; computer-aided design

PragFormer: Data-Driven Parallel Source Code Classification with Transformers

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING 2025, vol. 53, no. 1, pp. 1-26

Authors: Harel, Re'em; Kadosh, Tal; Hasabnis, Niranjan; Mattson, Timothy; Pinter, Yuval; Oren, Gal. Affiliations: Bengurion Univ, Beer Sheva, Israel; NRCN, Beer Sheva, Israel; IAEC, Tel Aviv, Israel; Intel Labs, Hillsboro, OR, USA; Technion, Haifa, Israel

Multi-core shared memory architectures have become ubiquitous in computing hardware. As a result, there is a growing need to fully utilize these architectures by introducing appropriate parallelization schemes, such as OpenMP worksharing-loop constructs, to applications. However, most developers find introducing OpenMP directives to their code hard due to pervasive pitfalls in managing parallel shared memory. To assist developers in this process, many compilers, as well as source-to-source (S2S) translation tools, have been developed over the years, tasked with inserting OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. Recently, many data-driven AI-based code completion (CC) tools, such as GitHub Copilot, have been developed to ease and improve programming productivity. Leveraging the insights from existing AI-based programming-assistance tools, this work presents a novel AI model that can serve as a parallel-programming assistant. Specifically, our model, named PragFormer, is tasked with identifying for loops that can benefit from conversion to a parallel worksharing-loop construct (OpenMP directive) and even predicting the need for specific data-sharing attribute clauses on the fly. We created a unique database, named Open-OMP, specifically for this goal. Open-OMP contains over 32,000 unique code snippets from different domains, half of which contain OpenMP directives, while the other half do not. We experimented with different model design parameters for these tasks and showed that our best-performing model outperforms a statistically-trained baseline as well as a state-of-the-art S2S compiler. In fact, it even outperforms the popular generative AI model of ChatGPT. In the spirit of advancing research on this topic, we have already released source code for Pra

Keywords: parallel programming; Artificial intelligence; Software development; programming assistance

A Three-Semester, Interdisciplinary Approach to Parallel Programming in a Liberal Arts University Setting

2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2014

Authors: Morris, Mike; Frinkle, Karl. Affiliations: Department of Computer Science, Southeastern Oklahoma State University, Durant, OK, United States; Department of Mathematics, Southeastern Oklahoma State University, Durant, OK, United States

ISBN: (print) 9781450328937

We describe a successful addition of high performance computing (HPC) into a traditional computer science curriculum at a liberal arts university. The approach incorporated a three-semester sequence of courses emphasizing parallel programming techniques, with the final course focusing on a research-level mathematical project that was executed on a TOP500 supercomputer. A group of students with varied programming backgrounds participated in the program. Emphasis was placed on utilizing the Open MPI and CUDA libraries along with parallel algorithm and file I/O analysis. Copyright 2014 ACM.

Keywords: parallel programming

Boundary-Aware Concurrent Queue: A Fast and Scalable Concurrent FIFO Queue on GPU Environments

APPLIED SCIENCES-BASEL 2025, vol. 15, no. 4, p. 1834

Authors: Polak, Md. Sabbir Hossain; Troendle, David A.; Jang, Byunghyun. Affiliation: Univ Mississippi, Comp & Informat Sci, 201 Weir Hall, Oxford, MS 38677, USA

This paper presents Boundary-Aware Concurrent Queue (BACQ), a high-performance queue designed for modern GPUs, which focuses on high concurrency in massively parallel environments. BACQ operates at the warp level, leveraging intra-warp locality to improve throughput. A key to BACQ's design is its ability to replace conflicting accesses to shared data with independent accesses to private data. It uses a ticket-based system to ensure fair ordering of operations and supports infinite growth of the head and tail across its ring buffer. The leader thread of each warp coordinates enqueue and dequeue operations, broadcasting offsets for intra-warp synchronization. BACQ dynamically adjusts operation priorities based on the queue's state, especially as it approaches boundary conditions such as overfilling the buffer. It also uses a virtual caching layer for intra-warp communication, reducing memory latency. Rigorous benchmarking results show that BACQ outperforms the BWD (Broker Queue Work Distributor), the fastest known GPU queue, by more than 2x while preserving FIFO semantics. The paper demonstrates BACQ's superior performance through real-world empirical evaluations.

Keywords: GPGPU; concurrent queues; concurrent data structures; concurrent programming; parallel programming; strict First-In-First-Out (FIFO)

Approach class library of high level parallel compositions to implements communication patterns using structured parallel programming

26th European Modeling and Simulation Symposium, EMSS 2014

Authors: Rossainz-López, M.; Capel-Tuñón, M.I. Affiliations: Universidad Autónoma de Puebla, Avenida. San Claudio y 14 Sur, San Manuel, Puebla, State of Puebla 72000, Mexico; Departamento de Lenguajes y Sistemas Informáticos, ETS Ingeniería Informática y de Telecomunicación, Universidad de Granada, Periodista Daniel Saucedo Aranda s/n, Granada 18071, Spain

ISBN: (print) 9788897999324

This article presents, through an environment of Parallel Objects, an approach that combines Structured Parallel Programming with the Object-Orientation paradigm: a programming methodology based on High Level Parallel Compositions (HLPC). By applying the method, the parallelization of commonly used communication patterns among processes is presented. The initial set consists of the HLPCs Farm, Pipe and TreeDV, which represent, respectively, the Farm, Pipeline and Binary Tree communication patterns, the last of which is used within a parallel version of the design technique known as Divide and Conquer.

Keywords: parallel programming

Optimizing maximum shared risk link group disjoint path algorithm using NVIDIA CUDA heterogeneous parallel programming platform

10th International Symposium on Telecommunications, BIHTEL 2014

Authors: Miletic, Vedran; Subic, Tomislav; Mikac, Branko. Affiliations: University of Rijeka, Department of Informatics, Radmile Matejčić 2, Rijeka 51000, Croatia; University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3, Zagreb 10000, Croatia

ISBN: (print) 9781479941360

Network availability is an essential feature of an optical telecommunication network. Should a failure of a network component occur, be it a link or a component inside a node, the network control plane must be able to detect the failure and reroute the traffic using spare components until a repair is done. Shared risk link groups (SRLGs) are used to describe a situation where seemingly unrelated logical failures happen due to a single physical failure. For example, two or more links might share a bridge crossing; should a failure happen there, all of them will be damaged. Routing algorithms were proposed to ensure that the working and spare paths of a connection in a network are SRLG-disjoint to avoid such common-cause failures. However, complete SRLG-disjointness of the working and spare paths is not always possible due to the limited number of links or limited capacity available in the network, so a maximum SRLG-disjoint path algorithm is used instead. The maximum SRLG-disjoint path problem is in general NP-hard. In terms of solution quality, greedy algorithms for the maximum SRLG-disjoint path problem are as good as more complicated heuristics. To improve the performance of the maximum SRLG-disjoint path greedy algorithm, it was implemented using the NVIDIA CUDA heterogeneous parallel programming platform and executed on a graphics processing unit. The GPU implementation of the maximum SRLG-disjoint path algorithm increases performance significantly compared to an implementation utilizing only the CPU, especially in simulations of large networks. © 2014 IEEE.

Keywords: parallel programming

InteropUnityCUDA: A Tool for Interoperability Between Unity and CUDA

SOFTWARE-PRACTICE & EXPERIENCE 2025, vol. 55, no. 6, pp. 1127-1141

Authors: Algis, David; Bramas, Berenger; Darles, Emmanuelle; Aveneau, Lilian. Affiliations: Univ Poitiers, XLIM, Poitiers, France; Studio Nyx, Gond Pontouvre, France; INRIA Nancy Grand Est, ICube, Nancy, France

Introduction: Unity is a powerful and versatile tool for creating real-time experiments. It includes a built-in compute shader language, a C-like programming language designed for massively parallel General-Purpose GPU (GPGPU) computing. However, as Unity is primarily developed for multi-platform game creation, its compute shader language has several limitations, including the lack of multi-GPU computation support and incomplete mathematical *** address these limitations, GPU manufacturers have developed specialized programming models, such as CUDA and HIP, which enable developers to leverage the full computational power of modern GPUs. This article introduces an open-source tool designed to bridge the gap between Unity and CUDA, allowing developers to integrate CUDA's capabilities within Unity-based *** proposed solution establishes an interoperability framework that facilitates communication between Unity and CUDA. The tool is designed to efficiently transfer data, execute CUDA kernels, and retrieve results, ensuring seamless integration into Unity's rendering and computation *** tool extends Unity's capabilities by enabling CUDA-based computations, overcoming the inherent limitations of Unity's compute shader language. This integration allows developers to exploit multi-GPU architectures, leverage advanced mathematical functions, and enhance computational performance for real-time applications.

Keywords: CUDA; interoperability; parallel programming; programming techniques; real-time systems; software tools; Unity

Freeze After Writing: Quasi-Deterministic Parallel Programming with LVars

Authors: Kuper, Lindsey; Turon, Aaron; Krishnaswami, Neelakantan R.; Newton, Ryan R. Affiliations: Indiana University, United States; MPI-SWS, Germany; University of Birmingham, United Kingdom

Deterministic-by-construction parallel programming models offer the advantages of parallel speedup while avoiding the nondeterministic, hard-to-reproduce bugs that plague fully concurrent code. A principled approach to deterministic-by-construction parallel programming with shared state is offered by LVars: shared memory locations whose semantics are defined in terms of an application-specific lattice. Writes to an LVar take the least upper bound of the old and new values with respect to the lattice, while reads from an LVar can observe only that its contents have crossed a specified threshold in the lattice. Although it guarantees determinism, this interface is quite limited. We extend LVars in two ways. First, we add the ability to freeze and then read the contents of an LVar directly. Second, we add the ability to attach event handlers to an LVar, triggering a callback when the LVar's value changes. Together, handlers and freezing enable an expressive and useful style of parallel programming. We prove that in a language where communication takes place through these extended LVars, programs are at worst quasi-deterministic: on every run, they either produce the same answer or raise an error. We demonstrate the viability of our approach by implementing a library for Haskell supporting a variety of LVar-based data structures, together with a case study that illustrates the programming model and yields promising parallel speedup.
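The write, threshold-read and freeze operations described in this abstract can be illustrated with a small sketch. The code below is not the authors' Haskell library; `SetLVar`, its method names, and `demo` are hypothetical, and the lattice is assumed to be sets ordered by inclusion, so the least upper bound is set union. It also assumes, in line with the abstract's "same answer or raise an error" guarantee, that a write which would change a frozen LVar is reported as an error.

```python
import threading


class SetLVar:
    """Hypothetical LVar over the lattice of sets ordered by inclusion."""

    def __init__(self):
        self._value = frozenset()
        self._frozen = False
        self._cond = threading.Condition()

    def put(self, items):
        """Write: replace the state by the least upper bound (set union)."""
        with self._cond:
            new = self._value | frozenset(items)
            if self._frozen and new != self._value:
                # Assumption: a state-changing write after freeze is an error.
                raise RuntimeError("put after freeze would change the LVar")
            self._value = new
            self._cond.notify_all()

    def get(self, threshold):
        """Threshold read: block until the state contains `threshold`.

        Returns only the threshold itself, never the full contents, so the
        result does not depend on how far other writers have progressed.
        """
        threshold = frozenset(threshold)
        with self._cond:
            self._cond.wait_for(lambda: threshold <= self._value)
            return threshold

    def freeze(self):
        """Freeze the LVar and return its exact contents."""
        with self._cond:
            self._frozen = True
            return self._value


def demo():
    lv = SetLVar()
    writers = [
        threading.Thread(target=lv.put, args=(s,)) for s in ({1, 2}, {2, 3})
    ]
    for w in writers:
        w.start()
    seen = lv.get({1, 3})  # blocks until both 1 and 3 have been written
    for w in writers:
        w.join()
    return seen, lv.freeze()
```

Because union is commutative, associative and idempotent, `demo()` yields the same pair regardless of which writer runs first. Freezing before the writers are joined is where a run could instead end in an error, which is the quasi-deterministic behaviour the abstract refers to.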

Keywords: parallel programming

Self-adaptive parallel programming through tunable concurrency

2014 ACM SIGPLAN Conference on Systems, Programming, and Applications: Software for Humanity, SPLASH 2014

Authors: Nguyen, Tai; Zhao, Xinghui. Affiliation: School of Engineering and Computer Science, Washington State University, United States

ISBN: (print) 9781450332088

Recent advances in hardware architectures, particularly multicore and manycore systems, implicitly require programmers to write concurrent programs. However, writing correct and efficient concurrent programs is challenging. We envision a system where concurrent programs can be self-adaptive when executing on different hardware. We have developed two different tuning policies, which enable users' programs to adjust their level of concurrency at compile time and at run time, respectively. Copyright is held by the owner/author(s).

Keywords: parallel programming
