ISBN (print): 9798400703638
Parallel Discrete Event Simulation (PDES) is a modelling technique that takes advantage of concurrent computing resources. However, its asynchronous nature can present challenges for efficient execution. This paper proposes a new non-blocking management system for handling messages and anti-messages in Time Warp simulations. This approach exploits the benefits of non-blocking algorithms to surpass the limitations of existing blocking mechanisms, resulting in more efficient and scalable simulations. Specifically, the approach relies on efficient atomic fetch-and-add operations provided by modern computer architectures for evaluating and updating the status of the event.
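As a rough illustration of the kind of mechanism the abstract describes (and not the paper's actual algorithm), the C sketch below uses a single atomic fetch-and-add on a per-event status word to decide which thread annihilates a message/anti-message pair; the MSG/ANTI encoding and the deliver() helper are hypothetical.

```c
/* Hedged sketch (not the paper's scheme): a per-event status word updated
 * with atomic fetch-and-add.  A message arrival adds MSG, the matching
 * anti-message adds ANTI; whichever update observes the other flag already
 * set wins the right to annihilate the pair. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define MSG   0x1U   /* positive message has been delivered */
#define ANTI  0x2U   /* anti-message has been delivered     */

typedef struct {
    atomic_uint status;   /* bitwise sum of MSG / ANTI contributions */
} event_t;

/* Returns true if the caller must annihilate the message/anti-message pair. */
static bool deliver(event_t *ev, unsigned contribution)
{
    unsigned prev = atomic_fetch_add(&ev->status, contribution);
    /* The thread that completes the pair (sees the other flag already
     * present) is responsible for reclaiming both copies. */
    return (prev | contribution) == (MSG | ANTI) && prev != (MSG | ANTI);
}

int main(void)
{
    event_t ev = { .status = 0 };
    printf("message arrival annihilates: %d\n", deliver(&ev, MSG));   /* 0 */
    printf("anti-message annihilates:    %d\n", deliver(&ev, ANTI));  /* 1 */
    return 0;
}
```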
Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach does not scale well, a problem that has been addressed in the literature by introducing layered allocation services or by replicating the core allocators, i.e., the bottom-most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article, we explore an alternative approach to scalable memory allocation/release, which can still be combined with those literature proposals. We present a fully non-blocking buddy system, where threads performing concurrent allocations/releases do not undergo any spin-lock based synchronization. Our solution allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling the allocator metadata; memory fragmentation and coalescing are also carried out in a fully non-blocking manner. In our solution, conflict detection relies on atomic Read-Modify-Write (RMW) machine instructions, which the processor guarantees to execute atomically. We also provide a proof of correctness of our non-blocking buddy system and report an experimental study that outlines the effectiveness of our solution.
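A minimal C sketch of the underlying idea, under strong simplifying assumptions (one allocation size, no splitting): block metadata is claimed with an atomic compare-and-swap instead of a spin-lock, so a failed claim simply means another thread won that block. The blk_* names and the flat status array are illustrative, not the paper's data structure.

```c
/* Minimal sketch under simplifying assumptions (single allocation size, no
 * splitting): each block has a status byte that threads claim with CAS
 * instead of taking a spin-lock.  The real non-blocking buddy system also
 * handles splitting, coalescing and fragmentation through RMW updates on a
 * tree of such status words. */
#include <stdatomic.h>
#include <stdio.h>

#define NBLOCKS 64
enum { BLK_FREE = 0, BLK_BUSY = 1 };

static _Atomic unsigned char blk_status[NBLOCKS];   /* zero-initialized = free */

/* Try to allocate one block; returns its index or -1 if none is free. */
static int blk_alloc(void)
{
    for (int i = 0; i < NBLOCKS; i++) {
        unsigned char expected = BLK_FREE;
        /* A failed CAS means another thread won this block; we simply
         * move on to the next candidate without ever blocking. */
        if (atomic_compare_exchange_strong(&blk_status[i], &expected, BLK_BUSY))
            return i;
    }
    return -1;
}

static void blk_free(int i)
{
    atomic_store(&blk_status[i], BLK_FREE);
    /* Coalescing with the buddy block (index i ^ 1) would be attempted
     * here with a further RMW on the parent's metadata. */
}

int main(void)
{
    int a = blk_alloc(), b = blk_alloc();
    printf("allocated blocks %d and %d\n", a, b);
    blk_free(a);
    blk_free(b);
    return 0;
}
```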
ISBN (print): 9781450392044
To maximize the performance of concurrent data structures, researchers have often turned to highly complex fine-grained techniques, resulting in efficient and elegant algorithms, which can, however, often be difficult to understand and prove correct. While simpler techniques exist, such as transactional memory, they can have limited performance or portability relative to their fine-grained counterparts. Approaches at both ends of this complexity-performance spectrum have been extensively explored, but relatively less is known about the middle ground: approaches that are willing to sacrifice some performance for simplicity, while remaining competitive with state-of-the-art handcrafted designs. In this paper, we explore this middle ground and present PathCAS, a primitive that combines ideas from multi-word CAS (KCAS) and transactional memory approaches while carefully avoiding overhead. We show how PathCAS can be used to implement efficient search data structures relatively simply, using an internal binary search tree as an example and then extending it to an AVL tree. Our best implementations outperform many handcrafted search trees: in search-heavy workloads, they rival the BCCO tree [5], the fastest known concurrent binary tree in terms of search performance [3]. Our results suggest that PathCAS can yield concurrent data structures that are relatively easy to build and prove correct, while offering surprisingly high performance.
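The sketch below is not the PathCAS API; it only illustrates the general multi-word-CAS usage pattern the abstract refers to, with a hypothetical kcas() call that is backed here by a trivial lock purely so the example compiles (real KCAS/PathCAS designs are non-blocking). An insert is expressed as one atomic two-word commit over a child pointer and a version counter.

```c
/* Illustrative only: a hypothetical kcas() interface, NOT PathCAS. */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

typedef struct { volatile uintptr_t *addr; uintptr_t expect, desired; } kcas_entry_t;

static pthread_mutex_t kcas_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lock-based stand-in: real multi-word CAS commits all words atomically
 * without any lock. */
static bool kcas(kcas_entry_t *e, int n)
{
    pthread_mutex_lock(&kcas_lock);
    for (int i = 0; i < n; i++)
        if (*e[i].addr != e[i].expect) { pthread_mutex_unlock(&kcas_lock); return false; }
    for (int i = 0; i < n; i++)
        *e[i].addr = e[i].desired;
    pthread_mutex_unlock(&kcas_lock);
    return true;
}

typedef struct node {
    int key;
    volatile uintptr_t left, right;   /* child pointers stored as words     */
    volatile uintptr_t version;       /* bumped on every structural change  */
} node_t;

/* Attach key as the left child of parent: one atomic 2-word commit
 * (child pointer + parent version), retried on conflict. */
static bool insert_left(node_t *parent, int key)
{
    for (;;) {
        uintptr_t child = parent->left, ver = parent->version;
        if (child != 0) return false;            /* slot already occupied */
        node_t *leaf = calloc(1, sizeof *leaf);
        leaf->key = key;
        kcas_entry_t e[2] = {
            { &parent->left,    child, (uintptr_t)leaf },
            { &parent->version, ver,   ver + 1          },
        };
        if (kcas(e, 2)) return true;
        free(leaf);                               /* lost the race: retry */
    }
}

int main(void)
{
    node_t root = { .key = 10 };
    printf("insert 5: %d\n", insert_left(&root, 5));
    printf("insert 7: %d\n", insert_left(&root, 7));  /* 0: slot taken */
    return 0;
}
```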
Weak memory models implemented on modern multicore processors are known to affect the correctness of concurrent code. They can also affect whether or not the concurrent code is secure. This is particularly the case in programs where the security levels of variables are value-dependent, i.e., depend on the values of other variables. In this paper, we illustrate how instruction reordering allowed by ARM and POWER multicore processors leads to vulnerabilities in such programs, and present a compositional, flow-sensitive information-flow logic which can be used to detect such vulnerabilities. The logic allows step-local reasoning (one instruction at a time) about a thread's security by tracking information about dependencies between instructions which guarantee their order of occurrence. Program security can then be established from individual thread security using rely/guarantee reasoning. The logic has been proved sound with respect to existing operational semantics using Isabelle/HOL, and implemented in an automatic symbolic execution tool.
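A schematic C11 example of the kind of value-dependent leak the paper targets (constructed for illustration, not taken from the paper): declassification scrubs the secret and then lowers the level, which is secure under sequential consistency but can leak if the relaxed stores or loads are reordered on ARM/POWER.

```c
/* Schematic illustration: the security level of `data` is value-dependent,
 * given by `level`.  The writer declassifies by scrubbing the secret and
 * then lowering the level; with relaxed accesses, ARM/POWER may make the
 * level change visible before the scrub, so the reader can output the
 * secret.  The release/acquire pair noted in comments restores the order. */
#include <stdatomic.h>
#include <stdio.h>

enum { PUBLIC = 0, SECRET = 1 };

static _Atomic int level = SECRET;  /* current classification of data   */
static _Atomic int data  = 1234;    /* currently holds a secret value   */

void declassify(void)                       /* writer thread */
{
    atomic_store_explicit(&data, 0, memory_order_relaxed);       /* scrub */
    atomic_store_explicit(&level, PUBLIC, memory_order_relaxed); /* lower */
    /* secure variant: memory_order_release on the `level` store */
}

void output_if_public(void)                 /* reader thread, public sink */
{
    if (atomic_load_explicit(&level, memory_order_relaxed) == PUBLIC)
        /* may print 1234 if the writer's stores or these loads reorder */
        /* secure variant: memory_order_acquire on the `level` load     */
        printf("%d\n", atomic_load_explicit(&data, memory_order_relaxed));
}

/* Single-threaded demo driver; the hazard requires the two functions to
 * run concurrently on different cores. */
int main(void) { declassify(); output_if_public(); return 0; }
```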
ISBN (print): 9781450368186
Balanced search trees typically use key comparisons to guide their operations, and achieve logarithmic running time. By relying on numerical properties of the keys, interpolation search achieves lower search complexity and better performance. Although interpolation-based data structures were investigated in the past, their non-blocking concurrent variants have received very little attention so far. In this paper, we propose the first non-blocking implementation of the classic interpolation search tree (IST) data structure. For arbitrary key distributions, the data structure ensures worst-case O(log n + p) amortized time for search, insertion and deletion traversals. When the input key distributions are smooth, lookups run in expected O(log log n + p) time, and insertion and deletion run in expected amortized O(log log n + p) time, where p is a bound on the number of threads. To improve the scalability of concurrent insertion and deletion, we propose a novel parallel rebuilding technique, which should be of independent interest. We evaluate whether the theoretical improvements translate to practice by implementing the concurrent interpolation search tree and benchmarking it on uniform and non-uniform key distributions, for dataset sizes in the millions to billions of keys. Relative to the state-of-the-art concurrent data structures, the concurrent interpolation search tree achieves performance improvements of up to 15% under high update rates and of up to 50% under moderate update rates. Further, ISTs exhibit up to 2x fewer cache misses and consume 1.2-2.6x less memory compared to the next best alternative on typical dataset sizes. We find that the results are surprisingly robust to distributional skew, which suggests that our data structure can be a promising alternative to classic concurrent search structures.
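For readers unfamiliar with interpolation search, the sketch below shows only the classic sequential array version of the idea the tree builds on; it is not the concurrent IST itself.

```c
/* Interpolation search: instead of halving the range as binary search does,
 * it estimates the likely position of the key from the key values at the
 * range boundaries.  Sequential sketch only. */
#include <stdio.h>

/* Returns the index of key in the sorted array a[0..n-1], or -1. */
static int interpolation_search(const long *a, int n, long key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi && key >= a[lo] && key <= a[hi]) {
        if (a[hi] == a[lo])                       /* avoid division by zero */
            return a[lo] == key ? lo : -1;
        /* Estimate the position assuming keys are roughly uniform. */
        int pos = lo + (int)((double)(key - a[lo]) / (a[hi] - a[lo]) * (hi - lo));
        if (a[pos] == key)      return pos;
        else if (a[pos] < key)  lo = pos + 1;
        else                    hi = pos - 1;
    }
    return -1;
}

int main(void)
{
    long a[] = { 2, 4, 8, 16, 32, 64, 128, 256 };
    printf("index of 32: %d\n", interpolation_search(a, 8, 32));   /* 4  */
    printf("index of 33: %d\n", interpolation_search(a, 8, 33));   /* -1 */
    return 0;
}
```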
Parallelizing (compute-intensive) discrete event simulation (DES) applications is a classical approach for speeding up their execution and for making very large/complex simulation models tractable. This has historically been achieved via parallel DES (PDES) techniques, which are based on partitioning the simulation model into distinct simulation objects (somehow resembling objects in classical object-oriented programming) whose states are disjoint, which are executed concurrently, and which rely on explicit event-exchange (or event-scheduling) primitives as the means to support mutual dependencies and notification of their state updates. With this approach, the application developer is necessarily forced to reason about state separation across the objects, and is thus not allowed to rely on shared information, such as global variables, within the application code. This implicitly shifts the user-exposed programming model to one where sequential-style global variable accesses within the application code are not allowed. In this article we remove this limitation by providing support for managing global variables in the context of DES code developed in ANSI-C, which gets automatically parallelized. Particularly, we focus on speculative (also termed optimistic) PDES systems that run on top of multi-core machines, where simulation objects can concurrently process their events with no guarantee of causal consistency, and actual violations of causality rules are recovered through rollback/recovery schemes. In compliance with the nature of speculative processing, in our proposal global variables are transparently mapped to multi-versions, so as to avoid any form of safety predicate verification upon their updates. Consistency is ensured via the introduction of a new rollback/recovery scheme based on detecting global variables' reads on non-correct versions. At the same time, efficiency in the execution is guaranteed by managing multi-version variables' lists via non-blocking algorithms.
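A much-simplified, single-threaded C sketch of the multi-version mapping the abstract describes: each write installs a (timestamp, value) version, and a read at logical time t returns the newest version not in its future. The mv_* names are illustrative; the paper additionally manages these lists concurrently and in a non-blocking way, and couples them with the rollback/recovery scheme.

```c
/* Much-simplified sketch of the multi-version idea.  The real system
 * installs and prunes versions concurrently without locks, and records
 * read timestamps to detect reads of versions later invalidated by
 * rollback; none of that is shown here. */
#include <stdio.h>
#include <stdlib.h>

typedef struct version {
    double ts;                 /* simulation time of the write         */
    long   value;
    struct version *older;     /* next (older) version in the list     */
} version_t;

typedef struct { version_t *newest; } mv_global_t;

static void mv_write(mv_global_t *g, double ts, long value)
{
    version_t *v = malloc(sizeof *v);
    v->ts = ts; v->value = value; v->older = g->newest;
    g->newest = v;             /* in the real system: a non-blocking install */
}

static long mv_read(const mv_global_t *g, double ts)
{
    for (version_t *v = g->newest; v != NULL; v = v->older)
        if (v->ts <= ts)
            return v->value;   /* newest version not in the reader's future */
    return 0;                  /* no version yet: default initial value     */
}

int main(void)
{
    mv_global_t g = { NULL };
    mv_write(&g, 1.0, 10);
    mv_write(&g, 5.0, 50);
    printf("read at t=3.0 -> %ld\n", mv_read(&g, 3.0));  /* 10 */
    printf("read at t=6.0 -> %ld\n", mv_read(&g, 6.0));  /* 50 */
    return 0;
}
```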
ISBN (print): 9781509035052
The large diffusion of highly parallel shared-memory multi-core machines has led Parallel Discrete Event Simulation (PDES) platforms to shift towards a share-everything model. This model is based on loose coupling between simulation objects and threads, lasting (at the extreme) no more than the lifetime of individual events. Concurrent threads can therefore CPU-dispatch events destined to any object at any point in time, thus fully sharing the workload of events to be processed on a fine-grain basis. This calls for efficient mechanisms to share the overall pool of pending events by enabling parallelism in insertion and extraction operations. In this article we present a lock-free event pool which also provides amortized O(1) time complexity for both insertions and extractions. It can sustain highly concurrent accesses without noticeable performance degradation when scaling up the thread count. Experimental results demonstrate that our solution stands as a core facility capable of further raising the practical impact of the emerging share-everything PDES paradigm.
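As a hint of the flavor of such a structure (not the paper's event pool), the sketch below shows only a lock-free insertion path into a bucketed pending-event set, using a CAS retry loop on a bucket's list head; the paper's structure also supports amortized O(1) extraction, which is the hard part and is not shown.

```c
/* Sketch of a lock-free insertion path only: events are hashed to a time
 * bucket and pushed onto that bucket's list with a CAS retry loop, so
 * concurrent producers never block each other. */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define NBUCKETS     1024
#define BUCKET_WIDTH 1.0    /* simulated-time span covered by one bucket */

typedef struct event {
    double ts;                          /* event timestamp */
    void (*handler)(void *);
    struct event *_Atomic next;
} event_t;

static event_t *_Atomic buckets[NBUCKETS];

static void pool_insert(event_t *ev)
{
    size_t b = (size_t)(ev->ts / BUCKET_WIDTH) % NBUCKETS;
    event_t *head = atomic_load(&buckets[b]);
    do {
        ev->next = head;                /* link to the current head */
        /* CAS publishes the event; on failure `head` is refreshed and we retry. */
    } while (!atomic_compare_exchange_weak(&buckets[b], &head, ev));
}

int main(void)
{
    event_t *e = calloc(1, sizeof *e);
    e->ts = 3.7;
    pool_insert(e);
    printf("inserted event with ts=%.1f\n", e->ts);
    return 0;
}
```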
ISBN (print): 9781450306638
Security enforcement inlined into user threads often delays the protected programs; inlined resource reclamation may interrupt program execution and defer resource release. We propose software cruising, a novel technique that migrates security enforcement and resource reclamation from user threads to a concurrent monitor thread. The technique leverages the increasingly popular multicore and multiprocessor architectures and uses lock-free data structures to achieve non-blocking and efficient synchronization between the monitor and user threads. As a case study, software cruising is applied to the heap buffer overflow problem. Previous mitigation and detection techniques for this problem suffer from high performance overhead, poor legacy-code compatibility, loss of semantic fidelity, or tedious manual program transformation. We present a concurrent heap buffer overflow detector, CRUISER, in which a concurrent thread is added to the user program to monitor heap integrity, and custom lock-free data structures and algorithms are designed to achieve high efficiency and scalability. The experiments show that our approach is practical: it imposes an average of 5% performance overhead on SPEC CPU2006, and the throughput slowdown on Apache is negligible on average.
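A hedged sketch of the software-cruising idea, not CRUISER's actual design: the allocating thread surrounds each heap buffer with canary words and publishes it into a lock-free registry; a separate monitor thread sweeps the registry and checks the canaries off the application's critical path. The names and the fixed-size registry are illustrative.

```c
/* Illustrative only: canary-guarded buffers checked by a cruiser thread. */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CANARY   0xDEADC0DEUL
#define MAX_BUFS 1024

typedef struct { unsigned long front; char payload[64]; unsigned long back; } guarded_buf_t;

static guarded_buf_t *_Atomic registry[MAX_BUFS];

static guarded_buf_t *guarded_alloc(void)
{
    guarded_buf_t *b = malloc(sizeof *b);
    b->front = b->back = CANARY;
    for (size_t i = 0; i < MAX_BUFS; i++) {            /* publish lock-free */
        guarded_buf_t *expected = NULL;
        if (atomic_compare_exchange_strong(&registry[i], &expected, b))
            return b;
    }
    free(b);
    return NULL;                                       /* registry full */
}

static void monitor_sweep(void)                        /* run by cruiser thread */
{
    for (size_t i = 0; i < MAX_BUFS; i++) {
        guarded_buf_t *b = atomic_load(&registry[i]);
        if (b && (b->front != CANARY || b->back != CANARY))
            fprintf(stderr, "heap overflow detected in buffer %zu\n", i);
    }
}

int main(void)                                         /* single-threaded demo */
{
    guarded_buf_t *b = guarded_alloc();
    memset(b->payload, 'A', sizeof b->payload + 1);    /* deliberate 1-byte overflow */
    monitor_sweep();                                   /* reports the corruption */
    return 0;
}
```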
ISBN (print): 9781595939609
We describe a methodology for transforming a large class of highly-concurrent linearizable objects into highly-concurrent transactional objects. As long as the linearizable implementation satisfies certain regularity properties (informally, that every method has an inverse), we define a simple wrapper for the linearizable implementation that guarantees that concurrent transactions without inherent conflicts can synchronize at the same granularity as the original linearizable implementation.
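A schematic C illustration of the key ingredient (inverses), not the paper's full construction: a transaction calls the linearizable object directly and logs the inverse of each successful call, so aborting just replays the log backwards; the abstract locks that keep inherently conflicting transactions apart are omitted, and the set implementation is a trivial stand-in.

```c
/* Illustration of method inverses used as an undo log. */
#include <stdbool.h>
#include <stdio.h>

/* Stand-in linearizable set over small integers (details irrelevant here). */
static bool set_contains(int k);
static bool set_add(int k);
static bool set_remove(int k);

typedef struct { int key; bool was_add; } undo_entry_t;
typedef struct { undo_entry_t log[64]; int n; } txn_t;   /* no overflow check: sketch */

static bool txn_add(txn_t *t, int k)
{
    if (!set_add(k)) return false;                  /* no-op: nothing to undo */
    t->log[t->n++] = (undo_entry_t){ k, true };     /* inverse is remove(k)   */
    return true;
}

static void txn_abort(txn_t *t)
{
    while (t->n > 0) {                              /* replay inverses in reverse */
        undo_entry_t e = t->log[--t->n];
        if (e.was_add) set_remove(e.key);
        else           set_add(e.key);
    }
}

/* Trivial set implementation so the sketch is self-contained. */
static bool present[128];
static bool set_contains(int k) { return present[k]; }
static bool set_add(int k)      { bool p = present[k]; present[k] = true;  return !p; }
static bool set_remove(int k)   { bool p = present[k]; present[k] = false; return  p; }

int main(void)
{
    txn_t t = { .n = 0 };
    txn_add(&t, 7);
    printf("before abort: %d\n", set_contains(7));  /* 1 */
    txn_abort(&t);
    printf("after abort:  %d\n", set_contains(7));  /* 0 */
    return 0;
}
```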
ISBN (print): 9780769527659
Enabling message communication among concurrent computing threads without relying on mutual exclusion (i.e., locking) is highly desirable in real-time computing systems. This paper presents a refined version of the Non-Blocking Buffer (NBB), a lock-free interaction mechanism that enables efficient event-message communication between a single producer thread and a single consumer thread. The NBB scheme presented here improves on the previous version in two aspects. First, application designers now have the flexibility of choosing the consumer's retry strategy for the case when the buffer is empty but the producer is in the middle of inserting an item. Second, in the refined version the producer inserts pointers to data items into the buffer, whereas the consumer obtains copies of the items. This design is consistent with the fact that shared heap management must be avoided to enable fully lock-free interaction between the producer and the consumer. This paper also discusses approaches based on the NBB mechanism for supporting all conceivable producer-consumer scenarios.
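For context, the sketch below is a generic lock-free single-producer/single-consumer ring buffer, shown only to illustrate the class of mechanism the NBB belongs to; the NBB itself uses a different scheme (notably, it can report that the buffer is empty while an insertion is in progress, which plain head/tail counters cannot express).

```c
/* Generic SPSC lock-free ring buffer; not the NBB algorithm. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define RING_SIZE 8                     /* must be a power of two */

typedef struct {
    void *slots[RING_SIZE];
    _Atomic unsigned head;              /* written only by the consumer */
    _Atomic unsigned tail;              /* written only by the producer */
} spsc_ring_t;

static bool ring_put(spsc_ring_t *r, void *item)   /* producer side */
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE) return false;    /* full */
    r->slots[tail % RING_SIZE] = item;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

static bool ring_get(spsc_ring_t *r, void **item)  /* consumer side */
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (tail == head) return false;                /* empty */
    *item = r->slots[head % RING_SIZE];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

int main(void)
{
    static spsc_ring_t ring;            /* zero-initialized head/tail */
    int payload = 42;
    void *out = NULL;
    ring_put(&ring, &payload);
    if (ring_get(&ring, &out))
        printf("received %d\n", *(int *)out);
    return 0;
}
```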