ISBN:
(Print) 9781450362955
One-sided communication is a useful paradigm for irregular parallel applications, but most one-sided programming environments, including MPI's one-sided interface and PGAS programming languages, lack application-level libraries to support these applications. We present the Berkeley Container Library, a set of generic, cross-platform, high-performance data structures for irregular applications, including queues, hash tables, Bloom filters, and more. BCL is written in C++ using an internal DSL called the BCL Core that provides one-sided communication primitives such as remote get and remote put operations. The BCL Core has backends for MPI, OpenSHMEM, GASNet-EX, and UPC++, allowing BCL data structures to be used natively in programs written using any of these programming environments. Along with our internal DSL, we present the BCL ObjectContainer abstraction, which allows BCL data structures to transparently serialize complex data types while maintaining efficiency for primitive types. We also introduce the set of BCL data structures and evaluate their performance across a number of high-performance computing systems, demonstrating that BCL programs are competitive with hand-optimized code, even while hiding many of the underlying details of message aggregation, serialization, and synchronization.
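To make the ObjectContainer idea more concrete, the sketch below shows one way such an abstraction could dispatch on the element type: trivially copyable values are moved as raw bytes, while complex types fall back to explicit serialization. The trait name, layout, and string specialization are illustrative assumptions, not BCL's actual implementation.

```cpp
// Sketch of an ObjectContainer-style serialization trait (illustrative only;
// names and layout are assumptions, not BCL's real API or implementation).
#include <cstring>
#include <iostream>
#include <string>
#include <type_traits>
#include <vector>

// Trivially copyable types are stored as raw bytes with no extra work;
// complex types fall back to an explicit serialize/deserialize pair.
template <typename T, typename Enable = void>
struct object_container;

template <typename T>
struct object_container<T, std::enable_if_t<std::is_trivially_copyable_v<T>>> {
  static std::vector<char> serialize(const T& value) {
    std::vector<char> buf(sizeof(T));
    std::memcpy(buf.data(), &value, sizeof(T));
    return buf;
  }
  static T deserialize(const std::vector<char>& buf) {
    T value;
    std::memcpy(&value, buf.data(), sizeof(T));
    return value;
  }
};

template <>
struct object_container<std::string> {
  static std::vector<char> serialize(const std::string& s) {
    return std::vector<char>(s.begin(), s.end());
  }
  static std::string deserialize(const std::vector<char>& buf) {
    return std::string(buf.begin(), buf.end());
  }
};

int main() {
  auto ints = object_container<int>::serialize(42);            // raw bytes
  auto text = object_container<std::string>::serialize("hi");  // explicit copy
  std::cout << object_container<int>::deserialize(ints) << ' '
            << object_container<std::string>::deserialize(text) << '\n';
}
```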
ISBN:
(Print) 9781728159874
Distributed data structures are key to implementing scalable applications for scientific simulations and data analysis. In this paper we look at two implementation styles for distributed data structures: remote direct memory access (RDMA) and remote procedure call (RPC). We focus on operations that require individual accesses to remote portions of a distributed data structure, e.g., accessing a hash table bucket or distributed queue, rather than global operations in which all processors collectively exchange information. We look at the trade-offs between the two styles through microbenchmarks and a performance model that approximates the cost of each. The RDMA operations have direct hardware support in the network and therefore lower latency and overhead, while the RPC operations are more expressive but higher cost and can suffer from lack of attentiveness from the remote side. We also run experiments to compare the real-world performance of RDMA- and RPC-based data structure operations with the predicted performance to evaluate the accuracy of our model, and show that while the model does not always precisely predict running time, it allows us to choose the best implementation in the examples shown. We believe this analysis will assist developers in designing data structures that will perform well on current network architectures, as well as network architects in providing better support for this class of distributed data structures.
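The abstract does not reproduce the performance model itself, so the following is a toy cost model in the same spirit: it charges an RDMA-based operation one network round trip per dependent remote access, and charges an RPC one round trip plus handler time plus an attentiveness delay. The constants and formulas are assumptions for illustration only.

```cpp
// Toy cost model contrasting RDMA and RPC styles for a single remote
// data-structure operation. The formulas and constants are illustrative
// assumptions, not the model from the paper.
#include <iostream>

struct Params {
  double latency_us;        // one-way network latency
  double rdma_overhead_us;  // per-RDMA-operation injection overhead
  double rpc_handler_us;    // time for the remote CPU to run the handler
  double attentive_us;      // expected wait until the remote side polls
};

// RDMA: each dependent access (e.g., a pointer chase into a bucket chain)
// is a separate network round trip, but needs no remote CPU involvement.
double rdma_cost(const Params& p, int round_trips) {
  return round_trips * (2 * p.latency_us + p.rdma_overhead_us);
}

// RPC: one round trip, but the remote CPU must notice and run the handler.
double rpc_cost(const Params& p) {
  return 2 * p.latency_us + p.attentive_us + p.rpc_handler_us;
}

int main() {
  Params p{1.5, 0.2, 0.5, 3.0};
  for (int hops = 1; hops <= 4; ++hops) {
    std::cout << hops << " RDMA round trips: " << rdma_cost(p, hops)
              << " us, RPC: " << rpc_cost(p) << " us -> "
              << (rdma_cost(p, hops) < rpc_cost(p) ? "RDMA" : "RPC") << '\n';
  }
}
```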
ISBN:
(Digital) 9783662451748
ISBN:
(Print) 9783662451748
Linearizability is a powerful consistency condition but can be expensive to implement. Recently, researchers have suggested gaining performance by relaxing the sequential specification of objects' data types. We consider, for the first time, linearizable message-passing implementations of relaxed Queues and prove upper and lower bounds on the elapsed time for Dequeue operations both in the worst case and on average. Our results imply that worst-case time complexity does not indicate benefit from relaxation. In contrast, we present implementations of relaxed Queues for which the average time complexity of Dequeue is significantly smaller than both the worst-case lower bound for unrelaxed Queues and a newly-proved lower bound on the average time for unrelaxed Queues. We also prove lower bounds on the average time complexity of Dequeue for relaxed Queues that show our algorithms are asymptotically optimal and that there is an inherent complexity gap between different levels of relaxation.
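As a concrete reminder of what relaxing a sequential specification means, the sketch below gives one common relaxation of a FIFO queue, in which Dequeue may return any of the k oldest elements. The exact relaxations analyzed in the paper may differ, and the class, method, and parameter names here are illustrative.

```cpp
// Sequential specification of a simple out-of-order relaxed queue: Dequeue
// may return any of the k oldest elements. One illustrative relaxation only.
#include <algorithm>
#include <cstdlib>
#include <deque>
#include <iostream>
#include <optional>

template <typename T>
class RelaxedQueue {
  std::deque<T> items_;
  std::size_t k_;
 public:
  explicit RelaxedQueue(std::size_t k) : k_(k) {}
  void Enqueue(const T& x) { items_.push_back(x); }
  // Any of the first min(k, size) elements is a legal return value.
  std::optional<T> Dequeue() {
    if (items_.empty()) return std::nullopt;
    std::size_t window = std::min(k_, items_.size());
    std::size_t i = std::rand() % window;   // nondeterministic legal choice
    T out = items_[i];
    items_.erase(items_.begin() + i);
    return out;
  }
};

int main() {
  RelaxedQueue<int> q(2);
  for (int i = 1; i <= 4; ++i) q.Enqueue(i);
  while (auto v = q.Dequeue()) std::cout << *v << ' ';  // e.g. 2 1 3 4
  std::cout << '\n';
}
```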
Remote Direct Memory Access (RDMA) has become a standard networking technology and is prominently used in high-performance applications. While RDMA can provide both excellent performance and novel capabilities, it can be more difficult than traditional kernel-supported networking to use effectively. RDMA adoption can be simplified through the use of abstractions that target broad classes of applications. In this thesis, we propose flow structures as an abstraction targeting distributed applications in which data need to flow directly between producers and consumers. Flow structures are asymmetric distributed producer/consumer data structures in which some servers only produce and others only consume. We present a design space for RDMA implementations of flow structures and illustrate the space using several concrete implementations of a simple flow structure: a FIFO queue. Finally, we present an empirical study that illustrates the design tradeoffs along the dimensions of the design space, and that shows that flow structure abstractions can capture the performance advantages of RDMA while making it much easier to use.
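The sketch below illustrates, within a single process, one point in such a design space: a fixed-capacity ring buffer in which the producer's stores stand in for RDMA puts into the consumer's memory, and the consumer only reads locally. The layout and protocol are assumptions for illustration, not one of the thesis's concrete implementations.

```cpp
// Single-process sketch of a flow-structure FIFO: the producer's writes
// stand in for RDMA puts into the consumer's exposed memory region, and the
// consumer polls only its local copy. Illustrative assumptions throughout.
#include <array>
#include <atomic>
#include <cstddef>
#include <iostream>
#include <optional>

constexpr std::size_t kCapacity = 8;

struct ConsumerRegion {                 // memory the consumer exposes for RDMA
  std::array<int, kCapacity> slots{};
  std::atomic<std::size_t> tail{0};     // written remotely by the producer
  std::size_t head = 0;                 // private to the consumer
};

bool produce(ConsumerRegion& r, std::size_t& local_tail, int value) {
  // The producer tracks the tail locally; a real design would also use a
  // cached copy of head rather than reading it remotely on every push.
  std::size_t next = local_tail + 1;
  if (next - r.head > kCapacity) return false;      // queue is full
  r.slots[local_tail % kCapacity] = value;           // "RDMA put" of payload
  r.tail.store(next, std::memory_order_release);     // "RDMA put" of tail
  local_tail = next;
  return true;
}

std::optional<int> consume(ConsumerRegion& r) {
  if (r.head == r.tail.load(std::memory_order_acquire)) return std::nullopt;
  return r.slots[r.head++ % kCapacity];              // purely local access
}

int main() {
  ConsumerRegion region;
  std::size_t producer_tail = 0;
  for (int i = 0; i < 5; ++i) produce(region, producer_tail, i * 10);
  while (auto v = consume(region)) std::cout << *v << ' ';
  std::cout << '\n';
}
```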
ISBN:
(Print) 9798400704161
Ensuring data consistency under remote direct memory access (RDMA) is challenging due to the combined effects of various hardware components. This brief announcement introduces remote object memory (ROMe), the first technique to guarantee wait-free consistent reads of arbitrarily sized objects over RDMA without the use of specialized hardware, and while allowing the concurrent execution of conflicting local updates. We integrated ROMe into ROMe-KV, an RDMA-enabled key-value store whose underlying B-link tree nodes are ROMe objects, enabling support for wait-free linearizable range queries.
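The announcement does not describe ROMe's algorithm, so for contrast the sketch below shows the familiar seqlock-style baseline for consistent multi-word reads: a reader retries whenever a version counter changed during the read, which makes it consistent but not wait-free, and that is precisely the limitation ROMe claims to remove. All names and details are illustrative assumptions, not ROMe's mechanism.

```cpp
// Seqlock-style baseline for consistent reads of a multi-word object; a
// reader can retry indefinitely under continuous updates, so it is not
// wait-free. Illustrative only; nothing here is ROMe's technique.
#include <array>
#include <atomic>
#include <cstdint>
#include <iostream>

struct Object {
  std::atomic<uint64_t> version{0};     // even = stable, odd = update in flight
  std::array<int, 4> words{};
};

void write_object(Object& o, const std::array<int, 4>& value) {
  o.version.fetch_add(1, std::memory_order_acq_rel);   // mark unstable (odd)
  o.words = value;
  o.version.fetch_add(1, std::memory_order_release);   // mark stable (even)
}

std::array<int, 4> read_object(const Object& o) {
  while (true) {
    uint64_t v1 = o.version.load(std::memory_order_acquire);
    std::array<int, 4> snapshot = o.words;              // would be an RDMA read
    uint64_t v2 = o.version.load(std::memory_order_acquire);
    if (v1 == v2 && (v1 % 2) == 0) return snapshot;      // consistent snapshot
  }                                                      // otherwise retry
}

int main() {
  Object o;
  write_object(o, {1, 2, 3, 4});
  for (int x : read_object(o)) std::cout << x << ' ';
  std::cout << '\n';
}
```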
ISBN:
(Digital) 9789819723409
ISBN:
(Print) 9789819723393; 9789819723409
This paper revisits the study of (minimum) broadcast graphs, i.e., graphs enabling fast information dissemination from every source node to all the other nodes (and having the minimum number of edges for this property). This study is performed in the framework of compact distributed data structures, that is, when the broadcast protocols are constrained to be encoded at each node as an ordered list of neighbors specifying, upon reception of a message, in which order this message must be passed to these neighbors. We show that this constraint does not limit the power of broadcast protocols, as far as the design of (minimum) broadcast graphs is concerned. Specifically, we show that, for every n, there are n-node graphs for which it is possible to design protocols encoded by lists yet enabling broadcast in ⌈log₂ n⌉ rounds from every source, which is optimal even for general (i.e., non-space-constrained) broadcast protocols. Moreover, we show that, for every n, there exist such graphs with the additional property that they are asymptotically as sparse as the sparsest graphs for which ⌈log₂ n⌉-round broadcast protocols exist, up to a constant multiplicative factor. Concretely, these graphs have O(n · L(n)) edges, where L(n) is the number of leading 1s in the binary representation of n - 1, and general minimum broadcast graphs are known to have Ω(n · L(n)) edges.
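To see why an ordered neighbor list per node can suffice, the sketch below simulates only the easy case n = 2^d with a fixed source on a hypercube: each node's list holds the dimensions above its own highest set bit, and relaying one list entry per round informs every node within log₂ n rounds. The paper's constructions, which handle every source and general n with near-minimum edge counts, are more involved; this is purely an illustration.

```cpp
// List-encoded broadcast on an n = 2^d hypercube from source 0: a node
// informed in round r forwards to its list entries in rounds r+1, r+2, ...
// All nodes are informed by round d = log2(n). Illustration only.
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
  const int d = 4, n = 1 << d;                      // 16-node hypercube
  // Build each node's ordered neighbor list: dimensions above its highest
  // set bit, in increasing order (node 0 lists all dimensions).
  std::vector<std::vector<int>> list(n);
  for (int v = 0; v < n; ++v) {
    int high = -1;
    for (int bit = 0; bit < d; ++bit) if (v & (1 << bit)) high = bit;
    for (int bit = high + 1; bit < d; ++bit) list[v].push_back(v ^ (1 << bit));
  }

  // Simulate rounds of the protocol.
  std::vector<int> informed(n, -1);
  informed[0] = 0;                                   // source
  for (int round = 1; round <= d; ++round) {
    for (int v = 0; v < n; ++v) {
      if (informed[v] < 0 || informed[v] >= round) continue;
      int idx = round - informed[v] - 1;             // which list entry fires
      if (idx < (int)list[v].size() && informed[list[v][idx]] < 0)
        informed[list[v][idx]] = round;
    }
  }
  int last = *std::max_element(informed.begin(), informed.end());
  std::cout << "all " << n << " nodes informed by round " << last
            << " (log2 n = " << d << ")\n";
}
```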
ISBN:
(Print) 9798350387117; 9798350387124
Contemporary accelerator designs exhibit a high degree of spatial localization, wherein two-dimensional physical distance determines communication costs between processing elements. This situation presents considerable algorithmic challenges, particularly when managing sparse data, a pivotal component in progressing data science. The spatial computer model quantifies communication locality by weighting processor communication costs by distance, introducing a term named energy. Moreover, it integrates depth, a widely utilized metric, to promote high parallelism. We propose and analyze a framework for efficient spatial tree algorithms within the spatial computer model. Our primary method constructs a spatial tree layout that optimizes the locality of the neighbors in the compute grid. This approach thereby enables locality-optimized messaging within the tree. Our layout achieves a polynomial factor improvement in energy compared to utilizing a PRAM approach. Using this layout, we develop energy-efficient treefix sum and lowest common ancestor algorithms, which are both fundamental building blocks for other graph algorithms. With high probability, our algorithms exhibit near-linear energy and poly-logarithmic depth. Our contributions augment a growing body of work demonstrating that computations can have both high spatial locality and low depth. Moreover, our work constitutes an advancement in the spatial layout of irregular and sparse computations.
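For reference, the snippet below computes a treefix (rootfix) sum sequentially, i.e., for every node the sum of the values on its root-to-node path. It is only the plain computation that the paper's spatial algorithms parallelize with near-linear energy and poly-logarithmic depth; nothing here models the spatial layout itself.

```cpp
// Sequential reference for a rootfix (treefix) sum: out[v] is the sum of
// value[] along the path from the root to v. Illustrative baseline only.
#include <iostream>
#include <vector>

void rootfix(int v, long long prefix,
             const std::vector<std::vector<int>>& child,
             const std::vector<int>& value, std::vector<long long>& out) {
  out[v] = prefix + value[v];
  for (int c : child[v]) rootfix(c, out[v], child, value, out);
}

int main() {
  // A small tree: node 0 is the root with children 1 and 2; node 1 has child 3.
  std::vector<std::vector<int>> child = {{1, 2}, {3}, {}, {}};
  std::vector<int> value = {1, 2, 3, 4};
  std::vector<long long> out(value.size());
  rootfix(0, 0, child, value, out);
  for (auto s : out) std::cout << s << ' ';   // prints: 1 3 4 7
  std::cout << '\n';
}
```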
ISBN:
(Print) 9798400701214
A multiplicity queue is a concurrently-defined data type which relaxes the conditions of a linearizable FIFO queue by allowing concurrent Dequeue instances to return the same value. It would seem that this should allow faster message-passing implementations, as processes should not need to wait as long to learn about concurrent operations, and previous work has shown that multiplicity queues are computationally less complex than the unrelaxed version. Intriguingly, recent work has shown that there is, in fact, little possible speedup versus an unrelaxed queue. Seeking to understand this difference between intuition and real behavior, we increase the lower bound for uniform algorithms. Further, we outline a path toward building proofs for even higher lower bounds, hypothesizing that the worst-case time to Dequeue approaches the maximum message delay, which is similar to the time required for an unrelaxed Dequeue. We also give an upper bound for a special case to show that our bounds are tight at that point. To achieve our lower bounds, we use extended shifting arguments, which have rarely been used but allow larger lower bounds than traditional shifting arguments. We use these in a series of inductive indistinguishability proofs which allow us to extend our proofs beyond the usual limitations of shifting arguments. This proof structure is an interesting contribution independently of the main result, as developing new lower bound proof techniques may have many uses in future work.
ISBN:
(Digital) 9783031099939
ISBN:
(Print) 9783031099939; 9783031099922
A multiplicity-relaxed queue or stack data type allows multiple Dequeue or Pop operations to return the same value if they are concurrent. We consider the possible efficiency of message-passing implementations of such data types. We show that both the worst-case and amortized time costs for Dequeues and Pops are nearly as high as the upper bounds for their worst-case time in unrelaxed queues and stacks. Relaxed data types are of interest since they can in some cases trade off some of the data types' ordering guarantees for increased performance or easier implementation. The multiplicity relaxation, in particular, is interesting as it has been shown to be less computationally complex than unrelaxed queues and stacks. Our results explore a different aspect of these data types, considering communication time complexity in a message-passing system and showing limits on possible time-performance improvements.
In this work, we propose D-3-Tree, a dynamic distributed deterministic structure for data management in decentralized networks, by engineering and extending an existing decentralized structure. Conducting an extensive experimental study, we verify that the implemented structure outperforms other well-known hierarchical tree-based structures since it provides better complexities regarding load-balancing operations. More specifically, the structure achieves an O(log N) amortized bound (N is the number of nodes present in the network), using an efficient deterministic load-balancing mechanism, which is general enough to be applied to other hierarchical tree-based structures. Moreover, our structure achieves O(log N) worst-case search performance. Last but not least, we investigate the structure's fault tolerance, which has not been sufficiently tackled in previous work, both theoretically and through rigorous experimentation. We prove that D-3-Tree is highly fault-tolerant and achieves O(log N) amortized search cost under massive node failures, accompanied by a significant success rate. Afterwards, by incorporating this novel balancing scheme into the ART (Autonomous Range Tree) structure, we go one step further to achieve sub-logarithmic complexity and propose the ART+ structure. ART+ achieves an O(log_b^2 log N) communication cost for query and update operations (where b is a double-exponential power of 2 and N is the total number of nodes). Moreover, ART+ is a fully dynamic and fault-tolerant structure, which supports the join/leave node operations in O(log log N) expected number of hops with high probability (WHP) and performs load-balancing in O(log log N) amortized cost.