Search results - Inner Mongolia University Library

IEEE International Conference on Application-specific Systems, Architectures and Processors

Authors: Athanasios K. Grivas; Terrence Mak; Alex Yakovlev; Jonny Wray. Affiliations: School of Electrical and Electronic Engineering, Newcastle University, UK; Department of Computer Science and Engineering, The Chinese University of Hong Kong, China

ISBN: (print) 9781479904945

Complex networks are a technique for the modeling and analysis of large data sets in many scientific and engineering disciplines. Due to their excessive size, conventional algorithms and single-core processors struggle to process such networks efficiently. Employing multi-core graphics processing units (GPUs) could provide sufficient processing power for the analysis of such networks. However, commonly designed algorithms cannot exploit this massively parallel processing power. In this paper, we present the Multi-Layer Network Decomposition (MLND) approach, which provides a general approach for parallel network analysis using multi-core processors via efficient partitioning and mapping of networks onto GPU architectures. Evaluation using a 336-core GPU graphics card demonstrated a 16x speed-up in complex network analysis relative to a CPU-based approach.

Keywords: MLND; GPU; multi-core algorithms; GRAPPER PICK UP; Graphics Processing Unit; processing power; complex networks; central processing units; Network; multilayer network; multi-core processors; Network analysis

Scalable training of 3D convolutional networks on multi- and many-cores

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2017, vol. 106, pp. 195-204

Authors: Zlateski, Aleksandar; Lee, Kisuk; Seung, H. Sebastian. Affiliations: MIT, Elect Engn & Comp Sci Dept, Cambridge, MA 02139, USA; MIT, Brain & Cognit Sci Dept, Cambridge, MA 02139, USA; Princeton Univ, Princeton Neurosci Inst, Princeton, NJ 08540, USA; Princeton Univ, Comp Sci Dept, Princeton, NJ 08540, USA

Convolutional networks (ConvNets) have become a popular approach to computer vision. Here we consider the parallelization of ConvNet training, which is computationally costly. Our novel parallel algorithm is based on decomposition into a set of tasks, most of which are convolutions or FFTs. Theoretical analysis suggests that linear speedup with the number of processors is attainable. To attain such performance on real shared-memory machines, our algorithm computes convolutions converging on the same node of the network with temporal locality to reduce cache misses, and sums the convergent convolution outputs via an almost wait-free concurrent method to reduce time spent in critical sections. Benchmarking with multi-core CPUs shows speedup roughly equal to the number of physical cores. We also demonstrate 90x speedup on a many-core CPU (Xeon Phi Knights Corner). Our algorithm can be either faster or slower than certain GPU implementations depending on specifics of the network architecture, kernel sizes, and density and size of the output patch. (C) 2017 Elsevier Inc. All rights reserved.

Keywords: Convolutional neural networks; Deep learning; FFT convolution; Dynamic scheduling; multi-core algorithms; Wait-free summation
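
The abstract above names convolutions and FFTs as the main tasks in the decomposition. As a rough illustration only, and not the authors' implementation, the following one-dimensional sketch shows why the two are interchangeable: a direct convolution and an FFT-based convolution produce the same output. The function names (`direct_convolve`, `fft`, `fft_convolve`) are hypothetical, and the paper itself concerns 3D ConvNets rather than 1D signals.

```python
import cmath


def direct_convolve(a, b):
    """Full linear convolution computed straight from the definition."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey transform; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def fft_convolve(a, b):
    """Linear convolution via zero-padding, pointwise product, inverse transform."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:
        size *= 2
    fa = fft(list(a) + [0.0] * (size - len(a)))
    fb = fft(list(b) + [0.0] * (size - len(b)))
    spectrum = [x * y for x, y in zip(fa, fb)]
    # The inverse transform above is unnormalised, so divide by `size`.
    return [(v / size).real for v in fft(spectrum, inverse=True)][:out_len]
```

For large kernels the FFT route does asymptotically less work than the direct double loop, which is one reason a training system might choose between the two per layer; which one wins in practice depends on sizes the abstract does not specify.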

Recoverable mutual exclusion with abortability

COMPUTING, 2022, vol. 104, no. 10, pp. 2225-2252

Authors: Jayanti, Prasad; Joshi, Anup. Affiliations: Dartmouth Coll, Hanover, NH 03755, USA

Recent advances in non-volatile main memory (NVM) technology have spurred research on algorithms that are resilient to intermittent failures that cause processes to crash and subsequently restart. In this paper we present a Recoverable Mutual Exclusion (RME) algorithm that supports abortability. Our algorithm guarantees FCFS and a strong liveness property: processes do not starve even in runs consisting of infinitely many crashes, provided that a process crashes at most a finite number of times in each of its attempts. On DSM and Relaxed-CC multiprocessors, a process incurs O(min(k, log n)) RMRs in a passage and O(f + min(k, log n)) RMRs in an attempt, where n is the number of processes that the algorithm is designed for, k is the point contention of the passage or the attempt, and f is the number of times that the process crashes during the attempt. On a Strict-CC multiprocessor, the passage and attempt complexities are O(n) and O(f + n), respectively. Our algorithm uses only the read, write, and CAS operations, which are commonly supported by multiprocessors. Attiya, Hendler, and Woelfel proved that, with any mutual exclusion algorithm, a process incurs at least Ω(log n) RMRs in a passage, if the algorithm uses only the read, write, and CAS operations (in: Proc. of the Fortieth ACM Symposium on Theory of Computing, New York, NY, USA, 2008). This lower bound implies that the worst-case RMR complexity of our algorithm is optimal for the DSM and Relaxed-CC multiprocessors. This paper is an expanded version of our conference paper as reported by Jayanti and Joshi (in: Atig and Schwarzmann (eds) Networked Systems. Springer International Publishing, Cham, 2019), which presented the first Recoverable Mutual Exclusion (RME) algorithm that supports abortability. This algorithm from our conference paper (in: Atig and Schwarzmann (eds) Networked Systems. Springer International Publishing, Cham, 2019) admits starvation when there are infinitely many aborts in a run. In this paper,

Keywords: Concurrent algorithm; Synchronization; Mutual exclusion; Recoverable algorithm; Fault tolerance; Non-volatile main memory; Shared memory; multi-core algorithms

Recoverable mutual exclusion

DISTRIBUTED COMPUTING, 2019, vol. 32, no. 6, pp. 535-564

Authors: Golab, Wojciech; Ramaraju, Aditya. Affiliations: Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON, Canada

Mutex locks have traditionally been the most common mechanism for protecting shared data structures in concurrent programs. However, the robustness of such locks against process failures has not been studied thoroughly. The vast majority of mutex algorithms are designed around the assumption that processes are reliable, meaning that a process may not fail while executing the lock acquisition and release code, or while inside the critical section. If such a failure does occur, then the liveness properties of a conventional mutex lock may cease to hold until the application or operating system intervenes by cleaning up the internal structure of the lock. For example, a process that is attempting to acquire an otherwise starvation-free mutex may be blocked forever waiting for a failed process to release the critical section. Adding to the difficulty, if the failed process recovers and attempts to acquire the same mutex again without appropriate cleanup, then the mutex may become corrupted to the point where it loses safety, notably the mutual exclusion property. We address this challenge by formalizing the problem of recoverable mutual exclusion, and proposing several solutions that vary both in their assumptions regarding hardware support for synchronization, and in their efficiency. Compared to known solutions, our algorithms are more robust as they do not restrict where or when a process may crash, and provide stricter guarantees in terms of efficiency, which we define in terms of remote memory references.

Keywords: Mutual exclusion; Fault tolerance; Recovery; Concurrency; Synchronization; Shared memory; Non-volatile main memory; multi-core algorithms; Durable data structures

PMS6MC: A multicore Algorithm for Motif Discovery

Algorithms, 2013, vol. 6, no. 4, pp. 805-823

Authors: Bandyopadhyay, Shibdas; Sahni, Sartaj; Rajasekaran, Sanguthevar. Affiliations: VMware Inc, 3401 Hillview Ave, Palo Alto, CA 94304, USA; Univ Florida, Dept CISE, Gainesville, FL 32611, USA; Univ Connecticut, Dept CSE, Storrs, CT 06269, USA

We develop an efficient multicore algorithm, PMS6MC, for the (l, d)-motif discovery problem in which we are to find all strings of length l that appear in every string of a given set of strings with at most d mismatches. PMS6MC is based on PMS6, which is currently the fastest single-core algorithm for motif discovery in large instances. The speedup, relative to PMS6, attained by our multicore algorithm ranges from a high of 6.62 for the (17,6) challenging instances to a low of 2.75 for the (13,4) challenging instances on an Intel 6-core system. We estimate that PMS6MC is 2 to 4 times faster than other parallel algorithms for motif search on large instances.

Keywords: planted motif search; parallel string algorithms; multi-core algorithms
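
The (l, d)-motif problem stated in the abstract above can be made concrete with a brute-force sketch. This is not PMS6 or PMS6MC, which the abstract does not describe in detail; it simply enumerates every candidate string of length l and keeps those that occur in every input string with at most d mismatches. The function names and the DNA alphabet `"ACGT"` are assumptions made for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate: str, sequence: str, d: int) -> bool:
    """True if some window of `sequence` is within d mismatches of `candidate`."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def brute_force_motifs(sequences, l, d, alphabet="ACGT"):
    """Try all len(alphabet)**l candidates; keep those found in every sequence."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in sequences):
            motifs.append(candidate)
    return motifs
```

The candidate space grows as 4^l for DNA, so this approach is only usable for very small l; the challenging instances quoted in the abstract, such as (17,6), are far beyond it, which is why specialized algorithms exist.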

Automated constraint-based addition of nonmasking and stabilizing fault-tolerance

THEORETICAL COMPUTER SCIENCE, 2011, vol. 412, no. 33, pp. 4228-4246

Authors: Abujarad, F.; Kulkarni, S. S. Affiliations: Yale Univ, Dept Emergency Med, New Haven, CT 06519, USA; Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824, USA

We focus on the constraint-based automated addition of nonmasking and stabilizing fault-tolerance to hierarchical programs. We specify legitimate states of the program in terms of constraints that should be satisfied in those states. To deal with faults that may violate these constraints, we add recovery actions while ensuring interference freedom among the recovery actions added for satisfying different constraints. Since the constraint-based manual design of fault-tolerance is well known, we expect our approach to have a significant benefit in automating the addition of fault-tolerance. We illustrate our algorithm with four case studies: stabilizing mutual exclusion, stabilizing diffusing computation, a data dissemination problem in sensor networks, and tree maintenance. With experimental results, we show that the complexity of our algorithm is reasonable and that it can be reduced using the structure of the hierarchical systems. We also reduced the time complexity of the synthesis using parallelism. We consider two approaches to speed up the synthesis algorithm: first, the use of the multiple constraints that have to be satisfied during synthesis; second, the use of the distributed nature of the programs being synthesized. We show that our approaches provide significant reduction in the synthesis time. To our knowledge, this is the first instance where automated synthesis has been successfully used in synthesizing programs that are correct under fairness assumptions. Moreover, in three of the case studies considered in this paper, the structure of the recovery paths is too complex to permit existing heuristic-based approaches for adding recovery. (C) 2011 Elsevier B.V. All rights reserved.

Keywords: Nonmasking fault-tolerance; Stabilization; Program synthesis; multi-core algorithms; Distributed programs

A Recoverable Mutex Algorithm with Sub-logarithmic RMR on Both CC and DSM

38th ACM Symposium on Principles of Distributed Computing (PODC)

Authors: Jayanti, Prasad; Jayanti, Siddhartha; Joshi, Anup. Affiliations: Dartmouth Coll, Hanover, NH 03755, USA; MIT, Cambridge, MA 02139, USA

ISBN: (print) 9781450362177

In light of recent advances in non-volatile main memory technology, Golab and Ramaraju reformulated the traditional mutex problem into the novel Recoverable Mutual Exclusion (RME) problem. In the best known solution for RME, due to Golab and Hendler from PODC 2017, a process incurs at most O(log n / log log n) remote memory references (RMRs) per passage on a system with n processes, where a passage is an interval from when a process enters the Try section to when it subsequently returns to Remainder. Their algorithm, however, guarantees this bound only for cache-coherent (CC) multiprocessors, leaving open the question of whether a similar bound is possible for distributed shared memory (DSM) multiprocessors. We answer this question affirmatively by designing an algorithm for a system with n processes such that it satisfies the same complexity bound as Golab and Hendler's for both CC and DSM multiprocessors. Our algorithm has some additional advantages over Golab and Hendler's: (i) its Exit section is wait-free, (ii) it uses only the Fetch-and-Store instruction, and (iii) on a CC machine our algorithm needs each process to have a cache of only O(1) words, while their algorithm needs a cache of size that is a function of n.

Keywords: concurrent algorithm; synchronization; mutual exclusion; recoverable algorithm; fault tolerance; non-volatile main memory; shared memory; multi-core algorithms

Recoverable Mutual Exclusion Under System-Wide Failures

37th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC)

Authors: Golab, Wojciech; Hendler, Danny. Affiliations: Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON, Canada; Ben Gurion Univ Negev, Dept Comp Sci, Beer Sheva, Israel

ISBN: (print) 9781450357951

Recoverable mutual exclusion (RME) is a variation on the classic mutual exclusion (ME) problem that allows processes to crash and recover. The time complexity of RME algorithms is quantified in the same way as for ME, namely by counting remote memory references (RMRs): expensive memory operations that traverse the processor-to-memory interconnect. Prior work has established that the RMR complexity of the RME problem for n processes is Θ(log n) for the class of algorithms that use read/write registers and single-word comparison primitives such as Compare-And-Swap (Golab and Ramaraju 2016), O(log n / log log n) for the class of algorithms that use read/write registers and additional single-word read-modify-write primitives such as Fetch-And-Store (Golab and Hendler 2017), and Θ(1) for the class of algorithms that use read/write registers and specialized double-word read-modify-write primitives (Golab and Hendler 2017). These complexity bounds hold in a model of computation where processes may fail independently, and where a process that fails while accessing the mutex is required to recover eventually. This body of work leaves open two important questions: (i) what is the tight bound on the RMR complexity of RME for the class of algorithms that use read/write registers and commonly supported single-word read-modify-write primitives; and (ii) how is the RMR complexity of RME affected by variations in the failure model? This paper answers both questions partially by showing that RME can be solved using O(1) RMRs per passage in the worst case in a model where failures are system-wide (i.e., all processes crash simultaneously), and processes receive additional information from the environment regarding the occurrence of the failure. The upper bound algorithm we present relies crucially on a novel RMR-efficient barrier that processes use to synchronize recovery actions after each failure. The barrier uses read/write registers and single-word Compare-And-Swap only. Additionally, we present a transfo

Keywords: Mutual exclusion; recovery; fault tolerance; concurrency; shared memory; multi-core algorithms; non-volatile main memory; persistent data structures

A Flexible and Scalable Affinity Lock for the Kernel

16th IEEE International Conference on High Performance Computing and Communications (HPCC 2014); 11th IEEE International Conference on Embedded Software and Systems (ICESS 2014); 6th International Symposium on Cyberspace Safety and Security (CSS 2014)

Authors: Zhang, Benlong; Kang, Junbin; Wo, Tianyu; Wang, Yuda; Yang, Renyu. Affiliations: Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China

ISBN: (print) 9781479961238

A number of NUMA-aware synchronization algorithms have been proposed lately to stress the scalability inefficiencies of existing locks. However, their presupposed local lock granularity, a physical processor, is often not the optimum configuration for various workloads. This paper further explores the design space by taking into consideration the physical affinity between the cores within a single processor, and presents FSL to support variable and finely tuned group sizes for different lock contexts and instances. The new design provides a uniform model for the discussion of affinity locks and can completely subsume the previous NUMA-aware designs because they have only discussed one special case of the model. The interfaces of the new scheme are kernel-compatible and thus largely facilitate kernel incorporation. The investigation with the lock shows that an affinity lock with optimal local lock granularity can outperform its NUMA-aware counterpart by 29.40% and 58.28% at 80 cores with different workloads.

Keywords: synchronization algorithms; multi-core algorithms

Separating Lock-Freedom from Wait-Freedom

37th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC)

Authors: Attiya, Hagit; Castaneda, Armando; Hendler, Danny; Perrin, Matthieu. Affiliations: Technion, Dept Comp Sci, Haifa, Israel; Univ Nacl Autonoma Mexico, Inst Matemat, Mexico City, DF, Mexico; Ben Gurion Univ Negev, Dept Comp Sci, Beer Sheva, Israel; Univ Nantes, LS2N, Nantes, France

ISBN: (print) 9781450357951

A long-standing open question has been whether lock-freedom and wait-freedom are fundamentally different progress conditions, namely, can the former be provided in situations where the latter cannot? This paper answers the question in the affirmative, by proving that there are objects with lock-free implementations, but without wait-free implementations, using objects of any finite power. We precisely define an object called n-process long-lived approximate agreement (n-LLAA), in which two sets of processes associated with two sides, 0 or 1, need to decide on a sequence of increasingly closer outputs. We prove that 2-LLAA has a lock-free implementation using reads and writes only, while n-LLAA has a lock-free implementation using reads, writes and (n - 1)-process consensus objects. In contrast, we prove that there is no wait-free implementation of the n-LLAA object using reads, writes and specific (n - 1)-process consensus objects, called (n - 1)-window registers.

Keywords: concurrency; shared memory; multi-core algorithms; wait-freedom; lock-freedom; nonblocking
