ISBN: 9781450383431 (Print)
This paper presents new parallel algorithms for generating Euclidean minimum spanning trees and spatial clustering hierarchies (known as HDBSCAN*). Our approach is based on generating a well-separated pair decomposition followed by using Kruskal's minimum spanning tree algorithm and bichromatic closest pair computations. We introduce a new notion of well-separation to reduce the work and space of our algorithm for HDBSCAN*. We also give a new parallel divide-and-conquer algorithm for computing the dendrogram and reachability plots, which are used in visualizing clusters of different scales that arise for both EMST and HDBSCAN*. We show that our algorithms are theoretically efficient: they have work (number of operations) matching their sequential counterparts, and polylogarithmic depth (parallel time). We implement our algorithms and propose a memory optimization that requires only a subset of well-separated pairs to be computed and materialized, leading to savings in both space (up to 10x) and time (up to 8x). Our experiments on large real-world and synthetic data sets using a 48-core machine show that our fastest algorithms outperform the best serial algorithms for the problems by 11.13-55.89x, and existing parallel algorithms by at least an order of magnitude.
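To make the edge-generation step concrete, here is a minimal sequential Python sketch of the Kruskal phase described above: given a set of well-separated pairs (assumed to be precomputed elsewhere; the WSPD construction and all parallelism are omitted), it finds one bichromatic closest pair per pair by brute force and runs Kruskal's algorithm with union-find over those candidate edges. The function names `bccp` and `emst_from_wspd` are illustrative, not taken from the paper's implementation.

```python
import numpy as np

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def bccp(points, A, B):
    """Brute-force bichromatic closest pair between index sets A and B."""
    best = (np.inf, -1, -1)
    for a in A:
        d = np.linalg.norm(points[B] - points[a], axis=1)
        j = int(np.argmin(d))
        if d[j] < best[0]:
            best = (float(d[j]), a, B[j])
    return best

def emst_from_wspd(points, wspd_pairs):
    """Kruskal's algorithm over one BCCP candidate edge per well-separated
    pair; with a suitable separation constant, every EMST edge is the BCCP
    of some pair, so the MST of the candidate edges is the EMST."""
    edges = sorted(bccp(points, A, B) for A, B in wspd_pairs)
    uf, mst = UnionFind(len(points)), []
    for w, u, v in edges:
        if uf.union(u, v):
            mst.append((u, v, w))
    return mst
```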
ISBN: 9781728180144 (Print)
The Morse-Smale complex is a well-studied topological structure that represents the gradient flow behavior of a scalar function. It supports multi-scale topological analysis and visualization of large scientific data. Its computation poses significant algorithmic challenges when considering large-scale data and increased feature complexity. Several parallel algorithms have been proposed towards the fast computation of the 3D Morse-Smale complex. However, the non-trivial structure of the saddle-saddle connections is not amenable to parallel computation. This paper describes a fine-grained parallel method for computing the Morse-Smale complex that is implemented on a GPU. The saddle-saddle reachability is first determined via a transformation into a sequence of vector operations, followed by the path traversal, which is achieved via a sequence of matrix operations. Computational experiments show that the method achieves up to 7x speedup over current shared-memory implementations.
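As a rough illustration of the reachability idea (not the paper's GPU implementation), the sketch below expresses BFS-style reachability from a set of source vertices as repeated sparse matrix-vector products; each iteration is a bulk operation of the kind that maps naturally onto GPU primitives. The adjacency-matrix encoding and the function name are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

def reachable_from(adj, sources):
    """Vertices reachable from `sources`, computed with bulk matrix-vector
    products instead of an explicit per-vertex graph traversal.

    adj: sparse (n, n) 0/1 matrix with adj[i, j] = 1 for an edge i -> j.
    """
    n = adj.shape[0]
    reached = np.zeros(n, dtype=bool)
    frontier = np.zeros(n, dtype=bool)
    frontier[sources] = True
    while frontier.any():
        reached |= frontier
        # One BFS level: successors of the frontier, minus vertices already reached.
        frontier = (adj.T.dot(frontier.astype(np.int64)) > 0) & ~reached
    return reached

# Tiny usage example: a path 0 -> 1 -> 2 and an isolated vertex 3.
adj = csr_matrix(([1, 1], ([0, 1], [1, 2])), shape=(4, 4))
print(reachable_from(adj, [0]))   # [ True  True  True False]
```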
ISBN: 9781450340922 (Print)
Deep neural networks (DNN) have recently achieved extraordinary results in domains like computer vision and speech recognition. An essential element of this success has been the introduction of high performance computing (HPC) techniques in the critical step of training the neural network. This paper describes the implementation and analysis of a network-agnostic and convergence-invariant coarse-grain parallelization of the DNN training algorithm. The coarse-grain parallelization is achieved by exploiting batch-level parallelism. This strategy does not depend on specialized or optimized libraries, so the optimization is immediately available for accelerating DNN training. The proposal is compatible with multi-GPU execution without altering the algorithm's convergence rate. The parallelization has been implemented in Caffe, a state-of-the-art DNN framework. The paper describes the code transformations required for the parallelization and identifies the limiting performance factors of the approach. We show competitive performance results for two state-of-the-art computer vision datasets, MNIST and CIFAR-10. In particular, on a 16-core Xeon E5-2667v2 at 3.30 GHz we observe speedups of 8x over the sequential execution, at performance levels similar to those obtained by the GPU-optimized Caffe version on an NVIDIA K40 GPU.
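To illustrate why batch-level parallelism can be convergence-invariant, here is a minimal NumPy sketch (for a linear least-squares model rather than a DNN, with the workers simulated sequentially): each worker computes the gradient sum over its slice of the batch, the sums are combined, and a single update is applied. Because the combined gradient equals the full-batch gradient exactly, the parameter trajectory matches the single-worker version. The function names are illustrative.

```python
import numpy as np

def worker_grad_sum(W, x_chunk, y_chunk):
    """Sum of per-example squared-loss gradients for a linear model on one
    chunk of the batch; stands in for one worker's forward/backward pass."""
    residual = x_chunk @ W - y_chunk
    return x_chunk.T @ residual

def data_parallel_step(W, x_batch, y_batch, n_workers, lr=0.01):
    """One coarse-grain parallel SGD step: split the batch across workers,
    combine their gradient sums, and apply a single update.  The combined
    gradient is exactly the full-batch gradient, so convergence is unchanged."""
    chunks = zip(np.array_split(x_batch, n_workers),
                 np.array_split(y_batch, n_workers))
    grad = sum(worker_grad_sum(W, xc, yc) for xc, yc in chunks) / len(x_batch)
    return W - lr * grad
```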
ISBN: 9781450312455 (Print)
Mutual Exclusion is a fundamental problem in distributed computing, and the problem of proving upper and lower bounds on the RMR complexity of this problem has been extensively studied. Here, we give matching lower and upper bounds on how RMR complexity trades off with space. Two implications of our results are that constant RMR complexity is impossible with subpolynomial space and subpolynomial RMR complexity is impossible with constant space for cache-coherent multiprocessors, regardless of how strong the hardware synchronization operations are. To prove these results we show that the complexity of mutual exclusion, which can be "messy" to analyze because of system details such as asynchrony and cache coherence, is captured precisely by a simple and purely combinatorial game that we design. We then derive lower and upper bounds for this game, thereby obtaining corresponding bounds for mutual exclusion. The lower bounds for the game are proved using potential functions.
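As background on what an RMR (remote memory reference) counts, the following is a conceptual Python sketch (not from the paper) of an array-based queue lock in the cache-coherent model: each waiting thread spins on its own slot, so a lock passage incurs only O(1) RMRs, but the lock needs one slot per thread, i.e., linear space. This is the kind of RMR/space trade-off the paper's bounds make precise. The ticket counter stands in for a hardware fetch-and-increment, and a real implementation would pad slots to separate cache lines.

```python
from itertools import count

class ArrayQueueLock:
    """Array-based queue lock sketch for up to n threads.

    Each thread spins on its own slot; on a cache-coherent machine those
    reads stay in the local cache, so acquire/release cost O(1) remote
    memory references -- at the price of O(n) space.
    """
    def __init__(self, n):
        self.n = n
        self.granted = [False] * n
        self.granted[0] = True          # slot 0 starts out granted
        self._ticket = count()          # stands in for atomic fetch-and-increment

    def acquire(self):
        slot = next(self._ticket) % self.n
        while not self.granted[slot]:   # local spinning on a thread-private slot
            pass
        self.granted[slot] = False
        return slot                     # caller passes this to release()

    def release(self, slot):
        # A single remote write hands the lock to the next waiting slot.
        self.granted[(slot + 1) % self.n] = True
```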
Matrix factorization is an efficient technique for uncovering latent features of real-world data. It finds application in areas such as text mining, image analysis, and social network analysis, and, more recently and prominently, in recommendation systems. Alternating Least Squares (ALS), Stochastic Gradient Descent (SGD), and Coordinate Descent (CD) are among the methods commonly used to factorize large matrices. SGD-based factorization has proven to be the most successful of these methods since the Netflix and KDD Cup competitions, in which the winning algorithms relied on SGD. Parallelization of SGD has since become a hot topic and has been studied extensively in the literature in recent years. We focus on parallel SGD algorithms developed for shared-memory and distributed-memory systems. Shared-memory parallelizations include works such as HogWild, FPSGD, and MLGF-MF, and distributed-memory parallelizations include works such as DSGD, GASGD, and NOMAD. We present a survey containing an exhaustive analysis of these studies, and then focus particularly on DSGD, implementing it with the message-passing paradigm and testing its performance in terms of convergence and speedup. In contrast to existing works, our experiments use many real-world datasets that we produce from published raw data. We show that DSGD is a robust algorithm for large-scale datasets and achieves near-linear speedup with fast convergence rates.
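For reference, the core SGD update that all of the surveyed methods parallelize is shown below as a minimal sequential Python sketch of matrix factorization R ~ P Q^T (variable names and hyperparameters are illustrative, not from the survey). DSGD obtains parallelism by partitioning the rating matrix into blocks and running this same update concurrently on blocks whose user and item rows do not overlap.

```python
import random
import numpy as np

def sgd_mf(ratings, n_users, n_items, rank=16, lr=0.01, reg=0.05, epochs=10):
    """Sequential SGD for matrix factorization: given (user, item, rating)
    triples, learn factors P (users x rank) and Q (items x rank)."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, rank))
    Q = rng.normal(scale=0.1, size=(n_items, rank))
    for _ in range(epochs):
        random.shuffle(ratings)
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            # Gradient step on the two factor rows touched by this rating.
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q
```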
ISBN: 9781450339643 (Print)
We are on the cusp of the emergence of a new wave of nonvolatile memory technologies that are projected to become the dominant type of main memory in the near future. A key property of these new memory technologies is their asymmetric read-write costs: writes can incur an order of magnitude or more higher energy, higher latency, and lower (per-module) bandwidth than reads. This high cost of writes motivates a rethinking of algorithm design towards "write-efficient" algorithms and data structures that reduce their number of writes [1, 2, 3, 4, 5, 6]. Many popular techniques for sequential, distributed, and parallel algorithms are tuned to the setting where reads and writes cost the same, and hence need to be revisited. Prior work on reducing writes to contended cache lines in shared-memory algorithms can be useful here, but with the new technologies, even writes to uncontended memory are costly. Moreover, the new technologies are unlikely to replace the fastest cache memory, motivating the study of a multi-level memory hierarchy composed of smaller symmetric level(s) and a larger asymmetric level. Lower bounds, too, need to be revisited in light of asymmetric costs. This talk provides background on these emerging memory technologies, highlights the progress to date on these exciting research questions, and touches on a few of the many open problems.
Triangular meshes of superior quality are important for geometric processing in practical applications. Existing approximate CVT-based remeshing methods use planar polygonal facets to fit the original surface, which reduces computational complexity; however, they usually do not consider surface curvature. Topological errors and outliers can also occur when remeshing close sheet surfaces, resulting in incorrect meshes. In this regard, we present a novel method named PowerRTF, an extension of the restricted tangent face (RTF) in conjunction with the power diagram, to better approximate the original surface with curvature adaptation. The idea is to assign a weight to each sample point and compute the power diagram on the tangent face to produce area-controlled polygonal facets. Based on this, we impose a variable-capacity constraint and a centroid constraint on the PowerRTF, providing a trade-off between mesh quality and computational efficiency. Moreover, we apply a normal-verification-based inverse side point culling method to address the topological errors and outliers in close sheet surface remeshing. Our method computes and optimizes the PowerRTF independently per sample point, which is efficiently implemented in parallel on the GPU. Experimental results demonstrate the effectiveness, flexibility, and efficiency of our method.
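The key quantity behind the weighted construction is the power distance. As a minimal illustration (not the paper's PowerRTF pipeline, which operates on tangent faces of a surface), the sketch below assigns query points to weighted sites by minimum power distance ||x - s||^2 - w; increasing a site's weight enlarges its power cell, which is how per-point weights provide control over facet area.

```python
import numpy as np

def power_cell_labels(query_pts, sites, weights):
    """Assign each query point to the site of minimum power distance
    ||x - s||^2 - w.  query_pts: (m, d), sites: (k, d), weights: (k,)."""
    d2 = ((query_pts[:, None, :] - sites[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d2 - weights[None, :], axis=1)

# Usage: the heavier site at the origin captures a larger share of the points.
pts = np.random.default_rng(0).uniform(-1, 1, size=(1000, 2))
labels = power_cell_labels(pts, np.array([[0.0, 0.0], [0.5, 0.5]]),
                           np.array([0.3, 0.0]))
```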