Search results - Inner Mongolia University Library

Supercomputing Conference

Authors: Rupanshu Soi, Michael Bauer, Sean Treichler, Manolis Papadakis, Wonchan Lee, Patrick McCormick, Alex Aiken, Elliott Slaughter (BITS Pilani - Hyderabad Campus, India; NVIDIA, USA; Los Alamos National Laboratory, USA; Stanford University, USA; SLAC National Accelerator Laboratory, USA)

ISBN: (digital) 9781450384421

ISBN: (print) 9781665483902

It's common to see specialized language constructs in modern task-based programming systems for reasoning about groups of independent tasks intended for parallel execution. However, most systems use an ad-hoc representation that limits expressiveness and often overfits for a given application domain. We introduce index launches, a scalable and flexible representation of a group of tasks. Index launches use a flexible mechanism to indicate the data required for a given task, allowing them to be used for a much broader set of use cases while maintaining an efficient representation. We present a hybrid design for index launches, involving static and dynamic program analyses, along with a characterization of how they're used in Legion and Regent, and show how they generalize constructs found in other task-based systems. Finally, we present results of scaling experiments which demonstrate that index launches are crucial for the efficient distributed execution of several scientific codes in Regent.

Keywords: Runtime; Program processors; Codes; Parallel programming; High performance computing; Supercomputers; Cognition

Comparative Analysis of the Architecture of Dual-Core Blackfin Digital Signal Processors

International Conference on Information Science and Communications Technologies (ICISCT)

Authors: Hakimjon Zaynidinov, Sanjarbek Ibragimov, Gayrat Tojiboyev (Tashkent University of Information Technologies, Tashkent, Uzbekistan; Andijan Machine Building Institute, Andijan, Uzbekistan; Andijan State University, Andijan, Uzbekistan)

ISBN: (print) 9781665432597

Nowadays, specialized processors are widely used in digital signal processing. Analog Devices is one of the largest companies producing custom processors. This article compares and analyzes the functional descriptions of Analog Devices' dual-core Blackfin processor architectures. The architectures of the ADSP-BF561 and ADSP-BF60x processors are described, along with their capabilities and areas of application. The peripherals of single-core and dual-core processors are compared in a table. The core architecture of Blackfin processors is presented and its constituents are described, as is the hierarchical memory architecture of dual-core Blackfin processors. Also included are tables comparing the internal memory of the Blackfin processor family, including the first-level (L1) memory in the core. The internal memories of the ADSP-BF561 and ADSP-BF60x Blackfin processors are compared.

Keywords: Information science; Program processors; Parallel programming; Memory management; Memory architecture; Digital signal processors; Signal processing algorithms

Cuttlefish: Library for Achieving Energy Efficiency in Multicore Parallel Programs

Supercomputing Conference

Authors: Sunil Kumar, Akshat Gupta, Vivek Kumar, Sridutt Bhalachandra (IIIT-Delhi, India; Lawrence Berkeley National Laboratory, USA)

ISBN: (digital) 9781450384421

ISBN: (print) 9781665483902

A low-cap power budget is challenging for exascale computing. Dynamic Voltage and Frequency Scaling (DVFS) and Uncore Frequency Scaling (UFS) are the two widely used techniques for limiting an HPC application's energy footprint. However, existing approaches fail to provide a unified solution that can work with different types of parallel programming models and applications. This paper proposes Cuttlefish, a programming-model-oblivious C/C++ library for achieving energy efficiency in multicore parallel programs running over Intel processors. An online profiler periodically profiles model-specific registers to discover a running application's memory access pattern. Using a combination of DVFS and UFS, Cuttlefish then dynamically adapts the processor's core and uncore frequencies, thereby improving its energy efficiency. The evaluation on a 20-core Intel Xeon processor using a set of widely used OpenMP benchmarks, consisting of several irregular-tasking and work-sharing pragmas, achieves geometric mean energy savings of 19.4% with a 3.6% slowdown.

Keywords: Adaptation models; Limiting; Multicore processing; Parallel programming; Semantics; Power control; Production

Non-Blocking Technique for Parallel Algorithms with Global Barrier Synchronization

International Conference on Computational Science and Computational Intelligence (CSCI)

Authors: Arturo Garza, Claudio A. Parra, Isaac D. Scherson (Department of Computer Science, University of California, Irvine, Irvine, CA, USA)

ISBN: (print) 9781665458429

Sharing data among asynchronous processes is considered a hard systems problem in modern multithreaded shared-memory multicore systems. Multiple solutions have been proposed in the literature, such as barrier synchronization. A barrier is a synchronization primitive that guarantees that no thread continues execution past a given point until all threads have reached that point. This primitive is widely used in different parallel programming models, but it can easily become a hot spot for performance-critical applications because of its global nature: one preempted thread stops the execution of all other threads waiting at the barrier. This paper suggests a technique to change the global nature of barrier synchronization into a non-blocking synchronization model with lock-free thread progression guarantees. The main idea is to exploit algorithm-based memory access patterns to implement self-synchronizable threads that protect concurrent reads and writes in a shared data structure without explicit use of a barrier primitive. To the best of our knowledge, this is the first attempt to provide a different synchronization mechanism based on the algorithm's intrinsic characteristics rather than on explicit use of a global barrier in shared-memory architectures. Our experimental results show factors of performance improvement against the global barrier-based counterpart of the algorithm.
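The barrier semantics defined in this abstract can be illustrated with a minimal sketch. It uses `threading.Barrier` from the Python standard library; the function name `run_with_barrier` and the log format are hypothetical, chosen only for illustration. It shows the conventional blocking barrier that the paper sets out to replace, not the paper's non-blocking technique.

```python
import threading


def run_with_barrier(num_threads):
    """Run num_threads workers that all meet at one global barrier.

    Returns the ordered log of events. Every "before" entry is recorded
    ahead of any "after" entry, because no thread passes the barrier
    until all threads have reached it.
    """
    barrier = threading.Barrier(num_threads)
    log = []
    log_lock = threading.Lock()  # protects the shared log

    def worker(thread_id):
        with log_lock:
            log.append("before-%d" % thread_id)
        # Blocks until all num_threads workers have called wait().
        barrier.wait()
        with log_lock:
            log.append("after-%d" % thread_id)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

If one worker were delayed before reaching `barrier.wait()`, every other worker would sit idle at the barrier, which is the global stall the abstract identifies as a performance hot spot.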

Keywords: Scientific computing; Parallel programming; Multicore processing; Instruction sets; Data structures; Synchronization; Parallel algorithms

Parallelization of Sequential Pattern Sampling

IEEE International Conference on Big Data

Authors: Lamine Diop, Cheikh Ba (University of Tours, Blois, France)

In recent years, the field of data mining has seen extensive work on pattern discovery by sampling techniques. Recently, these sampling methods have been applied to sequential data, which are complex in nature. The complexity of these data lies in their structure, which has a notable impact on the speed of the computation and makes it time-consuming on huge databases. In this paper, we show how to use the BSP (Bulk Synchronous Parallel) programming model to improve the efficiency of sequential pattern sampling methods. Specifically, we propose a parallel algorithm that operates on sequential databases that are knowingly distributed in order to reduce the computation time. The analyses show the positive impact of the framework on the execution time of the method.

Keywords: Parallel programming; Conferences; Computational modeling; Distributed databases; Big Data; Sampling methods; Complexity theory

An efficient hybrid MPI/OpenMP parallelization of the asynchronous ADMM algorithm

IEEE International Conference on Big Data and Cloud Computing (BdCloud)

Authors: Qinnan Qiu, Yongmei Lei, Dongxia Wang, Guozheng Wang (School of Computer Engineering and Science, Shanghai University, Shanghai, China)

The alternating direction method of multipliers (ADMM) is an efficient algorithm for solving large-scale machine learning problems in a distributed environment. To make full use of the hierarchical memory model in modern high-performance computing systems, this paper implements a hybrid MPI/OpenMP parallelization of the asynchronous ADMM algorithm (AH-ADMM). The AH-ADMM algorithm updates local variables in parallel using OpenMP threads and exchanges information between MPI processes, which relieves memory and communication pressure by replacing multi-processing with multi-threading. Furthermore, for the SVM problem, the AH-ADMM algorithm speeds up the calculation of sub-problems through an efficient parallel optimization strategy. This paper effectively combines the features of both algorithm design and programming model. Experiments on the Ziqiang4000 high-performance cluster demonstrate that the AH-ADMM algorithm scales better and runs faster than existing distributed ADMM algorithms implemented in pure MPI. AH-ADMM can reduce the communication overhead by up to 91.8% and increase the convergence rate by up to 36x. For large datasets, AH-ADMM scales well on a cluster with over 129 cores.

Keywords: Support vector machines; Machine learning algorithms; Parallel programming; Multicore processing; Scalability; Clustering algorithms; Machine learning

From matching logic to parallel imperative language verification

arXiv, 2021

Author: Wang, ShangBei (Nanjing University of Aeronautics and Astronautics, Nanjing, China)

Program verification consists of developing a proof system for a program and proving the soundness of that proof system with respect to a trusted operational semantics of the program. However, many practical program verifiers are not based on operational semantics and cannot rigorously validate the program. Matching logic was proposed to base program verification on operational semantics. In this paper, following the work of Grigore Roşu et al., we consider matching logic for a parallel imperative language (PIMP). According to our investigation, this paper is the first study of matching logic for PIMP. In our matching logic, we redefine "interference-free" to characterize the parallel rule, and we prove the soundness of matching logic with respect to the operational semantics of PIMP. We also formally link PIMP's operational semantics and PIMP's verification by constructing a matching logic verifier for PIMP, which executes rewriting logic semantics symbolically on configuration patterns and is sound and complete with respect to matching logic for PIMP. That is, our matching logic verifier for PIMP is sound with respect to the operational semantics of PIMP. Finally, we also verify the matching logic verifier through an example that is a standard problem in parallel programming. Copyright © 2021, The Authors. All rights reserved.

Keywords: Parallel programming

The parallel optimization based on the PVS algorithm and research on the evaluation function in the Game of the Amazons

The 33rd Chinese Control and Decision Conference

Authors: Haoyu Wang, Hongkun Qiu (School of Computer Science, Shenyang Aerospace University; Engineering Training Center, Shenyang Aerospace University)

The PVS search function, as a current mainstream and efficient algorithm, has been widely used in various kinds of chess *** applied the parallel search function based on the PVS and improved the running speed of the *** the same time, we also did some research and experiments on the evaluation function of Amazon chess, which provided a set of available Amazon evaluation functions and parameter adjustment results for reference.
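The abstract does not spell out PVS itself, so the following is only a minimal sketch of the standard sequential principal variation search in its textbook negamax form, run on a toy game tree. The nested-list tree encoding and the names `negamax` and `pvs` are assumptions made for illustration. This is not the authors' parallel implementation, and it contains no Game of the Amazons evaluation function.

```python
def negamax(node, color):
    """Plain negamax, used here only as a reference to check pvs against.

    Assumed encoding: a leaf is an integer score from the first player's
    point of view, and an internal node is a list of child nodes.
    """
    if isinstance(node, int):
        return color * node
    return max(-negamax(child, -color) for child in node)


def pvs(node, alpha, beta, color):
    """Principal variation search over the same toy tree encoding.

    The first child is searched with the full (alpha, beta) window. Each
    later child is first probed with a null window (alpha, alpha + 1) and
    re-searched with a wider window only if the probe suggests that it
    may beat the current best score.
    """
    if isinstance(node, int):
        return color * node
    first_child = True
    for child in node:
        if first_child:
            score = -pvs(child, -beta, -alpha, -color)
            first_child = False
        else:
            # Null-window probe: cheap test of "is this child better?"
            score = -pvs(child, -alpha - 1, -alpha, -color)
            if alpha < score < beta:
                # The probe failed high, so search again for the exact value.
                score = -pvs(child, -beta, -score, -color)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff: the opponent will avoid this line
    return alpha


if __name__ == "__main__":
    # Depth-2 toy tree: the first player picks a branch, then the opponent
    # picks the leaf that is worst for the first player.
    toy_tree = [[3, 5], [6, 9], [1, 2]]
    print(pvs(toy_tree, float("-inf"), float("inf"), 1))  # prints 6
```

On this tree the second branch needs a re-search after its null-window probe, and the third branch is cut off after its first leaf, which is where PVS saves work when moves are ordered well.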

Keywords: Computer Game; Amazon Game; PVS; Parallel programming; Evaluation Function

Cash flow prediction of a bank deposit using scalable graph analysis and machine learning

IEEE International Conference on Big Data

Authors: Ryo Kawahara, Mikio Takeuchi (IBM Research - Tokyo, IBM Japan, Tokyo, Japan)

ISBN: (print) 9781665445993

Cash flow prediction of a bank is an important task as it is not only related to liquidity risk but is also regulated by financial authorities. To improve the prediction, a graph analysis of bank transaction data is promising, while its size, scale-free nature, and various attributes make the task *** this paper, we propose a graph-based machine learning method for the cash flow prediction task. Our contributions are as follows. (i) We introduce an extensible and scalable shared-memory parallel graph analysis platform that supports the vertex-centric, bulk synchronous parallel programming paradigm. (ii) We introduce two novel graph features upon the platform: (ii-a) an internal money flow feature based on the Markov process approximation, and (ii-b) an anomaly score feature derived from other graph *** proposed method is examined with real bank transaction data. The proposed graph features reduce the error of a long-term (31-day) cash flow prediction by 56% from that of a non-graph-based time-series prediction model. The graph analysis platform can compute graph features from a graph with 10 × 10^6 nodes and 593 × 10^6 edges in 2 hours 20 minutes.

Keywords: Parallel programming; Conferences; Computational modeling; Machine learning; Big Data; Predictive models; Markov processes

DISENTANGLING PARALLELISM AND INTERFERENCE IN GAME SEMANTICS

arXiv, 2021

Authors: Castellan, Simon; Clairambault, Pierre (Inria, Univ Rennes, CNRS, IRISA, France; CNRS, Aix Marseille Univ, LIS, Marseille, France)

Game semantics is a denotational semantics presenting compositionally the computational behaviour of various kinds of effectful programs. One of its celebrated achievements is to have obtained full abstraction results for programming languages with a variety of computational effects, in a single framework. This is known as the semantic cube or Abramsky's cube, which for sequential deterministic programs establishes a correspondence between certain conditions on strategies ("innocence", "well-bracketing", "visibility") and the absence of matching computational effects. Outside of the sequential deterministic realm, there is still a wealth of game semantics-based full abstraction results, but they no longer fit in a unified canvas. In particular, Ghica and Murawski's fully abstract model for shared state concurrency (IA) does not have a matching notion of pure parallel program: we say that parallelism and interference (i.e. state plus semaphores) are entangled. In this paper we construct a causal version of Ghica and Murawski's model, also fully abstract for IA. We provide two compositional conditions, parallel innocence and sequentiality, respectively banning interference and parallelism, and leading to four full abstraction results. To our knowledge, this is the first extension of Abramsky's semantic cube programme beyond the sequential deterministic world. © 2021, CC BY.

Keywords: Parallel programming
