ISBN:
(digital) 9781450384421
ISBN:
(print) 9781665483902
It's common to see specialized language constructs in modern task-based programming systems for reasoning about groups of independent tasks intended for parallel execution. However, most systems use an ad-hoc representation that limits expressiveness and often overfits for a given application domain. We introduce index launches, a scalable and flexible representation of a group of tasks. Index launches use a flexible mechanism to indicate the data required for a given task, allowing them to be used for a much broader set of use cases while maintaining an efficient representation. We present a hybrid design for index launches, involving static and dynamic program analyses, along with a characterization of how they're used in Legion and Regent, and show how they generalize constructs found in other task-based systems. Finally, we present results of scaling experiments which demonstrate that index launches are crucial for the efficient distributed execution of several scientific codes in Regent.
ISBN:
(print) 9781665432597
Special-purpose processors are now widely used in digital signal processing. Analog Devices is one of the largest companies producing custom processors. This article compares and analyzes the functional descriptions of Analog Devices' Blackfin dual-core processor architecture. The architectures of the ADSP-BF561 and ADSP-BF60x processors are described, along with their capabilities and areas of application. The peripherals of single-core and dual-core processors are compared in a table. The core architecture of Blackfin processors is presented and its constituents are described, as is the hierarchical memory architecture of dual-core Blackfin processors. Also included are tables comparing the internal memory of the Blackfin processor family, including the first-level (L1) memory in the core. The internal memories of the ADSP-BF561 and ADSP-BF60x Blackfin processors are compared.
ISBN:
(digital) 9781450384421
ISBN:
(print) 9781665483902
A low-cap power budget is challenging for exascale computing. Dynamic Voltage and Frequency Scaling (DVFS) and Uncore Frequency Scaling (UFS) are the two widely used techniques for limiting an HPC application's energy footprint. However, existing approaches fail to provide a unified solution that can work with different types of parallel programming models and applications. This paper proposes Cuttlefish, a programming-model-oblivious C/C++ library for achieving energy efficiency in multicore parallel programs running on Intel processors. An online profiler periodically profiles model-specific registers to discover a running application's memory access pattern. Using a combination of DVFS and UFS, Cuttlefish then dynamically adapts the processor's core and uncore frequencies, thereby improving its energy efficiency. The evaluation on a 20-core Intel Xeon processor, using a set of widely used OpenMP benchmarks consisting of several irregular-tasking and work-sharing pragmas, achieves geometric mean energy savings of 19.4% with a 3.6% slowdown.
ISBN:
(print) 9781665458429
Sharing data among asynchronous processes is considered a hard systems problem in modern multithreaded shared-memory multicore systems. Multiple solutions have been proposed in the literature, such as barrier synchronization. A barrier is a synchronization primitive that guarantees that no thread continues execution past a given point until all threads have reached that point. This primitive is widely used in different parallel programming models, but it can easily become a hot spot for performance-critical applications because of its global nature: one preempted thread stalls all other threads waiting at the barrier. This paper suggests a technique to change the global nature of barrier synchronization into a non-blocking synchronization model with lock-free thread progression guarantees. The main idea is to exploit algorithm-based memory access patterns to implement self-synchronizable threads that protect concurrent reads and writes in a shared data structure without explicit use of a barrier primitive. To the best of our knowledge, this is the first attempt to provide a different synchronization mechanism based on the algorithm's intrinsic characteristics rather than an explicit use of a global barrier in shared-memory architectures. Our experimental results show factors of performance improvement over the global barrier-based counterpart of the algorithm.
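The barrier semantics described in this abstract can be illustrated with a minimal Python sketch. It shows the conventional global barrier that the paper sets out to replace, not the paper's own non-blocking technique. The function name `run_with_barrier` and the two-phase workload are hypothetical and chosen only for illustration.

```python
import threading

def run_with_barrier(num_threads=4):
    """Each thread writes its own slot, waits at a barrier, then reads all slots.

    Hypothetical workload: the barrier separates the write phase from the read phase.
    """
    slots = [None] * num_threads
    seen = [None] * num_threads
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        slots[tid] = tid * tid      # phase 1: write only this thread's slot
        barrier.wait()              # no thread proceeds until all have arrived
        seen[tid] = sum(slots)      # phase 2: every slot is now safe to read

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

If any one thread is delayed before `barrier.wait()`, every other thread blocks there too, which is the global stall the abstract refers to.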
In recent years, the field of data mining has seen extensive work on pattern discovery by sampling techniques. Recently, these sampling methods have been applied to sequential data that are complex in nature. The complexity of these data lies in their structure, which has a notable impact on computation speed and makes processing a huge database time-consuming. In this paper, we show how to use the BSP (Bulk Synchronous Parallel) programming model to improve the efficiency of sequential pattern sampling methods. We propose a parallel algorithm that operates on deliberately distributed sequential databases in order to reduce the computation time. The analyses show the positive impact of the framework on the execution time of the method.
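The BSP model named in this abstract organizes work into supersteps: local computation, communication, then a barrier. A minimal sketch, simulated sequentially in Python, is given below. It is not the paper's sampling algorithm. The function `bsp_total_length` and the choice of summing sequence lengths are assumptions made only to show the superstep structure over a partitioned sequence database.

```python
def bsp_total_length(partitions):
    """Simulate two BSP supersteps over a partitioned sequence database.

    partitions: list of partitions, each a list of sequences (hypothetical layout).
    Returns, for each simulated worker, the total sequence length over all partitions.
    """
    # Superstep 1, local computation: each worker touches only its own partition.
    local = [sum(len(seq) for seq in part) for part in partitions]

    # Superstep 1, communication: each worker sends its local result to every worker.
    inboxes = [list(local) for _ in partitions]

    # Barrier: implicit here, since the simulation finishes all sends before continuing.

    # Superstep 2, local computation: each worker combines the messages it received.
    return [sum(inbox) for inbox in inboxes]
```

For example, `bsp_total_length([["abc", "de"], ["f"]])` gives every worker the same global total, computed without any worker reading another worker's partition directly.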
The alternating direction method of multipliers (ADMM) is an efficient algorithm for solving large-scale machine learning problems in a distributed environment. To make full use of the hierarchical memory model in modern high-performance computing systems, this paper implements a hybrid MPI/OpenMP parallelization of the asynchronous ADMM algorithm (AH-ADMM). The AH-ADMM algorithm updates local variables in parallel with OpenMP threads and exchanges information between MPI processes, which relieves memory and communication pressure by replacing multi-processing with multi-threading. Furthermore, for the SVM problem, the AH-ADMM algorithm speeds up the calculation of sub-problems through an efficient parallel optimization strategy. This paper effectively combines the features of both algorithm design and programming model. Experiments on the Ziqiang4000 high-performance cluster demonstrate that the AH-ADMM algorithm scales better and runs faster than existing distributed ADMM algorithms implemented in pure MPI. AH-ADMM can reduce the communication overhead by up to 91.8% and increase the convergence rate by up to 36x. For large datasets, AH-ADMM scales well on the cluster with over 129 cores.
Program verification is to develop the program’s proof system, and to prove the proof system soundness with respect to a trusted operational semantics of the program. However, many practical program verifiers are not...
The PVS search function, as a current mainstream and efficient algorithm, has been widely used in various kinds of chess *** applied the parallel search function based on the PVS and improved the running speed of the *** the same time, we also did some research and experiments on the evaluation function of Amazon chess, which provided a set of available Amazon evaluation functions and parameter adjustment results for reference.
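Reading PVS as principal variation search (an assumption, since the abstract does not expand the acronym), the sequential form of the search can be sketched in Python as follows. This is a textbook negamax-style formulation on a toy game tree, not the authors' parallel implementation. The nested-list tree encoding and the function name `pvs` are hypothetical.

```python
import math

def pvs(node, alpha=-math.inf, beta=math.inf, color=1):
    """Principal variation search on a toy tree (negamax form, fail-hard).

    node: an int leaf score (from the root player's point of view)
          or a list of child nodes. This encoding is an illustrative assumption.
    """
    if isinstance(node, int):
        return color * node
    first = True
    for child in node:
        if first:
            # Search the first child (assumed principal variation) with the full window.
            score = -pvs(child, -beta, -alpha, -color)
            first = False
        else:
            # Probe the remaining children with a null window around alpha.
            score = -pvs(child, -alpha - 1, -alpha, -color)
            if alpha < score < beta:
                # The probe failed high: re-search with the full window.
                score = -pvs(child, -beta, -alpha, -color)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return alpha
```

The null-window probes are cheap when the first move really is the best one, which is why move ordering matters for this search. The sketch assumes integer leaf scores so that a window of width 1 is well defined.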
ISBN:
(print) 9781665445993
Cash flow prediction of a bank is an important task as it is not only related to liquidity risk but is also regulated by financial authorities. To improve the prediction, a graph analysis of bank transaction data is promising, while its size, scale-free nature, and various attributes make the task *** this paper, we propose a graph-based machine learning method for the cash flow prediction task. Our contributions are as follows. (i) We introduce an extensible and scalable shared-memory parallel graph analysis platform that supports the vertex-centric, bulk synchronous parallel programming paradigm. (ii) We introduce two novel graph features upon the platform: (ii-a) an internal money flow feature based on the Markov process approximation, and (ii-b) an anomaly score feature derived from other graph *** proposed method is examined with real bank transaction data. The proposed graph features reduce the error of a long-term (31-day) cash flow prediction by 56% from that of a non-graph-based time-series prediction model. The graph analysis platform can compute graph features from a graph with 10 × 10^6 nodes and 593 × 10^6 edges in 2 hours 20 minutes.
Game semantics is a denotational semantics presenting compositionally the computational behaviour of various kinds of effectful programs. One of its celebrated achievements is to have obtained full abstraction results ...