Search results - Inner Mongolia University Library

26th International Conference on High Performance Computing, Data and Analytics (HiPCW)

Authors: Maronas, Marcos; Sala, Kevin; Mateo, Sergi; Ayguade, Eduard; Beltran, Vicenc. Affiliations: Barcelona Supercomp Ctr BSC Barcelona Spain Univ Politecn Catalunya UPC Barcelona Supercomp Ctr BSC Barcelona Spain

ISBN: (print) 9781728145358

Shared memory programming models usually provide worksharing and task constructs. The former relies on the efficient fork-join execution model to exploit structured parallelism, while the latter relies on fine-grained synchronization among tasks and a flexible data-flow execution model to exploit dynamic, irregular, and nested parallelism. On applications that show both structured and unstructured parallelism, both worksharing and task constructs can be combined. However, it is difficult to mix both execution models without penalizing the data-flow execution model. Hence, on many applications structured parallelism is also exploited using tasks to leverage the full benefits of a pure data-flow execution model. However, task creation and management might introduce a non-negligible overhead that prevents the efficient exploitation of fine-grained structured parallelism, especially on many-core processors. In this work, we propose worksharing tasks. These are tasks that internally leverage worksharing techniques to exploit fine-grained structured loop-based parallelism. The evaluation shows promising results on several benchmarks and platforms.

Keywords: fine grained loop parallelism programming models runtime systems

Evaluation of the Global Address Space Programming Interface (GASPI)

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Breitbart, Jens; Schmidtobreick, Mareike; Heuveline, Vincent. Affiliations: Tech Univ Munich Lehrstuhl Rechnertech & Rechnerorg Parallelrechne Munich Germany Heidelberg Univ Engn Math & Comp Lab Heidelberg Germany

ISBN: (print) 9781479941162

The first exascale supercomputers are expected by the end of this decade and will presumably feature an increase in core count, but a decrease in the amount of memory available per core. As of now, it is still unclear if the current programming models will provide high performance on exascale systems. One programming model considered to be an alternative to MPI is the so-called partitioned global address space (PGAS) model. Within this paper we evaluate a relatively new PGAS API: the Global Address Space Programming Interface (GASPI) and compare it to MPI on the basis of microbenchmarks. These benchmarks show that GASPI provides about the same level of performance for single-threaded communication, but is up to an order of magnitude faster than both Intel and IBM MPI for multi-threaded communication. Hereafter, we discuss the different features of GASPI in comparison to two main PGAS languages, namely UPC and CAF. In addition, we present a basic numerical algorithm, a dense matrix-matrix multiplication, as an example of how an implementation can make efficient use of GASPI's features, especially the asynchronous and one-sided communication mechanisms.

Keywords: PGAS one-sided communication programming models GASPI

Synthesis of Approximate Parametric Circuits for Variational Quantum Algorithms

2024 International Conference on Quantum Computing and Engineering

Authors: Burgstahler, Blake; Wilson, Ellis; Pakin, Scott; Mueller, Frank. Affiliations: North Carolina State Univ Raleigh NC 27695 USA Los Alamos Natl Lab Los Alamos NM USA

ISBN: (print) 9798331541378

This work presents a novel approach to synthesize approximate circuits for the ansatze of variational quantum algorithms (VQA) and demonstrates its effectiveness in the context of solving integer linear programming (ILP) problems. Synthesis is generalized to produce parametric circuits in close approximation of the original circuit and to do so offline. This removes synthesis from the (online) critical path between repeated quantum circuit executions of VQA. We hypothesize that this approach will yield novel high fidelity results beyond those discovered by the baseline without synthesis. Simulation and real device experiments complement the baseline in finding correct results in many cases where the baseline fails to find any and do so with on average 32% fewer CNOTs in circuits.

Keywords: circuit synthesis circuit-model quantum computing programming models QAOA VQA

THE EXACT LIKELIHOOD FUNCTION FOR AN EMPIRICAL JOB SEARCH MODEL

ECONOMETRIC THEORY, 1991, vol. 7, no. 4, pp. 464-486

Authors: CHRISTENSEN, BJ; KIEFER, NM. Affiliation: CORNELL UNIV, ITHACA, NY 14853

The exact likelihood function for a prototypal job search model is analyzed. The optimality condition implied by the dynamic programming framework is fully imposed. Using the optimality condition allows identification of an offer arrival probability separately from an offer acceptance probability. The estimation problem is nonstandard. The geometry of the likelihood function in finite samples is considered, along with asymptotic properties of the maximum likelihood estimator.

Keywords: Economic models Maximum likelihood estimation Mathematical theorems Dynamic programming Job hunting programming models Dynamic modeling Wages Unemployment

DYNAMIC COUPLING, OPTIMIZING AND REGIONAL INTERDEPENDENCE

JOURNAL OF FARM ECONOMICS, 1964, vol. 46, no. 2, pp. 442-451

Author: DAY, RH. Affiliation: University of Wisconsin

Addressing Global Data Dependencies in Heterogeneous Asynchronous Runtime Systems on GPUs

3rd IEEE International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)

Authors: Peterson, Brad; Humphrey, Alan; Schmidt, John; Berzins, Martin. Affiliation: Univ Utah Sci Comp & Imaging Inst Salt Lake City UT 84112 USA

ISBN: (print) 9781450351331

Large-scale parallel applications with complex global data dependencies beyond those of reductions pose significant scalability challenges in an asynchronous runtime system. Internodal challenges include identifying the all-to-all communication of data dependencies among the nodes. Intranodal challenges include gathering together these data dependencies into usable data objects while avoiding data duplication. This paper addresses these challenges within the context of a large-scale, industrial coal boiler simulation using the Uintah asynchronous many-task runtime system on GPU architectures. We show significant reduction in time spent analyzing data dependencies through refinements in our dependency search algorithm. Multiple task graphs are used to eliminate subsequent analysis when task graphs change in predictable and repeatable ways. Using a combined data store and task scheduler redesign reduces data dependency duplication ensuring that problems fit within host and GPU memory. These modifications did not require any changes to application code or sweeping changes to the Uintah runtime system. We report results running on the DOE Titan system on 119K CPU cores and 7.5K GPUs simultaneously. Our solutions can be generalized to other task dependency problems with global dependencies among thousands of nodes which must be processed efficiently at large scale.

Keywords: Data dependencies Asynchronous Many-Task programming models Runtime Systems Scalability GPU Uintah Coal Boiler Radiative Heat Transfer

Cutty: Aggregate Sharing for User-Defined Windows

25th ACM International Conference on Information and Knowledge Management (CIKM)

Authors: Carbone, Paris; Traub, Jonas; Katsifodimos, Asterios; Haridi, Seif; Markl, Volker. Affiliations: KTH Royal Inst Technol Stockholm Sweden Tech Univ Berlin Berlin Germany DFKI Saarbrucken Germany

ISBN: (print) 9781450340731

Aggregation queries on data streams are evaluated over evolving and often overlapping logical views called windows. While the aggregation of periodic windows was extensively studied in the past through the use of aggregate sharing techniques such as Panes and Pairs, little to no work has been put into optimizing the aggregation of very common, non-periodic windows. Typical examples of non-periodic windows are punctuations and sessions which can implement complex business logic and are often expressed as user-defined operators on platforms such as Google Dataflow or Apache Storm. The aggregation of such non-periodic or user-defined windows either falls back to expensive, best-effort aggregate sharing methods, or is not optimized at all. In this paper we present a technique to perform efficient aggregate sharing for data stream windows, which are declared as user-defined functions (UDFs) and can contain arbitrary business logic. To this end, we first introduce the concept of User-Defined Windows (UDWs), a simple, UDF-based programming abstraction that allows users to programmatically define custom windows. We then define semantics for UDWs, based on which we design Cutty, a low-cost aggregate sharing technique. Cutty improves and outperforms the state of the art for aggregate sharing on single and multiple queries. Moreover, it enables aggregate sharing for a broad class of non-periodic UDWs. We implemented our techniques on Apache Flink, an open source stream processing system, and performed experiments demonstrating orders of magnitude of reduction in aggregation costs compared to the state of the art.

Keywords: data structures data stream aggregation programming models operator sharing data stream processing data streams data stream optimisation data stream windows functional programming user-defined functions databases

Towards High Performance Resilience Using Performance Portable Abstractions

27th International European Conference on Parallel and Distributed Computing (Euro-Par)

Authors: Morales, Nicolas; Teranishi, Keita; Nicolae, Bogdan; Trott, Christian; Cappello, Franck. Affiliations: Sandia Natl Labs Livermore CA 94550 USA Sandia Natl Labs POB 5800 Albuquerque NM 87185 USA Argonne Natl Lab Chicago IL USA

ISBN: (print) 9783030856656; 9783030856649

In the drive towards Exascale, the extreme heterogeneity of supercomputers at all levels places a major development burden on HPC applications. To this end, performance portable abstractions such as those advocated by Kokkos, RAJA and HPX are becoming increasingly popular. At the same time, the unprecedented scalability requirements of such heterogeneous components mean higher failure rates, motivating the need for resilience in systems and applications. Unfortunately, state-of-the-art resilience techniques based on checkpoint/restart are lagging behind performance portability efforts: users still need to capture consistent states manually, which introduces the need for fine-tuning and customization. In this paper we aim to close this gap by introducing a set of abstractions that make it easier for application developers to reason about resilience. To this end, we extend the existing abstractions proposed by performance portability efforts towards resilience. By marking critical data structures that need to be checkpointed, one can enable an optimized runtime to automate checkpoint-restart using high performance and scalable asynchronous techniques. We illustrate the feasibility of our proposal using a prototype that combines the Kokkos runtime (HPC performance portability) with the VELOC runtime (large-scale low overhead checkpoint-restart). Our experimental results show negligible performance overhead compared with a manually tuned implementation of checkpoint-restart while requiring minimal changes in the application code.

Keywords: Performance portability Resilience Fault tolerance Checkpointing programming models

WCSim: A Cloud Computing Simulator with Support for Bag of Tasks Workflows

35th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

Authors: dos Santos, Maicon Anca; Grabher, Gabriel J. A.; Kovaleski, Matheus F.; Geyer, Claudio F. R.; Cavalheiro, Gerson Geraldo H. Affiliations: Univ Fed Pelotas PPGC Pelotas RS Brazil Univ Grenoble Alpes LIG Grenoble France Univ Fed Rio Grande do Sul II Porto Alegre RS Brazil

ISBN: (print) 9798350305487

In this paper, we present WCSim, Workflow Cloud Simulator. Firstly, we argue that this cloud simulation tool offers a high level of accessibility by allowing the description of various components, such as users, infrastructures, and workload, of a given scenario simply by providing parameters at launch time, without requiring the extension of the simulator code. Then, we explain how we conceived the components for the simulation models and provide a detailed description of the implemented software. Additionally, we compare the results of a small scenario obtained from two other simulation tools with those provided by WCSim. Finally, we present a case study that illustrates the usage of WCSim. The paper also introduces an abstraction to model workflows as a Directed Acyclic Graph of Bag of Tasks.

Keywords: Cloud Simulators Simulation Cloud Computing programming models Performance Evaluation

A scalable unified model for dynamic data structures in message passing (clusters) and shared memory (multicore CPUs) computing environments

18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Authors: Laccetti, Giuliano; Lapegna, Marco; Montella, Raffaele. Affiliations: Univ Napoli Federico II Dept Math & Applicat Naples Italy Univ Napoli Parthenope Dept Sci & Technol Naples Italy

ISBN: (print) 9781538658154

Concurrent data structures are widely used at many levels of the software stack, ranging from high level parallel scientific applications to low level operating systems. The key issue with these objects is their concurrent use by several computing units (threads or processes), so that the design of these structures is much more difficult than that of their sequential counterparts, because of their extremely dynamic nature requiring protocols to ensure data consistency, with a significant cost overhead. In this regard, several studies emphasize a tension between the need for sequential correctness of the concurrent data structures and the scalability of the algorithms, and in many cases the need to rethink the data structure design is evident, using approaches based on randomization and/or redistribution techniques in order to fully exploit the computational power of recent computing environments. The problem has grown in importance with the new generation of High Performance Computing systems aimed at achieving extreme performance. It is easy to observe that such systems are based on heterogeneous architectures integrating several independent nodes in the form of clusters or MPP systems, where each node is composed of powerful computing elements (CPU cores, GPUs or other acceleration devices) sharing resources in a single node. These systems therefore make massive use of communication libraries to exchange data among the nodes, as well as other tools for the management of the shared resources inside a single node. For this reason, the development of algorithms and scientific software for dynamic data structures on these heterogeneous systems implies a suitable combination of several methodologies and tools to deal with the different kinds of parallelism corresponding to each specific device, so as to be aware of the underlying platform. The present work aims to introduce a scalable model to manage a special class of dynamic data structure known as heap based priority queue (o

Keywords: HPC heterogeneous systems concurrent data structures programming models
