ISBN (digital): 9783319751788
ISBN (print): 9783319751788; 9783319751771
The convergence between computing- and data-centric workloads and platforms is imposing new challenges on how to best use the resources of modern computing systems. In this paper we show the need to enhance system schedulers to differentiate between compute- and data-oriented applications in order to minimise interference between storage and application traffic. This interference can be especially harmful in systems featuring fully distributed storage together with a unified interconnect, such as our custom-made architecture ExaNeSt. We analyse several data-aware allocation strategies and find that they are essential to maintaining performance in distributed storage systems.
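As a rough illustration of the scheduler differentiation this abstract argues for, the sketch below places data-oriented jobs near their storage targets while placing compute-oriented jobs by free capacity. All names (`Node`, `Job`, `hops_to_storage`, `allocate`) and the distance metric are hypothetical; this is not the ExaNeSt scheduler.

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    hops_to_storage: dict   # storage_id -> network distance (assumed metric)
    free_cores: int

@dataclass
class Job:
    job_id: int
    cores: int
    kind: str               # "compute" or "data"
    storage_id: int = -1    # data source, meaningful only for "data" jobs

def allocate(job, nodes):
    """Pick a node for `job`, keeping data-oriented jobs near their data."""
    candidates = [n for n in nodes if n.free_cores >= job.cores]
    if not candidates:
        return None
    if job.kind == "data":
        # Keep storage traffic local: the candidate with the fewest hops
        # to the job's data source minimises cross-traffic interference.
        best = min(candidates, key=lambda n: n.hops_to_storage[job.storage_id])
    else:
        # Compute-oriented jobs only need free cores.
        best = max(candidates, key=lambda n: n.free_cores)
    best.free_cores -= job.cores
    return best

nodes = [Node(0, {0: 1, 1: 3}, 8), Node(1, {0: 3, 1: 1}, 8)]
print(allocate(Job(42, 4, "data", storage_id=1), nodes).node_id)  # -> 1
```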
ISBN (print): 9783981926323
Although not a new technique, Processing-in-Memory (PIM) has been revived by the advent of 3D-stacked technologies, which integrate large memories with logic circuitry capable of computing on large amounts of data. PIM increases performance while reducing energy consumption when dealing with large volumes of data. Although several PIM designs are available in the literature, their effective use still burdens the programmer. Moreover, multiple PIM instances are required to take advantage of the internal 3D-stacked memories, which further increases the challenges programmers face. This work therefore presents the Processing-In-Memory cOmpiler (PRIMO). Our compiler efficiently exploits large vector units on a PIM architecture directly from the original code. PRIMO automatically selects suitable PIM operations, enabling their automatic offloading. Moreover, PRIMO is aware of the several PIM instances, selecting the most suitable instance while reducing internal communication between different PIM units. Compilation results for different benchmarks show how PRIMO exploits large vectors while achieving near-optimal performance compared to the ideal execution on the case-study PIM. PRIMO achieves a speedup of 38x for specific kernels, and 11.8x on average for a set of benchmarks from the PolyBench suite.
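One plausible core of such an instance-selection pass is a greedy placement that assigns each vector operation to the PIM unit already holding most of its operands, so results stay local for later consumers. The sketch below is purely illustrative; `assign_ops`, `operand_home`, and the vault model are assumptions, not PRIMO's published internals.

```python
def assign_ops(ops, operand_home):
    """ops: list of (result_name, [operand names]);
    operand_home: operand name -> PIM instance (vault) currently holding it."""
    placement = {}
    for name, operands in ops:
        # Vote by operand location and pick the majority vault, so most
        # inputs are already local and inter-vault transfers are few.
        votes = {}
        for o in operands:
            v = operand_home[o]
            votes[v] = votes.get(v, 0) + 1
        vault = max(votes, key=votes.get)
        placement[name] = vault
        # The result stays in the chosen vault for later consumers.
        operand_home[name] = vault
    return placement

ops = [("t0", ["a", "b"]), ("t1", ["t0", "c"])]
print(assign_ops(ops, {"a": 0, "b": 0, "c": 1}))  # {'t0': 0, 't1': 0}
```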
ISBN (print): 9781728182544
Modern remote sensing (RS) image application systems often distribute image processing tasks among multiple data centers and then gather the processed images from each center to efficiently synthesize the final product. In this paper, we exploit the edge-cloud architecture to design and implement a novel RS image service system, called RS-pCloud, which leverages the Peer-to-Peer (P2P) model to integrate multiple data centers and their associated edge networks. The data center as cloud platform is responsible for the storage and processing of original RS images, as well as the storage of partial processed images while the edge network is mainly for caching and sharing the processed images. With this design, RS-pCloud not only achieves the load sharing of processing works but also attains the data efficiency among the edges at the same time, which in turn improves the performance of the image processing and reduce the cost of the data transmission as well. RS-pCloud is designed to be used in a transparent way where it receives a query task from the user through a certain cloud platform, split the task into different sub-tasks, according to the location of the data they required, and then distribute the sub-tasks to corresponding clouds for near-data processing, the returned results from each cloud are first cached in specific edge for further sharing and then gathered at the client to synthesize the final product. We implemented and deployed RS-pCloud on three clusters in conjunction with an edge network to show its performance advantages over traditional single-cluster systems.
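A minimal sketch of the dispatch idea described above: split a query into per-cloud sub-tasks by data location, then gather the results. The names (`catalog`, `submit`, tile ids) are hypothetical and do not reflect the RS-pCloud API; edge caching is only noted in comments.

```python
from collections import defaultdict

def split_by_location(tiles, catalog):
    """tiles: requested image tile ids; catalog: tile id -> data center id."""
    subtasks = defaultdict(list)
    for t in tiles:
        subtasks[catalog[t]].append(t)   # near-data: send work to the data
    return dict(subtasks)

def run_query(tiles, catalog, submit):
    """submit(center, tiles) processes tiles at `center` and returns results;
    in the described design they would be cached at that center's edge
    before being gathered here for client-side synthesis."""
    results = []
    for center, ts in split_by_location(tiles, catalog).items():
        results.extend(submit(center, ts))
    return results

catalog = {"t1": "dc-a", "t2": "dc-b", "t3": "dc-a"}
print(split_by_location(["t1", "t2", "t3"], catalog))
# -> {'dc-a': ['t1', 't3'], 'dc-b': ['t2']}
```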
ISBN (print): 9781467383202
Despite the ability of modern processors to execute a variety of algorithms efficiently through instructions based on registers of ever-increasing width, some applications perform poorly due to the limited interconnection bandwidth between main memory and processing units. Near-data processing has started to gain acceptance as an accelerator approach due to the technology constraints and high costs associated with data transfer. However, previous approaches to near-data computing either do not provide general-purpose processing or require large amounts of logic and fail to fully use the potential of the DRAM devices; these issues have limited their wide adoption. In this paper, we present the Memory Vector Extensions (MVX), which implement vector instructions directly inside the DRAM devices, thereby avoiding data movement between memory and processing units while requiring less logic than previous approaches. MVX obtains up to a 211x increase in performance for application kernels with high spatial locality and low temporal locality. Compared to an embedded processor with 8 cores and 2 memory channels that supports AVX-512 instructions, MVX performs 24x faster on average for three well-known algorithms.
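To make the bandwidth argument concrete, here is a back-of-the-envelope model (assumed numbers, not figures from the paper) contrasting the bytes that cross the memory bus for a streaming vector add executed host-side versus in-DRAM.

```python
def bytes_moved(n_elems, elem_bytes=8, in_memory=False):
    """Bytes crossing the memory bus for C = A + B over n_elems elements."""
    if in_memory:
        return 0                      # operands and result stay in DRAM
    return 3 * n_elems * elem_bytes   # A in, B in, C out

n = 1 << 20  # one million elements: high spatial, low temporal locality
print(bytes_moved(n), "bytes on the bus for a host-side vector add")
print(bytes_moved(n, in_memory=True), "bytes for an MVX-style in-DRAM add")
```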
ISBN (print): 9781450380751
This work defines the concept of collective affinity. It is claimed that collective affinity has more potential than single core-centric affinity for data-locality optimization in manycores, because collective affinity captures the potential benefit of transferring computations originally assigned to one core to other cores. Building upon the collective-affinity concept and a cache-content estimation strategy, the work then presents a computation-to-core mapping strategy specifically tuned for exploiting near-data computing by reducing distance-to-data.
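A hypothetical illustration of the mapping idea: given an estimate of which core's cache holds each data block, assign a computation to the core minimising total distance-to-data over all blocks it touches (a collective view), rather than pinning it to its original owner core. The mesh model and names below are assumptions, not the paper's strategy.

```python
def distance(core_a, core_b, mesh_width=4):
    """Manhattan hop count between two cores on a 2D mesh NoC."""
    ax, ay = core_a % mesh_width, core_a // mesh_width
    bx, by = core_b % mesh_width, core_b // mesh_width
    return abs(ax - bx) + abs(ay - by)

def best_core(blocks, cache_home, n_cores=16):
    """blocks: data blocks a computation accesses;
    cache_home: block -> core estimated to cache it (cache-content estimate)."""
    cost = lambda c: sum(distance(c, cache_home[b]) for b in blocks)
    return min(range(n_cores), key=cost)

# A computation originally assigned to core 0 migrates toward the mesh
# region where its data is estimated to be cached.
print(best_core(["b0", "b1"], {"b0": 5, "b1": 6}))  # -> 5
```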
ISBN (print): 9781450385572
Automata processing is an efficient computation model for regular expressions and other forms of sophisticated pattern matching. The demand for high-throughput and real-time pattern matching in many applications, including network intrusion detection and spam filters, has motivated several in-memory architectures for automata processing. Existing in-memory architectures focus on accelerating the pattern-matching kernel, but either fail to support a practical reporting solution or optimistically assume that the reporting stage is not the performance bottleneck. However, gathering and processing the reports can be the major bottleneck, especially when the reporting frequency is high. Moreover, all existing in-memory architectures work at a fixed processing rate (mostly 8 bits/cycle) and do not adjust the input consumption rate to the properties of the application, which can lead to throughput and capacity loss. To address these issues, we present Sunder, an in-SRAM pattern-matching architecture that processes a reconfigurable number of nibbles (4-bit symbols) in parallel instead of at a fixed rate, adopting an algorithm/architecture methodology to perform hardware-aware transformations. Inspired by prior work, we transform the commonly used 8-bit processing into nibble (4-bit) processing to reduce hardware requirements exponentially and achieve higher information density. This frees up space for storing reporting data in place, which largely eliminates host communication and reporting overhead. Our proposed reporting architecture supports in-place report summarization and provides an easy mechanism for reading the reporting data. As a result, Sunder enables a low-overhead, high-performance, and flexible in-memory pattern-matching and reporting solution. Our results confirm that the Sunder reporting architecture has zero performance overhead for 95% of the applications and incurs only 2% additional hardware overhead.
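The byte-to-nibble transformation can be sketched as follows: a state matching a set of bytes becomes two chained states matching high and low nibbles, halving symbol width at the cost of extra states. This structure is inferred from the description above, not from Sunder's actual design, and the simple split shown is exact only when the byte class is a cross product of nibble sets.

```python
def byte_to_nibble_states(byte_class):
    """byte_class: set of accepted byte values for one automaton state.
    Returns (high_nibbles, low_nibbles) pairs representing two chained
    4-bit states; exact only for cross-product byte classes."""
    highs = {b >> 4 for b in byte_class}
    lows = {b & 0xF for b in byte_class}
    return [(highs, lows)]

def nibble_match(pairs, byte):
    """Check a byte against the chained nibble states."""
    hi, lo = byte >> 4, byte & 0xF
    return any(hi in highs and lo in lows for highs, lows in pairs)

pairs = byte_to_nibble_states({0x41, 0x42})          # match 'A' or 'B'
print(nibble_match(pairs, 0x41), nibble_match(pairs, 0x43))  # True False
```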
Dynamic graph traversals (DGTs) are currently widely used in many important application domains, especially in this big-data era that urgently demands high-performance graph processing and analysis. Unlike static graph traversals, DGTs in real-world application scenarios require not only fast traversal acceleration itself but also, more importantly, a runtime strategy that can effectively accommodate the ever-evolving nature of graph-structure updates followed by a diverse range of graph traversal algorithms. Because of these special features, state-of-the-art designs on conventional compute-centric architectures (e.g., CPU and GPU) struggle to provide sufficient acceleration for DGT processing, owing to the dominating irregular memory-access patterns in graph traversal algorithms and inefficient platform-specific update mechanisms. In this article, we explore the algorithmic features and runtime requirements of real-world DGTs and identify their unique opportunities for acceleration on the recent Micron Automata Processor (AP), an in-situ memory-centric pattern-matching architecture. These features include the natural mapping between traversal algorithms' path-exploration patterns and classic non-deterministic finite automata processing, the AP's architectural and compilation support for DGTs' evolving traversal operations, and its inherent hardware fitness. Despite these benefits, however, enabling highly efficient DGT execution on the AP is non-trivial and faces several major challenges. To tackle them, we propose DynamAP, the first AP framework design that enables fast processing for general DGTs. DynamAP is oblivious to periodic traversal-algorithm changes and can address the significant overhead caused by frequent graph updates and AP recompilation through our novel hybrid macro designs and associated efficient updating strategies. We evaluate DynamAP against current DGT designs on a CPU, GPU, and AP with a range of widely adopted DGT algorithms and real-world graphs.
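A conceptual sketch (not DynamAP itself) of the mapping the article exploits: treat each vertex as an NFA state, so one non-deterministic "step" on the input activates every successor of every active state, which is exactly a BFS frontier expansion.

```python
def nfa_step(active, adjacency):
    """One non-deterministic step: activate all successors of active states,
    mirroring how the AP expands a traversal frontier in parallel."""
    nxt = set()
    for v in active:
        nxt.update(adjacency.get(v, ()))
    return nxt

adjacency = {0: [1, 2], 1: [3], 2: [3]}
frontier = {0}
for depth in range(3):
    frontier = nfa_step(frontier, adjacency)
    print(depth + 1, sorted(frontier))   # BFS levels: [1, 2], [3], []
```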
The convergence between computing- and data-centric workloads and platforms is imposing new challenges on how to best use the resources of modern computing systems. In this paper, we investigate alternatives for the storage subsystem of a novel exascale-capable system, with special emphasis on how allocation strategies affect overall performance. We consider several aspects of data-aware allocation, such as the effect of spatial and temporal locality, the affinity of data to storage sources, and network-level traffic prioritization for different types of flows. In our experimental set-up, temporal locality can have a substantial effect on application runtime (up to a 10% reduction), whereas spatial locality can be even more significant (up to one order of magnitude faster with perfect locality). The use of structured access patterns to the data and the allocation of bandwidth at the network level can also have a significant impact (up to 20% and 17% runtime reductions, respectively). These results suggest that scheduling policies exposing data-locality information can be essential for the appropriate utilization of future large-scale systems. Finally, we found that the distributed storage system we are implementing can outperform traditional SAN architectures, even with a much smaller back-end (in terms of I/O servers).
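A toy scoring sketch reflecting the factors studied above: rank candidate nodes for a task by spatial locality (replicas held locally) and temporal locality (how recently its blocks were accessed). The weights and data structures are arbitrary assumptions, not the paper's calibration.

```python
import time

def placement_score(node, task, now=None, w_spatial=10.0, w_temporal=1.0):
    """node: {'replicas': set of block ids, 'last_access': block -> timestamp};
    task: {'blocks': blocks it will read}. Higher score = better placement."""
    now = time.time() if now is None else now
    # Spatial locality: blocks already replicated on this node.
    local = sum(1 for b in task["blocks"] if b in node["replicas"])
    # Temporal locality: recently touched blocks are likely still cached.
    recency = sum(
        1.0 / (1.0 + now - node["last_access"].get(b, 0.0))
        for b in task["blocks"]
    )
    return w_spatial * local + w_temporal * recency

node = {"replicas": {"b1"}, "last_access": {"b1": 100.0}}
print(placement_score(node, {"blocks": ["b1", "b2"]}, now=101.0))
```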