ISBN: (print) 9781450353168
The emergence of 3D-DRAM has rekindled interest in near data computing (NDC) research. This article introduces dataflow processing in memory (DFPIM), which melds near data computing, dataflow architecture, coarse-grained reconfigurable logic (CGRL), and 3D-DRAM technologies to provide high performance and very high energy efficiency for stream-oriented and big-data application kernels. Applying a dataflow architecture with a CGRL implementation yields a flexible, energy-efficient computing platform. The initial evaluation presented in this paper shows an average speedup of 5.5 with an energy efficiency factor of 460.
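As a rough illustration of the dataflow firing rule that a CGRL fabric implements in hardware, the following sketch executes a small dataflow graph in which a node fires as soon as all of its input tokens are available. The node format and operators are invented for the example and are not DFPIM's:

```python
# Minimal dataflow-execution sketch (assumed node format, not DFPIM's):
# a node fires as soon as all of its input tokens have arrived, which is
# the firing rule a CGRL fabric realizes spatially in hardware.

def run_dataflow(graph, inputs):
    """graph: {node: (op, [operand nodes])}; inputs: {node: value}."""
    values = dict(inputs)
    pending = dict(graph)
    while pending:
        for node, (op, deps) in list(pending.items()):
            if all(d in values for d in deps):   # all tokens present -> fire
                values[node] = op(*(values[d] for d in deps))
                del pending[node]
    return values

graph = {
    "sum": (lambda a, b: a + b, ["x", "y"]),
    "out": (lambda s, c: s * c, ["sum", "c"]),
}
print(run_dataflow(graph, {"x": 2, "y": 3, "c": 10})["out"])  # -> 50
```

Note that firing order falls out of data availability alone; no program counter sequences the nodes, which is what makes the model a natural fit for reconfigurable spatial hardware.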
Edge intelligence (EI), the combination of artificial intelligence (AI) and the internet of things (IoT), is the key to realizing an interconnected world of everything. EI needs to process the data collected by edge devices locally. For edge devices with insufficient computing power, a near data computing (NDC) system can provide local computing capability by adding a co-processing unit near the memory. NDC systems have been studied extensively with a focus on optimizing the hardware architecture, but they lack universal and transparent system support. Meanwhile, we note that non-volatile memory (NVM) plays an important role in edge intermittent computing. In this article, we propose a novel in-memory processing file system (IMPFS) based on NVM to provide general support for NDC systems running on edge devices. IMPFS uses standard file system interfaces and is optimized for NVM and AI application management. To verify IMPFS, a prototype system is designed. The results show that IMPFS not only provides universal support for EI but also further improves processing speed by reducing redundant software overhead.
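The idea of exposing near data computing through ordinary file-system interfaces can be sketched as follows; every class and method name here is invented for illustration and is not IMPFS's actual API:

```python
# Purely illustrative sketch of the IMPFS idea (all names invented):
# the application uses ordinary file-style calls, and the "file system"
# routes a computation to where the (NVM-backed) data lives instead of
# copying the data out to the host CPU.

class NearDataFS:
    """Toy in-memory 'file system' with a co-processing hook per file."""
    def __init__(self):
        self.files = {}          # path -> bytes, standing in for NVM pages

    def write(self, path, data: bytes):
        self.files[path] = data

    def read(self, path) -> bytes:
        return self.files[path]

    def process_near_data(self, path, kernel):
        # The kernel runs "next to the data": only the (usually much
        # smaller) result crosses back to the caller.
        return kernel(self.files[path])

fs = NearDataFS()
fs.write("/sensor/frame0", bytes([7, 1, 9, 3]))
# Offload a reduction instead of reading the buffer out and reducing on the host.
print(fs.process_near_data("/sensor/frame0", lambda buf: max(buf)))  # -> 9
```

The point of the sketch is the interface shape: because offload happens behind standard read/write-style calls, applications need no NDC-specific programming model.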
ISBN: (print) 9781450366694
Data transfer overhead between computing cores and the memory hierarchy has been a persistent issue for von Neumann architectures, and the problem has only become more challenging with the emergence of manycore systems. A conceptually powerful approach to mitigate this overhead is to bring the computation closer to the data, known as near data computing (NDC). Recently, NDC has been investigated in different flavors for CPU-based multicores, while the GPU domain has received little attention. In this paper, we present a novel NDC solution for GPU architectures with the objective of minimizing on-chip data transfer between the computing cores and the Last-Level Cache (LLC). To achieve this, we first identify frequently occurring Load-Compute-Store instruction chains in GPU applications. These chains, when offloaded to a compute unit closer to where the data resides, can significantly reduce data movement. We develop two offloading techniques, called LLC-Compute and Omni-Compute. The first technique, LLC-Compute, augments the LLCs with computational hardware for handling the computation offloaded to them. The second technique, Omni-Compute, employs simple bookkeeping hardware to enable GPU cores to compute instructions offloaded by other GPU cores. Our experimental evaluations on nine GPGPU workloads indicate that the LLC-Compute technique provides, on average, a 19% performance improvement (IPC), an 11% performance/watt improvement, and a 29% reduction in on-chip data movement compared to the baseline GPU design. The Omni-Compute design boosts these benefits to 31%, 16%, and 44%, respectively.
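A toy version of the chain-identification step might look like this; the instruction encoding, and the restriction to single ADD/MUL compute ops, are assumptions made for the sketch rather than the paper's actual analysis:

```python
# Hypothetical sketch of detecting Load-Compute-Store (LCS) chains in a
# toy instruction stream, in the spirit of the offload-candidate
# identification described above. Encoding: (opcode, dest, [sources]).

def find_lcs_chains(instructions):
    """Return index triples (load, compute, store) forming a dependent chain."""
    chains = []
    for i, (op_i, dst_i, _) in enumerate(instructions):
        if op_i != "LD":
            continue
        for j in range(i + 1, len(instructions)):
            op_j, dst_j, srcs_j = instructions[j]
            if op_j in ("ADD", "MUL") and dst_i in srcs_j:
                for k in range(j + 1, len(instructions)):
                    op_k, _, srcs_k = instructions[k]
                    if op_k == "ST" and dst_j in srcs_k:
                        chains.append((i, j, k))
                        break
                break
    return chains

stream = [
    ("LD",  "r1", ["a"]),          # load operand from memory
    ("ADD", "r2", ["r1", "r0"]),   # compute on the loaded value
    ("ST",  "b",  ["r2"]),         # store the result back to memory
    ("MUL", "r3", ["r0", "r0"]),   # unrelated compute, not part of a chain
]
print(find_lcs_chains(stream))  # -> [(0, 1, 2)]
```

Each reported triple touches memory twice and a register file once in between, so shipping the whole chain to logic near the LLC removes both core-to-LLC transfers.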
General-Purpose Graphics Processing Units (GPGPUs) have become a dominant computing paradigm for accelerating diverse classes of applications, primarily because of their higher throughput and better energy efficiency compared to CPUs. Moreover, GPU performance has been increasing rapidly due to technology scaling, increased core counts, and larger GPU cores. This has made GPUs an ideal substrate for building high-performance, energy-efficient computing systems. However, in spite of many architectural innovations in state-of-the-art GPUs, their deliverable performance falls far short of the achievable performance due to several issues. One of the major impediments to further improving the performance and energy efficiency of GPUs is the overhead associated with data movement. The main motivation behind this dissertation is to investigate techniques that mitigate the performance impact of data movement on throughput architectures. It consists of three main components. The first part of this dissertation focuses on developing intelligent compute scheduling techniques for GPU architectures with support for processing in memory (PIM). It performs an in-depth kernel-level analysis of GPU applications and develops a prediction model for efficient compute scheduling and management between the GPU and the PIM-enabled memory. The second part of this dissertation focuses on reducing the on-chip data movement footprint via efficient near data computing mechanisms. It identifies the basic forms of instructions that are ideal candidates for offloading, and provides the necessary compiler and hardware support to enable offloading computations closer to where the data resides, improving performance and energy efficiency. The third part of this dissertation focuses on investigating new warp formation and scheduling mechanisms for GPUs. It identifies code regions that lead to under-utilization of the GPU core. Specifically, it tackles the c...
A novel processing-in-storage (PRinS) architecture based on Resistive CAM (ReCAM) is proposed for Smith-Waterman (S-W) sequence alignment. The ReCAM PRinS massively parallel compare operation finds matching base pairs in a fixed number of cycles, regardless of sequence length. The ReCAM PRinS S-W algorithm is simulated and compared to FPGA, Xeon Phi, and GPU-based implementations, showing at least 4.7 times higher throughput and at least 15 times lower power dissipation.
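The dependence structure that the ReCAM hardware exploits is visible even in plain software Smith-Waterman: every cell on an anti-diagonal depends only on the two previous anti-diagonals, so all cells of a diagonal can be scored in parallel. A minimal scoring sketch, using textbook default parameters rather than the paper's:

```python
# Smith-Waterman local-alignment score, walked by anti-diagonals to show
# which cells are mutually independent (everything on one diagonal).
# Scoring parameters are common textbook defaults, not the paper's.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for d in range(2, rows + cols - 1):            # walk anti-diagonals
        for i in range(max(1, d - cols + 1), min(rows, d)):
            j = d - i                               # cells on diagonal d
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,      # match/mismatch
                          H[i - 1][j] + gap,        # gap in b
                          H[i][j - 1] + gap)        # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("AGC", "AGCT"))  # -> 6 (three matches at 2 each)
```

In the ReCAM design the inner loop collapses: the content-addressable compare scores an entire anti-diagonal in a fixed number of cycles, which is why throughput is independent of sequence length.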
ISBN: (print) 9781450341219
Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and applications, where main memory bandwidth is a critical bottleneck, can benefit from the use of PIM. To this end, an application should be properly partitioned and scheduled to execute on either the main, powerful GPU cores that are far away from memory or the auxiliary, simple GPU cores that are close to memory (e.g., in the logic layer of 3D-stacked DRAM). This paper investigates two key code scheduling issues in such a GPU architecture that has PIM capabilities, to maximize performance and energy-efficiency: (1) how to automatically identify the code segments, or kernels, to be offloaded to the cores in memory, and (2) how to concurrently schedule multiple kernels on the main GPU cores and the auxiliary GPU cores in memory. We develop two new run-time techniques: (1) a regression-based affinity prediction model and mechanism that accurately identifies which kernels would benefit from PIM and offloads them to GPU cores in memory, and (2) a concurrent kernel management mechanism that uses the affinity prediction model, a new kernel execution time prediction model, and kernel dependency information to decide which kernels to schedule concurrently on main GPU cores and the GPU cores in memory. Our experimental evaluations across 25 GPU applications demonstrate that these two techniques can significantly improve both application performance (by 25% and 42%, respectively, on average) and energy efficiency (by 28% and 27%).
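A regression-based affinity predictor of the kind described can be caricatured in a few lines; the feature names, weights, and threshold below are invented for illustration and are not the paper's trained model:

```python
# Illustrative sketch (not the paper's actual model): a linear
# regression-style affinity score decides, per kernel, whether offloading
# to the in-memory GPU cores is likely to pay off. Weights are invented.

WEIGHTS = {"mem_intensity": 2.0, "compute_intensity": -1.5, "parallelism": 0.3}
BIAS = -0.4

def pim_affinity(features):
    """Higher score => more likely to benefit from PIM offloading."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())

def schedule(kernels):
    """Partition kernels between main GPU cores and in-memory cores."""
    main, pim = [], []
    for name, feats in kernels.items():
        (pim if pim_affinity(feats) > 0 else main).append(name)
    return main, pim

kernels = {
    "stencil": {"mem_intensity": 0.9, "compute_intensity": 0.2, "parallelism": 0.8},
    "matmul":  {"mem_intensity": 0.3, "compute_intensity": 0.9, "parallelism": 0.9},
}
main, pim = schedule(kernels)
print(main, pim)  # memory-bound kernel goes near memory, compute-bound stays
```

The paper's full mechanism additionally predicts kernel execution times and respects inter-kernel dependencies so that main and in-memory cores can run different kernels concurrently; the sketch covers only the affinity decision.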
ISBN: (print) 9781467372114
To illustrate the architecture on-demands for data intensive geoscience product (AODIGP) processing, this paper proposes a method for dividing the working domains into a task domain, a resource domain, and a flow domain. Taking remote sensing products with common properties as an instance, product requirements are parsed by means of a knowledge base and an inference engine. An abstract description method for remote sensing products is used to constitute the workflow according to near data computing (NDC) rules. Finally, we verify the availability of AODIGP by constructing a data center resources cooperatively-scheduled system (DCRCSS) and outline improvements for future work.
ISBN: (print) 9781450328098
Software data structures are a critical aspect of emerging data-centric applications, which makes it imperative to improve the energy efficiency of data delivery. We propose SQRL, a hardware accelerator that integrates with the last-level cache (LLC) and enables energy-efficient iterative computation on data structures. SQRL integrates a data-structure-specific LLC refill engine (Collector) with a compute array of lightweight processing elements (PEs). The Collector exploits knowledge of the compute kernel to (i) run ahead of the PEs in a decoupled fashion to gather data objects and (ii) throttle the fetch rate and adaptively tile the dataset based on its locality characteristics. The Collector exploits data structure knowledge to extract memory-level parallelism and eliminate data-structure instructions.
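The Collector's decoupled run-ahead behavior can be mimicked in software with a bounded buffer between the access side and the compute loop; the traversal, buffer capacity, and throttle policy below are assumptions made for the sketch:

```python
# Toy sketch of SQRL-style decoupled access/execute. Assumptions: a plain
# iterable stands in for the data-structure traversal, a bounded deque for
# the Collector's buffer, and "stop when full" for the throttle policy.

from collections import deque

class Collector:
    """Runs ahead of the PEs, gathering data objects into a bounded buffer."""
    def __init__(self, nodes, capacity=4):
        self.nodes = iter(nodes)       # data-structure traversal order
        self.buffer = deque()
        self.capacity = capacity

    def run_ahead(self):
        # Throttle: fetch no further once the buffer is full.
        while len(self.buffer) < self.capacity:
            try:
                self.buffer.append(next(self.nodes))
            except StopIteration:
                break

def compute_kernel(collector, fn):
    """PE-side loop: consumes gathered objects, never issuing loads itself."""
    results = []
    collector.run_ahead()
    while collector.buffer:
        results.append(fn(collector.buffer.popleft()))
        collector.run_ahead()          # refill as the PEs drain the buffer
    return results

c = Collector(range(10), capacity=4)
print(compute_kernel(c, lambda x: x * x))  # squares of 0..9
```

Because the compute loop contains no address generation or pointer chasing, the PEs stay simple; all data-structure knowledge lives in the Collector, mirroring the division of labor the abstract describes.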