Search Results - Inner Mongolia University Library

Proceedings of the 18th USENIX Conference on File and Storage Technologies

Authors: Jie Zhang, Miryeong Kwon, Michael Swift, Myoungsoo Jung (Computer Architecture and Memory Systems Laboratory, Korea Advanced Institute of Science and Technology; Computer Architecture and Memory Systems Laboratory; University of Wisconsin at Madison)

ISBN (print): 9781939133120

NVMe is designed to unshackle flash from a traditional storage bus by allowing hosts to employ many threads to achieve higher bandwidth. While NVMe enables users to fully exploit all levels of parallelism offered by modern SSDs, current firmware designs are not scalable and have difficulty handling a large number of I/O requests in parallel due to limited computation power and many hardware *** propose DeepFlash, a novel manycore-based storage platform that can process more than a million I/O requests per second (1 MIOPS) while hiding the long latencies imposed by its internal flash media. Inspired by a parallel data analysis system, we design the firmware based on a many-to-many threading model that can be scaled horizontally. The proposed DeepFlash can extract the maximum performance of the underlying flash memory complex by concurrently executing multiple firmware components across many cores within the device. To show its extreme parallel scalability, we implement DeepFlash on a many-core prototype processor that employs dozens of lightweight cores, analyze new challenges arising from parallel I/O processing, and address them by applying concurrency-aware optimizations. Our comprehensive evaluation reveals that DeepFlash can serve around 4.5 GB/s while minimizing CPU demand on microbenchmarks and real server workloads.

DRAM-Less: Hardware Acceleration of Data Processing with New Memory

IEEE Symposium on High-Performance Computer Architecture

Authors: Jie Zhang, Gyuyoung Park, David Donofrio, John Shalf, Myoungsoo Jung (Computer Architecture and Memory Systems Laboratory, Korea Advanced Institute of Science and Technology (KAIST); Computer Architecture and Memory Systems Laboratory; Lawrence Berkeley National Laboratory)

ISBN (electronic): 9781728161495

ISBN (print): 9781728161501

General-purpose hardware accelerators have become major data processing resources in many computing domains. However, the processing capability of hardware accelerators is often limited by costly software interventions and memory copies needed to support compulsory data movement between different processors and solid-state drives (SSDs). This in turn wastes a significant amount of energy in modern accelerated systems. In this work, we propose DRAM-less, a hardware automation approach that precisely integrates many state-of-the-art phase change memory (PRAM) modules into its data processing network to dramatically reduce unnecessary data copies with minimal software modification. We implement a new memory controller that connects a real 3x nm multi-partition PRAM to 28 nm FPGA logic cells, and integrate its design into a real PCIe accelerator emulation platform. The evaluation results reveal that DRAM-less achieves, on average, 47% better performance than advanced acceleration approaches that use peer-to-peer DMA.

Keywords: Phase change random access memory; Data processing; Hardware acceleration; Kernel; Buffer storage

Design and implementation of an efficient thread partitioning algorithm

3rd International Symposium on High Performance Computing, ISHPC 2000

Authors: Amaral, José Nelson; Gao, Guang; Kocalar, Erturk Dogan; O'Neill, Patrick; Tang, Xinan (Computer Architecture and Parallel Systems Laboratory, University of Delaware, Newark, DE, United States; Dep. of Comp. Science, Univ. of Alberta, Canada)

ISBN (print): 9783540411284

The development of fine-grain multi-threaded program execution models has created an interesting challenge: how can a program be partitioned into threads that exploit machine parallelism, achieve latency tolerance, and maintain reasonable locality of reference? A successful algorithm must produce a thread partition that best utilizes multiple execution units on a single processing node and handles long and unpredictable latencies. In this paper, we introduce a new thread partitioning algorithm that can meet the above challenge for a range of machine architecture models. A quantitative affinity heuristic is introduced to guide the placement of operations into threads. This heuristic addresses the trade-off between exploiting parallelism and preserving locality. The algorithm is surprisingly simple due to the use of a time-ordered event list to account for the multiple execution unit activities. We have implemented the proposed algorithm, and our experiments, performed on a wide range of examples, have demonstrated its efficiency and effectiveness. © Springer-Verlag Berlin Heidelberg 2000.

Keywords: Economic and social effects

A transparent runtime data distribution engine for OpenMP

Scientific Programming, 2000, Vol. 8, No. 3, pp. 143-162

Authors: Nikolopoulos, D.S.; Papatheodorou, T.S.; Polychronopoulos, C.D.; Labarta, J.; Ayguade, E. (Computer and Systems Research Laboratory, University of Illinois at Urbana-Champaign, 1308 West Main Street, Urbana, IL 61801, United States; Department of Computer Engineering and Informatics, University of Patras, GR26500 Patras, Greece; Department of Computer Architecture, Technical University of Catalonia, c/Jordi Girona 1-3, 08034 Barcelona, Spain)

This paper makes two important contributions. First, the paper investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that, due to the low remote-to-local memory access latency ratio of contemporary NUMA architectures, reasonably balanced page placement schemes, such as round-robin or random distribution, incur only modest performance losses. Second, the paper presents a transparent, user-level page migration engine that can recover any performance loss stemming from suboptimal placement of pages in iterative OpenMP programs. The main body of the paper describes how our OpenMP runtime environment uses page migration to implement implicit data distribution and redistribution schemes without programmer intervention. Our experimental results verify the effectiveness of the proposed framework and provide a proof of concept that it is not necessary to introduce data distribution directives in OpenMP and thereby compromise the simplicity or the portability of the programming model.

On Radically Expanding the Landscape of Potential Applications for Automated-Proof Methods

SN Computer Science, 2021, Vol. 2, No. 4, p. 259

Authors: Uhlmann, Jeffrey; Wang, Jie (Department of Electrical Engineering and Computer Science, University of Missouri, 201 Naka Hall, Columbia, United States; Laboratory for Analysis and Architecture of Systems, French National Centre for Scientific Research, Toulouse, France)

In this paper, we examine the potential of optimization-based computer-assisted proof methods to be applied much more widely than is commonly recognized by engineers and computer scientists. More specifically, we contend that there are vast opportunities to derive valuable mathematical results and properties that may be narrow in scope, such as in highly specialized engineering control applications, but that are presently overlooked because they have characteristics atypical of those conventionally studied in pure and applied mathematics. As a concrete example, we demonstrate the use of sum-of-squares (SOS) optimization for certifying polynomial nonnegativity as part of a proposed dimension-pinning strategy to prove that the inverse of the relative gain array (RGA) of a d-dimensional positive-definite matrix is doubly stochastic for d ≤ 4. However, it is not this specific result and solution method that are of principal interest in this paper, but rather how they illustrate the relevance of optimization-based proof techniques to engineering system design more broadly. We believe that our paper is the first to explicitly emphasize the fundamental distinction between methods that can prove results/properties over a fixed number of dimensions and those that hold generally. The latter class of problems is the conventional domain of mathematicians, but the former is what we propose to be a fertile and largely unrecognized class of problems amenable to automated-proof technologies, e.g., as we demonstrate using our novel dimension-pinning approach. © 2021, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.

Keywords: Automated proofs; Automated software methods; Computer-aided system design; Computer-assisted proofs; Dimension pinning; Inverse relative gain array; IRGA; Relative gain array; RGA; SOS proofs; Sum-of-squares optimization
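The inverse-RGA property stated in the RGA abstract above can be spot-checked on a single matrix. The sketch below is an exact-arithmetic check in Python, not the authors' sum-of-squares or dimension-pinning proof. It assumes the standard definition RGA(A) = A ∘ (A⁻¹)ᵀ (the elementwise product of A with the transpose of its inverse), inverts matrices by Gauss-Jordan elimination over `fractions.Fraction`, and tests whether a matrix is doubly stochastic (nonnegative entries, every row and column summing to 1). The helper names `inverse`, `rga`, and `is_doubly_stochastic` are hypothetical. A single instance proves nothing about a whole dimension; it only makes concrete what the claim asserts.

```python
from fractions import Fraction
from typing import List, Sequence, Union

Number = Union[int, Fraction]
Matrix = List[List[Fraction]]


def inverse(m: Sequence[Sequence[Number]]) -> Matrix:
    """Invert a nonsingular square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(m)
    # Augment [m | I] with exact Fractions so no rounding error occurs.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Pick a row with a nonzero pivot (raises StopIteration if m is singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rga(a: Sequence[Sequence[Number]]) -> Matrix:
    """Relative gain array: RGA(A)[i][j] = A[i][j] * (A^-1)[j][i].

    For any nonsingular A, every row and column of the RGA sums to 1,
    because these sums are the diagonal entries of A A^-1 and A^-1 A.
    """
    inv = inverse(a)
    n = len(a)
    return [[Fraction(a[i][j]) * inv[j][i] for j in range(n)] for i in range(n)]


def is_doubly_stochastic(m: Matrix) -> bool:
    """True if all entries are >= 0 and every row and column sums to exactly 1."""
    n = len(m)
    nonnegative = all(v >= 0 for row in m for v in row)
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(m[i][j] for i in range(n)) == 1 for j in range(n))
    return nonnegative and rows_ok and cols_ok
```

For example, with the positive-definite matrix A = [[2, 1], [1, 2]], `rga(A)` is [[4/3, -1/3], [-1/3, 4/3]]: its rows and columns sum to 1, but two entries are negative, so it is not doubly stochastic. Its inverse, `inverse(rga(A))`, is [[4/5, 1/5], [1/5, 4/5]], which `is_doubly_stochastic` accepts. Exact rationals are used so that the "sums to exactly 1" test is not blurred by floating-point rounding.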

Using Mobile Sinks in WSN: Computational Complexity and a Theoretical Bound

Chinese Journal of Electronics, 2011, Vol. 20, No. 1, pp. 147-150

Authors: GU Yu, JI Yusheng, CHEN Hongyang, ZHAO Baohua (Department of Computer Science, University of Science and Technology of China, Hefei, China; State Key Laboratory of Networking and Switching Technology, Beijing, China; Information Systems Architecture Science Research Division, NII, Tokyo, Japan; Institute of Industrial Science, University of Tokyo, Tokyo, Japan)

In this paper, we study the lifetime optimization problem in wireless sensor networks using mobile sink nodes. This problem is inherently difficult since we need to consider both sink scheduling and data routing. Through a simple case study, we develop a novel notation named the Placement Pattern (PP) to bound traffic patterns with candidate locations. This significantly decreases the number of elements that need to be scheduled. Based on the PP, we mathematically formulate this optimization problem as a mixed-integer nonlinear programming (MINLP) problem, which is very hard and time-consuming to solve. By proving that the problem is NP-complete, we point out that, instead of seeking an optimal algorithm, it is much more desirable to develop heuristic algorithms, especially those with performance guarantees. Furthermore, to help identify the performance gains of heuristic algorithms proposed in the future, we develop a linear programming (LP) formulation that serves as an upper bound, obtained by adopting a reformulation and relaxation technique.

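Keywords: Wireless sensor networks; Mobile sink; Computational complexity; Mixed-integer nonlinear programming; Heuristic algorithms; NP-complete problems; Optimization problems; MINLP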

A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations

IEEE Transactions on Computers, 1973, Vol. C-22, No. 8, pp. 786-793

Authors: Kogge, Peter M. (Systems Architecture Department, IBM Corporation, Owego, N.Y. 13827, United States; Department of Electrical Engineering, Department of Computer Science, Digital Systems Laboratory, Stanford University, Stanford, Calif., United States)

An mth-order recurrence problem is defined as the computation of the series x_1, x_2, …, x_N, where x_i = f_i(x_{i-1}, …, x_{i-m}) for some function f_i. This paper uses a technique called recursive doubling in an algorithm for solving a large class of recurrence problems on parallel computers such as the Illiac IV. Recursive doubling involves splitting the computation of a function into two equally complex subfunctions whose evaluation can be performed simultaneously in two separate processors. Successive splitting of each of these subfunctions spreads the computation over more processors. This algorithm can be applied to any recurrence equation of the form x_i = f(b_i, g(a_i, x_{i-1})), where f and g are functions that satisfy certain distributive and associative-like properties. Although this recurrence is first order, all linear mth-order recurrence equations can be cast into this form. Suitable applications include linear recurrence equations, polynomial evaluation, several nonlinear problems, the determination of the maximum or minimum of N numbers, and the solution of tridiagonal linear equations. The resulting algorithm computes the entire series x_1, …, x_N in time proportional to ⌈log₂ N⌉ on a computer with N-fold parallelism. On a serial computer, computation time is proportional to N. Copyright © 1973 by The Institute of Electrical and Electronics Engineers, Inc.

Keywords: Program processors; Equations; Computers; Mathematical model; Parallel algorithms; Data mining; Polynomials
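The recursive doubling idea in the recurrence abstract above can be illustrated with a minimal Python sketch for the first-order linear case x_i = a_i·x_{i-1} + b_i, i.e. f = addition and g = multiplication in the form x_i = f(b_i, g(a_i, x_{i-1})). Each step is treated as an affine map x ↦ a·x + b; composing two affine maps yields another affine map, and composition is associative, so partial results can be combined pairwise. After doubling step k, each position holds the composition of up to 2^k consecutive maps, so ⌈log₂ N⌉ steps suffice. The function names and the lockstep simulation of N processors with a list comprehension are assumptions for illustration, not the paper's own formulation.

```python
from typing import List, Tuple

# Hypothetical representation: (a, b) stands for the affine map x -> a*x + b.
Affine = Tuple[float, float]


def compose(later: Affine, earlier: Affine) -> Affine:
    """Return the affine map that applies `earlier` first, then `later`."""
    a2, b2 = later
    a1, b1 = earlier
    # later(earlier(x)) = a2*(a1*x + b1) + b2 = (a2*a1)*x + (a2*b1 + b2)
    return (a2 * a1, a2 * b1 + b2)


def recursive_doubling(a: List[float], b: List[float], x0: float) -> Tuple[List[float], int]:
    """Compute x_1..x_N for x_i = a_i*x_{i-1} + b_i by recursive doubling.

    Returns the series and the number of doubling steps, which is ceil(log2 N).
    Each loop iteration models one lockstep step of N processors.
    """
    n = len(a)
    # List index i holds (a_{i+1}, b_{i+1}), the map from x_i to x_{i+1}.
    maps: List[Affine] = list(zip(a, b))
    steps = 0
    dist = 1
    while dist < n:
        # Every position i >= dist combines with the partial result ending at i - dist.
        # The comprehension reads only the previous array, as parallel hardware would.
        maps = [compose(maps[i], maps[i - dist]) if i >= dist else maps[i]
                for i in range(n)]
        dist *= 2
        steps += 1
    # Index i now composes the first i+1 maps, so applying it to x0 yields x_{i+1}.
    return [A * x0 + B for A, B in maps], steps


def sequential(a: List[float], b: List[float], x0: float) -> List[float]:
    """Reference serial evaluation of the same recurrence (N updates)."""
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs
```

For example, `recursive_doubling([2, 3, 1, 4, 2], [1, 0, 5, -2, 3], 1)` returns `([3, 9, 14, 54, 111], 3)`: the same series that `sequential` produces, but in three doubling steps for N = 5, matching ⌈log₂ 5⌉ = 3. Integer inputs are used here so the comparison with the serial result is exact.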

Enhancing nanosatellite dependability through autonomous chip-level debug capabilities

29th International Conference on Architecture of Computing Systems, ARCS 2016

Authors: Fuchs, Christian M.; Dafinger, Nikolaus; Langer, Martin; Trinitis, Carsten (Chair of Space Systems Engineering, Faculty of Aerospace Engineering, Computer Engineering Laboratory, Delft University of Technology, Delft, Netherlands; Institute for Astronautics; Chair for Computer Architecture and Organization, Technical University Munich, Garching, Germany)

Recent Advances in Parallel Virtual Machine and Message Passing Interface

Series: Lecture Notes in Computer Science

Year: 1999

Authors: Jack Dongarra, Emilio Luque, Tomàs Margalef

ISBN (electronic): 9783540481584

ISBN (print): 9783540665496

Parallel Virtual Machine (PVM) and Message Passing Interface (MPI) are the most frequently used tools for programming according to the message-passing paradigm, which is considered one of the best ways to develop parallel applications. This volume comprises 67 revised contributions presented at the Sixth European PVM/MPI Users' Group Meeting, held in Barcelona, Spain, 26-29 September 1999. The conference was organized by the Computer Science Department of the Universitat Autònoma de Barcelona. It was previously held in Liverpool, UK (1998) and Cracow, Poland (1997). The first three conferences were devoted to PVM and were held at TU Munich, Germany (1996), ENS Lyon, France (1995), and the University of Rome (1994). The conference has become a forum for users and developers of PVM, MPI, and other message-passing environments. Interaction between these groups has proved very useful for developing new ideas in parallel computing and for applying existing ones to new practical fields.

Keywords: Computer system implementation; Processor architectures

DockerSSD: Containerized In-Storage Processing and Hardware Acceleration for Computational SSDs

IEEE Symposium on High-Performance Computer Architecture

Authors: Donghyun Gouk, Miryeong Kwon, Hanyeoreum Bae, Myoungsoo Jung (Computer Architecture and Memory Systems Laboratory, KAIST; Panmnesia Inc.)

Processing data in storage is an energy-efficient way to examine massive datasets. However, general adoption of this well-known task-offloading model in real systems has unfortunately been unsuccessful, due not only to poor performance but also to many practical challenges, such as limited processing capabilities and high vulnerability at the storage level. We propose DockerSSD, a fully flexible in-storage processing (ISP) model that can run a variety of applications near flash without source-level modification. Specifically, it enables lightweight OS-level virtualization in modern SSDs, which allows storage intelligence to be well harmonized with the existing computing environment and makes ISP even faster. Instead of requiring a vendor-specific ISP to offload to, DockerSSD can reuse existing Docker images, create containers as self-governing execution objects in storage, and process data in real time directly where it resides. To this end, we design a new communication method and virtual firmware that operate together to download Docker images and manage container execution without changing the existing storage interface and runtime. We further accelerate ISP and reduce execution latency by automating container-related network and I/O handling data paths in hardware. Our evaluation shows that DockerSSD is 2.0× faster than state-of-the-art ISP models for workloads with a high volume of system calls or file accesses. Moreover, it reduces power and energy consumption by 1.6× and 2.3×, respectively.
