Non-volatile memories (NVMs) provide lower latency and higher bandwidth than block devices. Moreover, NVMs are byte-addressable and provide persistence, so they can be used as memory-level storage devices (non-volatile main memory, NVMM). These features change the storage hierarchy and allow the CPU to access persistent data using load/store instructions. Thus, we can directly build a file system on NVMM. However, traditional file systems are designed for slow block devices. They use a deep and complex software stack to optimize file system performance. This design results in software overhead being the dominant factor affecting NVMM file systems. Performance, scalability, crash consistency, data protection, and cross-media storage should therefore be reconsidered in NVMM file system design. We survey existing work on optimizing NVMM file systems. First, we analyze the problems that arise when directly using traditional file systems on NVMM, including heavy software overhead, limited scalability, and inappropriate consistency guarantee techniques. Then, we summarize the techniques of 30 typical NVMM file systems and analyze their advantages and disadvantages. Finally, we provide a few suggestions for designing a high-performance NVMM file system based on real hardware, the Optane DC persistent memory module. Specifically, we suggest applying various techniques to reduce software overheads, improving the scalability of the virtual file system (VFS), adopting highly concurrent data structures (e.g., locks and indexes), using memory protection keys (MPK) for data protection, and carefully designing data placement/migration for cross-media file systems.
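The load/store access path the abstract describes can be illustrated with a minimal sketch (not from the survey). On real NVMM, a DAX-mapped file is written with ordinary stores and made durable with cache-line writebacks (e.g., CLWB plus a fence); here, a regular file with `mmap` stands in, with `flush()` playing the role of the persistence barrier:

```python
import mmap
import os
import tempfile

# Sketch only: byte-addressable access to a file-backed mapping. On real
# NVMM with a DAX mount, the stores below would hit persistent memory
# directly; flush() stands in for the cache-line writeback + fence step.
path = os.path.join(tempfile.mkdtemp(), "pmem.img")
with open(path, "wb") as f:
    f.write(b"\0" * 4096)          # pre-size the "persistent" region

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 4096)
    m[0:5] = b"hello"              # store directly into the mapping
    m.flush()                      # persistence barrier stand-in
    m.close()
```

The point of the sketch is that the update path involves no `write()` system call, which is exactly where a block-oriented software stack would otherwise add overhead.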
In recent years, the advent of emerging computing applications, such as cloud computing, artificial intelligence, and the Internet of Things, has led to three common requirements in computer system design: high utilization, high throughput, and low latency. Herein, these are referred to as the requirements of 'high-throughput computing (HTC)'. We further propose a new indicator called 'sysentropy' for measuring the degree of chaos and uncertainty within a computer system. We argue that unlike the designs of traditional computing systems, which pursue high performance and low power consumption, HTC should aim at achieving low sysentropy. However, from the perspective of computer architecture, HTC faces two major challenges: (1) fully exploiting the application's data parallelism and execution concurrency to achieve high throughput, and (2) achieving low latency even when severe contention occurs in highly utilized data paths. To overcome these two challenges, we introduce two techniques: on-chip data flow architecture and labeled von Neumann architecture. We build two prototypes that achieve high throughput and low latency, thereby significantly reducing sysentropy.
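The tension between high utilization and low latency that the abstract identifies can be illustrated with a textbook M/M/1 queueing model (a generic illustration, not a model from the paper): mean response time grows as 1/(1 − ρ), so latency explodes as a shared data path approaches full utilization.

```python
def mm1_response_time(service_time, utilization):
    """Mean response time of an M/M/1 queue: T = S / (1 - rho).

    Textbook formula, used here only to illustrate why latency blows up
    as a shared resource approaches 100% utilization.
    """
    assert 0.0 <= utilization < 1.0
    return service_time / (1.0 - utilization)

# Same resource, two operating points:
low = mm1_response_time(1.0, 0.50)   # 2x the service time
high = mm1_response_time(1.0, 0.95)  # 20x the service time
```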
The leading zero anticipation (LZA) algorithm and its implementation are vital to the performance of high-speed floating-point adders in today's state-of-the-art microprocessor designs. Unfortunately, when predicting the shift amount, a conventional LZA design can be off by one position. This paper presents a novel parallel error detection algorithm for a general-case LZA. The proposed approach enables parallel execution of the conventional LZA and its error detection, so that the error-indication signal can be generated earlier in the normalization stage, thus reducing the critical path and improving overall performance. The circuit implementation of this algorithm also shows advantages in area and power compared with previous work.
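For context, what an LZA approximates is the exact leading-zero detection (LZD) used to normalize the adder's result. The sketch below (a simplified integer model, not the paper's circuit) shows that exact computation; an LZA predicts the same shift from the operands, in parallel with the addition, which is why its one-position error must be detected and corrected.

```python
def leading_zeros(x, width):
    """Exact leading-zero count of a width-bit value (LZD)."""
    assert 0 <= x < (1 << width)
    n = 0
    for i in range(width - 1, -1, -1):
        if (x >> i) & 1:
            break
        n += 1
    return n

def normalize(sig, width):
    """Left-shift a significand until its MSB is set; return (sig, shift).

    An LZA anticipates this shift before the sum is available; because
    the prediction can be off by one, the normalized result may need a
    final one-position correction shift.
    """
    if sig == 0:
        return 0, width
    shift = leading_zeros(sig, width)
    return (sig << shift) & ((1 << width) - 1), shift
```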
1 Introduction
Most real-world graphs are large-scale but unstructured and sparse. One of the most notable characteristics of real-world graphs is the skewed power-law degree distribution [1]: most vertices have a few neighbors while a few own a large number of neighbors. These characteristics present challenges for efficient parallel graph processing, such as load imbalance, poor locality, and redundant computation. Apart from modifying the graph programming abstraction or changing the execution models on different architectures, reducing the irregularity of graph data also improves the performance of graph processing [2]. For example, it is well known that BFS has poor temporal locality, but it is possible to transform irregular graphs into more regular ones to improve spatial locality and gain more performance.
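As a concrete baseline (not from the paper), a compressed sparse row (CSR) representation with level-synchronous BFS shows where the locality problem sits: each vertex's neighbor list is contiguous, but the frontier visits those lists in an irregular order, which is what graph-reordering techniques try to regularize.

```python
from collections import deque

def to_csr(n, edges):
    """Build a CSR (offsets, targets) adjacency from a directed edge list."""
    deg = [0] * n
    for u, _ in edges:
        deg[u] += 1
    offsets = [0] * (n + 1)
    for i in range(n):
        offsets[i + 1] = offsets[i] + deg[i]
    targets = [0] * len(edges)
    fill = offsets[:-1].copy()
    for u, v in edges:
        targets[fill[u]] = v
        fill[u] += 1
    return offsets, targets

def bfs_levels(n, offsets, targets, src):
    """Level-synchronous BFS over CSR. Each neighbor scan is sequential,
    but the set of rows touched per level is irregular -- the source of
    the poor locality discussed in the text."""
    level = [-1] * n
    level[src] = 0
    q = deque([src])
    while q:
        u = q.popleft()
        for i in range(offsets[u], offsets[u + 1]):
            v = targets[i]
            if level[v] < 0:
                level[v] = level[u] + 1
                q.append(v)
    return level
```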
In this paper we present a multi-grained parallel algorithm for computing betweenness centrality, which is extensively used in large-scale network analysis. Our method is based on a novel algorithmic handling of acces...
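The paper's multi-grained parallel algorithm is not reproduced here; as a reference point, the standard sequential Brandes algorithm that such parallelizations typically start from can be sketched as follows (assuming an unweighted graph given as adjacency lists; for an undirected graph represented with both arc directions, halve the results):

```python
from collections import deque

def brandes_betweenness(adj):
    """Betweenness centrality via Brandes' algorithm (unweighted).

    One BFS per source accumulates shortest-path counts (sigma), then a
    reverse sweep accumulates pair dependencies (delta).
    """
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        stack = []
        pred = [[] for _ in range(n)]
        sigma = [0] * n
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        q = deque([s])
        while q:                       # forward BFS phase
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = [0.0] * n
        while stack:                   # reverse accumulation phase
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```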
Multicore architectures have become a promising way to sustain Moore's Law, bringing a revolution to both research and industry that opens new design space for software and architecture. Fast Fourier Transform (FFT), co...
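The abstract's multicore FFT work is not shown here, but the radix-2 Cooley-Tukey recursion that such implementations parallelize can be sketched; the two recursive calls on the even- and odd-indexed halves are independent, which is exactly what a multicore implementation would run on separate cores.

```python
import cmath

def fft(a):
    """Radix-2 decimation-in-time Cooley-Tukey FFT; len(a) must be a
    power of two. The two recursive calls are independent subproblems,
    the natural parallelism a multicore implementation exploits."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out
```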
Range reduction is important in evaluating trigonometric functions, but little work has addressed its hardware implementation. A hardware floating-point range reduction implementation is presented. T...
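The paper's hardware design is not reproduced here, but the arithmetic being implemented, Cody-Waite-style additive range reduction, can be sketched in software. The constants below are fdlibm's two-part split of π/2; the high part has enough trailing zero bits that multiplying it by a moderate quotient k stays exact.

```python
import math

# Cody-Waite split of pi/2 (fdlibm's pio2_1 / pio2_1t):
PIO2_HI = 1.57079632673412561417e+00
PIO2_LO = 6.07710050650619224932e-11

def reduce_range(x):
    """Return (k, r) with x ~= k * pi/2 + r and |r| <= pi/4 or so.

    Subtracting k*PIO2_HI is exact for moderate k, so the rounding
    error of the reduction stays tiny even when x >> pi.
    """
    k = round(x / (math.pi / 2))
    r = (x - k * PIO2_HI) - k * PIO2_LO
    return k, r

def sin_via_reduction(x):
    """Reconstruct sin(x) from the reduced argument by quadrant."""
    k, r = reduce_range(x)
    return [math.sin(r), math.cos(r), -math.sin(r), -math.cos(r)][k % 4]
```

For very large arguments a two-constant split is no longer sufficient and schemes such as Payne-Hanek reduction are used instead; the sketch covers only the moderate-argument case.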
A ring is a promising on-chip interconnect for CMPs. It is more scalable than a bus and much simpler than a packet-switched network. The ordering property of a ring can be used to optimize cache coherence protocol design. ...
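A toy calculation (not from the paper) makes the scalability trade-off concrete: on a bidirectional ring, a message takes the shorter of the two directions, so latency grows with distance (up to n/2 hops), whereas on a bus every transfer contends for one shared medium regardless of distance.

```python
def ring_hops(src, dst, n):
    """Hop count between two nodes on an n-node bidirectional ring,
    taking the shorter of the clockwise/counterclockwise directions."""
    d = (dst - src) % n
    return min(d, n - d)

# Worst case on an 8-node ring is n/2 = 4 hops; average distance grows
# linearly with n, but each link is point-to-point, unlike a bus.
worst = max(ring_hops(0, d, 8) for d in range(8))
```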
Nine-degrees-of-freedom (9-DoF) object pose and size estimation is crucial for enabling augmented reality and robotic manipulation. Category-level methods have received extensive research attention due to their potent...
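In category-level work the nine degrees of freedom are commonly 3 for rotation, 3 for translation, and 3 for anisotropic scale. A simplified sketch (my illustration, not the paper's method; rotation is restricted to yaw for brevity, where a full 9-DoF pose uses a general 3D rotation) of applying such a pose to a point in canonical object space:

```python
import math

def apply_9dof(p, yaw, t, s):
    """Map a canonical-space point to camera space: x' = R(yaw)(s * x) + t.

    p: point (x, y, z); yaw: rotation about z in radians (stand-in for a
    full 3-DoF rotation); t: translation (3,); s: per-axis scale (3,).
    """
    x, y, z = (p[i] * s[i] for i in range(3))   # anisotropic scale (3 DoF)
    c, n = math.cos(yaw), math.sin(yaw)
    xr, yr = c * x - n * y, n * x + c * y       # rotation (shown: 1 of 3 DoF)
    return (xr + t[0], yr + t[1], z + t[2])     # translation (3 DoF)
```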
Library functions and system calls are major difficulties for automated testing. Input/output (I/O) functions are a common set of library functions. Testers have to interact with the test procedures if the te...
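One common way automated tests remove the human from this loop is to redirect the program's I/O streams with canned data. The Python sketch below illustrates the idea only; for C library functions, the interception would typically happen at link time or via stubs rather than stream objects.

```python
import io
import sys
from contextlib import redirect_stdout

def program_under_test():
    """Toy interactive program: echoes its input line, uppercased."""
    line = input()
    print(line.upper())

def run_with_io(func, stdin_text):
    """Drive an interactive function with canned input; capture output.

    Substituting the streams lets the test run unattended instead of
    requiring a tester to type responses.
    """
    old_stdin = sys.stdin
    sys.stdin = io.StringIO(stdin_text)
    out = io.StringIO()
    try:
        with redirect_stdout(out):
            func()
    finally:
        sys.stdin = old_stdin
    return out.getvalue()
```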