Search results - Inner Mongolia University Library

20th International Middleware Conference Industrial Track (Middleware Industry)

Authors: Park, SeongJae; Lee, Yunjae; Yeom, Heon Y. Affiliation: Seoul Natl Univ, Seoul, South Korea

ISBN: (print) 9781450370417

Modern workloads tend to have huge working sets and low locality. Despite this trend, DRAM capacity has not increased enough to accommodate such huge working sets. Therefore, memory management mechanisms optimized for such modern workloads are widely needed today. For such optimizations, knowing the data access pattern of a given workload is essential. However, manually extracting such patterns from huge and complex workloads is exhausting. Worse yet, existing memory access analysis tools incur unacceptably high overheads for unnecessarily detailed analysis results. To mitigate this situation, we introduce a tool designed for data access pattern tracing. Two core mechanisms in this tool, region-based sampling and adaptive region adjustment, allow users to keep the tracing overhead within a bounded range regardless of the size and complexity of the target workloads, while preserving the quality of the results. Our empirical evaluations, conducted with 20 realistic workloads, show the high quality, low overhead, and a potential use case of this tool.

Keywords: data access pattern memory management memory-intensive workloads profiler performance optimization

A Tool for Characterizing and Succinctly Representing the Data Access Patterns of Applications

IEEE International Symposium on Workload Characterization (IISWC)

Authors: Mills, Catherine; Snavely, Allan; Carrington, Laura. Affiliations: Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093, USA; San Diego Supercomp Ctr, San Diego, CA, USA

ISBN: (print) 9781457720642

Application address streams contain a wealth of information that can be used to characterize the behavior of applications. However, the collection and handling of address streams is complicated by their size and the cost of collecting them. We present PSnAP, a compression scheme specifically designed for capturing the fine-grained patterns that occur in well-structured, memory-intensive, high-performance computing applications. PSnAP profiles are human-readable and reveal a great deal of information about the application's memory behavior. In addition to providing insight into application behavior, the profiles can be used to replay a proxy synthetic address stream for analysis. We demonstrate that the synthetic address streams mimic the behavior of the originals very closely.

Keywords: Arrays Benchmark testing Decoding High performance computing History Humans Instruments PSnAP address streams behavioural sciences computing compression scheme data access pattern data compression fine-grained pattern high performance computing human read

A Prefetch-Aware Memory System for Data Access Patterns in Multimedia Applications

15th ACM International Conference on Computing Frontiers

Authors: Alawneh, Tareq A.; Elhossini, Ahmed. Affiliations: Tech Univ Berlin, Inst Comp Engn & Microelect, Berlin, Germany; Al Azhar Univ, Fac Engn, Cairo, Egypt

ISBN: (print) 9781450357616

As the speed gap between CPU and external memory widens, memory latency has become the dominant performance bottleneck in modern applications. Closely connected are caches, which play an important role in reducing the average memory latency. The way data is accessed strongly influences cache performance. Numerous multimedia algorithms operating on data such as images and videos perform processing over rectangular regions of pixels. If this and other data access patterns are properly exploited, significant performance improvements can be achieved. This paper proposes a prefetch-aware memory system that exploits 2D, stride, and sequential data access patterns in multimedia applications. It aims at reducing the average memory access latency, lowering the number of memory accesses, and utilizing the bandwidth efficiently. Our results reveal a significant average memory access time (AMAT) reduction of 21.2% compared to the baseline in the evaluated workloads when the proposed approach is used effectively.
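The three access patterns named in the abstract above can be illustrated by the byte addresses each one touches. The following is a minimal sketch, not code from the paper: the function names, the row-major image layout, and the `row_pitch` parameter are all assumptions made for illustration.

```python
def sequential(base, count, elem_size=4):
    """Addresses of `count` consecutive elements starting at `base`."""
    return [base + i * elem_size for i in range(count)]


def strided(base, count, stride_bytes):
    """Addresses of `count` elements separated by a fixed byte stride."""
    return [base + i * stride_bytes for i in range(count)]


def block_2d(base, row_pitch, x, y, width, height, elem_size=1):
    """Addresses of a width x height pixel block whose top-left corner is (x, y).

    Assumes a row-major image in which consecutive rows are `row_pitch`
    bytes apart (hypothetical layout, chosen for illustration).
    """
    return [
        base + (y + r) * row_pitch + (x + c) * elem_size
        for r in range(height)
        for c in range(width)
    ]


if __name__ == "__main__":
    print(sequential(0, 4))
    print(strided(0, 3, 64))
    print(block_2d(0, 16, 2, 1, 2, 2))
```

In the 2D case the addresses jump by `row_pitch` at the end of every block row, which is why a prefetcher that only recognizes sequential or fixed-stride streams would miss part of a rectangular pattern.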

Keywords: DRAM row-buffer conflict bank-level parallelism prefetch-aware data access pattern two-dimensional sequential stride

Profiling Dynamic Data Access Patterns with Bounded Overhead and Accuracy

IEEE 4th International Workshops on Foundations and Applications of Self* Systems (FAS*W)

Authors: Park, SeongJae; Lee, Yunjae; Kim, Yoonhee; Yeom, Heon Y. Affiliations: Seoul Natl Univ, Dept Comp Sci & Engn, Seoul, South Korea; Sookmyung Womens Univ, Dept Comp Sci, Seoul, South Korea

ISBN: (print) 9781728124063

One common characteristic of modern workloads such as cloud, big data, and machine learning is memory intensiveness. Specifically, such workloads tend to have a huge working set and low locality. In particular, working sets are growing so rapidly that they cannot be fully accommodated by DRAM-based main memory. Worse yet, cloud computing systems, which have been pervasive for the past few decades, are continuously reducing the amount of DRAM per CPU and encouraging memory overcommitment. Consequently, efficient and effective out-of-core memory management is becoming more important. Though a number of memory management mechanisms for such situations have been proposed, manual analysis and optimization are still required for optimal performance of each workload because of the wide variety of data access patterns. However, existing memory access analysis tools are not suitable here because they are not designed to extract the dynamic data access patterns of modern workloads. When used for this purpose, they incur unacceptably high overheads for unnecessarily accurate analysis results. To mitigate this situation, we introduce a tool designed for this purpose. The tool employs a memory access tracking technique based on the page table entry access bit, which incurs only minimal overhead. It also provides an effective tradeoff between profiling overhead and output accuracy by dynamically adjusting the number of tracking regions. With this technique, the tool can keep the overhead and output accuracy within a user-specified bounded range regardless of the size of the target workloads. The overhead can be made low enough for use with online target workloads while still providing useful quality in the extracted data access pattern. The main contributions of this paper are: 1) introduction of a data access pattern profiler designed for modern memory-intensive workloads, and 2) empirica

Keywords: data access pattern memory-intensive workloads profiler performance optimization

Sparse triangular solves for ILU revisited: data layout crucial to better performance

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2011, Vol. 25, No. 4, pp. 386-391

Authors: Smith, Barry; Zhang, Hong. Affiliations: IIT, Dept Comp Sci, Chicago, IL 60616, USA; Argonne Natl Lab, Div Math & Comp Sci, Argonne, IL 60439, USA

A key to good processor utilization for sparse matrix computations is storing the data in the format that is most conducive to fast access by the memory system. In particular, for sparse matrix triangular solves the traditional compressed sparse matrix format is poor, and minor adjustments to the data structure can increase the processor utilization dramatically. Such adjustments involve storing the L and U factors separately and storing the U rows 'backwards' so that they are accessed in a simple streaming fashion during the triangular solves. Changes to the PETSc libraries to use this modified storage format resulted in over twice the floating-point rate for some matrices. This improvement can be accounted for by a decrease in the cache misses and TLB (translation lookaside buffer) misses in the modified code.
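To make the 'backwards' storage idea concrete, here is a minimal sketch of an upper-triangular solve whose factor rows are packed last row first, so that back substitution reads the value and column arrays in one forward pass. This is an illustration under stated assumptions, not PETSc's actual data structure: the function names, the separate `diag` array, and the dense input matrix are hypothetical choices.

```python
def pack_u_backwards(U):
    """Pack a dense upper-triangular matrix with rows stored last-first.

    Returns (diag, vals, cols, row_len); entry k of diag and row_len
    describes row n-1-k of U. Hypothetical layout for illustration only.
    """
    n = len(U)
    diag, vals, cols, row_len = [], [], [], []
    for i in range(n - 1, -1, -1):      # last row is stored first
        diag.append(U[i][i])
        count = 0
        for j in range(i + 1, n):       # strictly upper part of row i
            if U[i][j] != 0:
                vals.append(U[i][j])
                cols.append(j)
                count += 1
        row_len.append(count)
    return diag, vals, cols, row_len


def solve_u_streaming(packed, b):
    """Solve U x = b by back substitution using the packed layout."""
    diag, vals, cols, row_len = packed
    n = len(b)
    x = [0.0] * n
    k = 0                               # cursor that only moves forward
    for step in range(n):
        i = n - 1 - step                # row being solved
        s = b[i]
        for _ in range(row_len[step]):
            s -= vals[k] * x[cols[k]]
            k += 1
        x[i] = s / diag[step]
    return x


if __name__ == "__main__":
    U = [[2.0, 1.0, 0.0],
         [0.0, 4.0, 2.0],
         [0.0, 0.0, 5.0]]
    print(solve_u_streaming(pack_u_backwards(U), [4.0, 10.0, 5.0]))
```

With a conventional row-ordered layout the same solve would walk the arrays from the end toward the beginning; reversing the row order at packing time turns that into a single ascending stream through `vals` and `cols`.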

Keywords: sparse triangular solve ILU factorization matrix-vector product data access pattern data layout

Optimizing Parallel I/O Accesses through Pattern-Directed and Layout-Aware Replication

IEEE TRANSACTIONS ON COMPUTERS, 2020, Vol. 69, No. 2, pp. 212-225

Authors: He, Shuibing; Yin, Yanlong; Sun, Xian-He; Zhang, Xuechen; Li, Zongpeng. Affiliations: Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310058, Zhejiang, Peoples R China; Inst Artificial Intelligence, Intelligent Comp Syst Res Ctr, Zhejiang Lab, Hangzhou 311100, Zhejiang, Peoples R China; IIT, Dept Comp Sci, Chicago, IL 60616, USA; Washington State Univ, Sch Engn & Comp Sci, Vancouver, WA 98686, USA; Wuhan Univ, Sch Comp Sci, Wuhan 430072, Hubei, Peoples R China

As the performance gap between processors and storage devices keeps increasing, I/O performance becomes a critical bottleneck of modern high-performance computing systems. In this paper, we propose a pattern-directed and layout-aware data replication design, named PDLA, to improve the performance of parallel I/O systems. PDLA includes an HDD-based scheme, H-PDLA, and an SSD-based scheme, S-PDLA. For applications with relatively low I/O concurrency, H-PDLA identifies the access patterns of applications and makes a reorganized data replica for each access pattern on HDD-based servers with an optimized data layout. Moreover, to accommodate applications with high I/O concurrency, S-PDLA replicates critical access patterns that can bring performance benefits on SSD-based servers or on HDD-based and SSD-based servers. We have implemented the proposed replication scheme under the MPICH2 library on top of the OrangeFS file system. Experimental results show that H-PDLA can significantly improve the original parallel I/O system performance and demonstrate the advantages of S-PDLA over H-PDLA.

Keywords: Parallel I/O I/O optimization data replication data reorganization data access pattern

Compiler-Assisted Data Distribution and Network Configuration for Chip Multiprocessors

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2012, Vol. 23, No. 11, pp. 2058-2066

Authors: Li, Yong; Abousamra, Ahmed; Melhem, Rami; Jones, Alex K. Affiliations: Univ Pittsburgh, Comp Engn Program, Pittsburgh, PA 15261, USA; Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA 15260, USA

Data access latency, a limiting factor in the performance of chip multiprocessors, grows significantly with the number of cores in nonuniform cache architectures with distributed cache banks. To mitigate this effect, we use a compiler-based approach to leverage data access locality, choose an optimized data placement, and efficiently configure the on-chip network. The proposed experimental compiler framework employs novel compilation techniques to discover and represent multithreaded memory access patterns (MMAPs). At runtime, symbolic MMAPs are resolved and used by a partitioning algorithm to choose a partition of allocated memory blocks among the forked threads in the analyzed application. This partition is used to enforce data ownership by associating the data with the core that executes the thread owning the data. Based on the partition, the communication pattern of the application can be extracted. We demonstrate how this information can be used in an experimental architecture to accelerate applications. In particular, our compiler-assisted data partitioning approach shows a 20 percent speedup over shared caching and a 5 percent speedup over the closest runtime approximation, first touch. By leveraging the communication pattern, we can achieve performance comparable to a system that uses a complex centralized network configuration system at runtime. Thus, our final system saves significant runtime complexity and achieves a 5.1 percent additional speedup through the addition of the reconfigurable network.

Keywords: Circuit switching network-on-chip communication data access pattern data partition

EED: Energy Efficient Disk drive architecture

INFORMATION SCIENCES, 2008, Vol. 178, No. 22, pp. 4403-4417

Authors: Deng, Yuhui; Wang, Frank; Na Helian. Affiliations: Cranfield Univ Campus, Cambridge Cranfield High Performance Comp Facil, Ctr Grid Comp, Cranfield MK43 0AL, Beds, England; Univ Hertfordshire, Dept Comp Sci, Hatfield AL10 9AB, Herts, England

Energy efficiency has become one of the most important challenges in designing future computing systems, and the storage system is one of the largest energy consumers within them. This paper proposes an Energy Efficient Disk (EED) drive architecture which integrates a relatively small NAND flash memory into a traditional disk drive to explore the impact of the flash memory on the performance and energy consumption of the disk. The EED monitors data access patterns and moves the frequently accessed data from the magnetic disk to the flash memory. Due to the data migration, most of the data accesses can be satisfied by the flash memory, which extends the idle period of the disk drive and enables the disk drive to stay in a low power state for an extended period of time. Because flash memory consumes considerably less energy than a magnetic disk and its read access is much faster, the EED can save significant amounts of energy while reducing the average response time. Real trace-driven simulations are employed to validate the proposed disk drive architecture. An energy coefficient, which is the product of the average response time and the average energy consumption, is proposed as a performance metric to measure the EED. The simulation results, along with the energy coefficient, show that the EED can achieve an 89.11% energy consumption reduction and a 2.04% average response time reduction with the cello99 trace, a 7.5% energy consumption reduction and a 45.15% average response time reduction with the cello96 trace, and a 20.06% energy consumption reduction and a 6.02% average response time reduction with the TPC-D trace. Traditionally, energy conservation and performance improvement are contradictory. The EED strikes a good balance between conserving energy and improving performance. (c) 2008 Elsevier Inc. All rights reserved.
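The energy coefficient in the abstract above is a simple product, so the reported percentage reductions can be combined into a relative coefficient. The sketch below assumes that each reduction is measured against the same baseline disk and that the baseline coefficient is normalized to 1.0; the function names are hypothetical and the resulting figures are derived here, not quoted from the paper.

```python
def energy_coefficient(avg_response_time, avg_energy):
    """Product of average response time and average energy consumption."""
    return avg_response_time * avg_energy


def relative_coefficient(energy_reduction_pct, response_reduction_pct):
    """Coefficient relative to a baseline normalized to 1.0 (assumption)."""
    energy = 1.0 - energy_reduction_pct / 100.0
    response = 1.0 - response_reduction_pct / 100.0
    return energy_coefficient(response, energy)


if __name__ == "__main__":
    # (energy reduction %, response time reduction %) as stated in the abstract
    reported = {
        "cello99": (89.11, 2.04),
        "cello96": (7.5, 45.15),
        "TPC-D": (20.06, 6.02),
    }
    for trace, (energy_pct, response_pct) in reported.items():
        print(trace, round(relative_coefficient(energy_pct, response_pct), 3))
```

Under these assumptions the relative coefficients come out to roughly 0.107, 0.507, and 0.751 for cello99, cello96, and TPC-D; a lower value means a better combined result.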

Keywords: Disk drive Energy efficiency data access pattern Architecture Non-volatile memory

Exploiting the performance gains of modern disk drives by enhancing data locality

INFORMATION SCIENCES, 2009, Vol. 179, No. 14, pp. 2494-2511

Authors: Deng, Yuhui. Affiliation: Cambridge Cranfield High Performance Comp Facil, Cranfield MK43 0AL, Beds, England

Due to the widening performance gap between RAM and disk drives, a large number of I/O optimization methods have been proposed and designed to alleviate the impact of this gap. One of the most effective approaches to improving disk access performance is enhancing data locality, because it can increase the hit ratio of the disk cache and reduce the seek time and rotational latency. Disk drives have experienced dramatic development since the first disk drive was announced in 1956. This paper investigates some important characteristics of modern disk drives. Based on these characteristics and the observation that data access on disk drives is highly skewed, the frequently accessed data blocks and the correlated data blocks are clustered into objects and moved to the outer zones of a modern disk drive. The idea is to enhance spatial locality, improve the efficiency of aggressive sequential prefetch, and take advantage of Zoned Bit Recording (ZBR). An experimental simulation is employed to investigate the performance gains generated by the enhanced data locality. The performance gains are analyzed by breaking down the disk access time into seek time, rotational latency, data transfer time, and hit ratio of the disk cache. Experimental results provide useful insights into the performance behaviours of a modern disk drive with enhanced data locality. (C) 2009 Elsevier Inc. All rights reserved.
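The four components listed in the abstract above can be combined into a simple expected-access-time model. The sketch below is an assumption made for illustration, not the paper's own model: it treats a cache hit as costing a fixed `cache_hit_ms` and a miss as costing the sum of the three mechanical terms. All names and sample numbers are hypothetical.

```python
def expected_access_time_ms(seek_ms, rotational_ms, transfer_ms,
                            hit_ratio, cache_hit_ms=0.0):
    """Expected time per request under a simple hit/miss model (assumption).

    A hit in the disk cache costs cache_hit_ms; a miss pays seek time,
    rotational latency, and data transfer time.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    mechanical = seek_ms + rotational_ms + transfer_ms
    return hit_ratio * cache_hit_ms + (1.0 - hit_ratio) * mechanical


if __name__ == "__main__":
    # Hypothetical numbers: better locality raises the hit ratio and
    # shortens the seek, so both terms of the model improve.
    print(expected_access_time_ms(4.0, 2.0, 1.0, hit_ratio=0.0))
    print(expected_access_time_ms(2.0, 2.0, 1.0, hit_ratio=0.5))
```

The model shows why enhanced locality helps twice: it moves requests from the miss term to the hit term and also shrinks the mechanical cost of the remaining misses.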

Keywords: Disk drive data locality data access pattern Block correlation data migration Performance

When to Use Standards-Based APIs (Part 1)

IEEE CLOUD COMPUTING, 2015, Vol. 2, No. 5, pp. 76-80

Authors: Sill, Alan. Affiliation: Texas Tech Univ, Ctr High Performance Comp, Lubbock, TX 79409, USA

This column completes a two-part exploration into features of application programming interfaces (APIs) that are useful in clouds. The discussion contrasts APIs with other types of interfaces and describes variations on protocols and calling methods, giving examples from physical hardware control to illustrate important features of cloud API design.

Keywords: Application Program Interfaces Cloud Computing Standards Based API data access pattern Local User Driven Method Cloud Based Tools Application Programmer Interface API Design API Concept Cloud Computing Programming Application Programming Interfaces Context Modeling Technological Innovation Design Methodology Cloud Standards Programming data access API Design
