Cloud computing systems are widely used to deploy big data applications because of their high storage and computation capacity. The key storage component in a cloud computing environment is the distributed file system, which can store and process the data produced by big data applications effectively. Users of such applications issue read requests far more frequently than write requests, so most cloud-based applications demand optimal performance from the distributed file system, especially for read operations. Numerous caching and prefetching techniques have been proposed in the existing literature to enhance the performance of distributed file systems. However, these techniques typically adopt a synchronous approach, focusing on either application data prefetching or user data prefetching once the user application starts executing, which may result in extended read access times. Furthermore, the data is prefetched based on either access frequency or reuse distance without considering access recency, which may result in a lower cache hit ratio. In this paper, we propose application-specific and user-specific data prefetching algorithms that prefetch data from the distributed file system and store it in the multi-level caches present in the distributed file system, based on a combination of the access frequency and recency ranking of file blocks previously accessed by client application programs. Additionally, we divide the cache into two partitions, namely user and application caches, to store the prefetched data according to a popularity value calculated from user- and application-level accesses. We also introduce a parallel read algorithm to read data simultaneously from the multiple caches present in the distributed file system environment. The simulation results demonstrate that the proposed algorithms improved the distributed file system's performance by a minimum of
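A minimal sketch of the kind of ranking this abstract describes: previously accessed file blocks are scored by a popularity value that combines access frequency with a recency rank, and the top candidates are routed to a user or application cache partition depending on the origin of the access. The score formula, the weight alpha, and the partitioning rule below are illustrative assumptions, not the paper's exact algorithm.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BlockStats {
    uint64_t block_id;
    uint32_t access_count;   // how often the block was read
    uint64_t last_access;    // logical timestamp of the most recent read
    bool     user_access;    // true if the access came from a user-level request
};

// Popularity: access frequency weighted against how recently the block was touched.
double popularity(const BlockStats& b, uint64_t now, double alpha = 0.5) {
    double recency = 1.0 / static_cast<double>(now - b.last_access + 1);
    return alpha * b.access_count + (1.0 - alpha) * recency;
}

// Rank the access history, take the top-k blocks, and split them between the
// two hypothetical cache partitions (user cache vs. application cache).
void select_prefetch_candidates(std::vector<BlockStats>& history, uint64_t now,
                                std::size_t k,
                                std::vector<uint64_t>& user_cache,
                                std::vector<uint64_t>& app_cache) {
    std::sort(history.begin(), history.end(),
              [now](const BlockStats& a, const BlockStats& b) {
                  return popularity(a, now) > popularity(b, now);
              });
    for (std::size_t i = 0; i < history.size() && i < k; ++i) {
        (history[i].user_access ? user_cache : app_cache)
            .push_back(history[i].block_id);
    }
}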
Cache misses can have a major impact on the overall performance of many-core systems. A miss may result in extra traffic and delay because of coherency messages. This has been reduced in coarse-grain coherency protocols, where only shared misses require a coherency message. Conventional off-chip methods manage the shared miss rate by relying on reuse histories. However, the memory overhead associated with reuse histories makes them impractical for on-chip multi-processor systems. In this study, a new scheme is proposed to reduce the shared cache miss rate in multi-processor systems-on-chip; it benefits from novel techniques for prefetching into L2 caches from off-chip memories or from other remote L2 caches located on-chip. In the proposed scheme, the previously proposed Virtual Tree Coherence (VTC) method is extended to limit block forwarding messages to true sharers within each region. Instead of relying on exact reuse histories, shared regions are searched for regional, temporal and statistical similarities, which are exploited to determine the sharers that should receive the forwarded blocks. The proposed method has been evaluated with Splash-2 workloads. Simulation results indicate that the proposed method reduces the shared miss count by up to 75% and improves interconnect traffic by up to 47% compared with VTC.
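To illustrate the idea of limiting forwarding to true sharers of a region, the sketch below keeps per-region, per-core access counters and forwards blocks only to cores whose share of the region's accesses exceeds a threshold. The counter structure and the threshold test are assumptions standing in for the paper's regional, temporal and statistical similarity checks.

#include <array>
#include <cstdint>

constexpr int kNumCores = 16;

struct RegionEntry {
    std::array<uint32_t, kNumCores> access_count{};  // accesses per core in this region
    uint32_t total_accesses = 0;
};

// A core is treated as a "true sharer" if it contributed at least `threshold`
// of the region's observed accesses; only those cores receive forwarded blocks.
uint32_t sharers_to_forward(const RegionEntry& region, double threshold = 0.05) {
    uint32_t mask = 0;
    if (region.total_accesses == 0) return mask;
    for (int core = 0; core < kNumCores; ++core) {
        double share = static_cast<double>(region.access_count[core]) /
                       region.total_accesses;
        if (share >= threshold) mask |= (1u << core);
    }
    return mask;
}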
ISBN (print): 9781479956180
Modern processors typically employ sophisticated prefetching techniques for hiding memory latency. Hardware prefetching has proven very effective and can speed up some SPEC CPU 2006 benchmarks by more than 40% when running in isolation. However, this speedup often comes at the cost of prefetching a significant volume of useless data (sometimes more than twice the data required), which wastes shared last-level cache space and off-chip bandwidth. This paper explores how an accurate, resource-efficient prefetching scheme can benefit performance by conserving shared resources in multicores. We present a framework that uses low-overhead runtime sampling and fast cache modeling to accurately identify memory instructions that frequently miss in the cache. We then use this information to automatically insert software prefetches into the application. Our prefetching scheme has good accuracy and employs cache bypassing whenever possible. These properties help reduce off-chip bandwidth consumption and last-level cache pollution. While single-thread performance remains comparable to hardware prefetching, the full advantage of the scheme is realized when several cores are used and demand for shared resources grows. We evaluate our method on two modern commodity multicores. Across 180 mixed workloads that fully utilize a multicore, the proposed software prefetching mechanism achieves up to 24% better throughput than hardware prefetching, and performs 10% better on average.
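A minimal sketch of what an inserted software prefetch for a frequently missing load might look like, using the GCC/Clang __builtin_prefetch builtin. The prefetch distance and the locality hint of 0 (non-temporal, to limit last-level cache pollution, approximating the cache bypassing mentioned above) are assumptions; the paper's sampling- and cache-model-driven insertion policy is not reproduced here.

#include <cstddef>

double sum_with_prefetch(const double* data, const std::size_t* index,
                         std::size_t n) {
    constexpr std::size_t kDistance = 16;  // assumed prefetch distance
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kDistance < n) {
            // Irregular access identified as a frequent cache miss: prefetch it
            // ahead of time with a non-temporal hint to reduce cache pollution.
            __builtin_prefetch(&data[index[i + kDistance]], /*rw=*/0, /*locality=*/0);
        }
        sum += data[index[i]];
    }
    return sum;
}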