A key challenge in big data processing frameworks such as the Hadoop Distributed File System (HDFS) is optimizing the throughput of read operations. Toward this goal, several studies have been conducted to enhance read performance on heterogeneous storage. Although HDFS has recently added several storage policies for placing data blocks on heterogeneous storage, it fails to fully exploit the potential of fast storage devices (e.g., SSDs). The primary reason for its suboptimal read performance is that, when distributing read requests, the existing HDFS considers only the network distance between the client and DataNodes, thereby directing more read requests to the slower devices (e.g., HDDs) that hold more data. In this paper, we propose a new data retrieval policy for distributing read requests across heterogeneous storage in HDFS. Specifically, the proposed policy considers both the characteristics of the storage devices in the DataNodes and the network environment to distribute read requests efficiently. We develop and compare several policies for balancing these two factors, including random selection, storage-type selection, weighted round-robin selection, and dynamic round-robin selection. Our experimental results on extensive benchmark datasets show that the throughput of the proposed method outperforms that of the existing policies by up to six times.
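The abstract does not give the selection algorithm's details, but the general idea of weighted round-robin replica selection by storage type can be sketched as follows. This is a minimal illustration, not the authors' implementation: the storage-type weights, the `Replica` record, and the `select` method are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;

/** Minimal sketch of weighted round-robin replica selection by storage type (illustrative only). */
public class WeightedReplicaSelector {

    /** A replica of a block, located on a DataNode with a given storage type. */
    public record Replica(String dataNode, String storageType) {}

    // Assumed relative read-throughput weights; real values would be measured or configured.
    private static final Map<String, Integer> WEIGHTS = Map.of("SSD", 4, "DISK", 1);

    private final int[] credits;          // remaining credits per replica in the current round
    private final List<Replica> replicas; // replica locations returned by the NameNode

    public WeightedReplicaSelector(List<Replica> replicas) {
        this.replicas = replicas;
        this.credits = new int[replicas.size()];
    }

    /** Pick the next replica, favouring faster storage in proportion to its weight. */
    public synchronized Replica select() {
        boolean allEmpty = true;
        for (int c : credits) if (c > 0) { allEmpty = false; break; }
        if (allEmpty) {                   // start a new round: refill credits from the weights
            for (int i = 0; i < replicas.size(); i++) {
                credits[i] = WEIGHTS.getOrDefault(replicas.get(i).storageType(), 1);
            }
        }
        int best = -1;
        for (int i = 0; i < replicas.size(); i++) {
            if (credits[i] > 0 && (best < 0 || credits[i] > credits[best])) best = i;
        }
        credits[best]--;
        return replicas.get(best);
    }

    public static void main(String[] args) {
        var selector = new WeightedReplicaSelector(List.of(
                new Replica("dn1", "SSD"), new Replica("dn2", "DISK"), new Replica("dn3", "DISK")));
        for (int i = 0; i < 6; i++) System.out.println(selector.select());
    }
}
```

With the assumed weights, an SSD-backed replica receives four reads for every one sent to a HDD-backed replica in each round.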
The Hadoop Distributed File System (HDFS) was developed to efficiently store and handle vast quantities of files in a distributed environment over a cluster of computers. The Hadoop cluster is built from commodity hardware, which is inexpensive and readily available. Storing a large number of small files in HDFS consumes more memory and degrades performance, because small files place a heavy load on the NameNode. The efficiency of indexing and accessing small files on HDFS is therefore improved by several techniques, such as archive files, the New Hadoop Archive (New HAR), CombineFileInputFormat (CFIF), and sequence file generation. The archive file combines small files into single blocks; the New HAR file combines smaller files into one large file; the CFIF module merges multiple files into a single split using the NameNode; and the sequence file combines all the small files into a single sequence. Indexing and accessing small files in HDFS are evaluated using performance metrics such as processing time and memory usage. The experiments show that the sequence file generation approach is the most efficient of these: file access time is 1.5 s, memory usage is 20 KB in the multi-node setup, and processing time is 0.1 s.
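As an illustration of the sequence-file approach described above, the sketch below packs a directory of small local files into a single HDFS SequenceFile, using file names as keys and file contents as values. It is a minimal example using the standard Hadoop client API; the paths, key/value choices, and configuration are assumptions, not the paper's code.

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

/** Packs many small files into one SequenceFile so HDFS stores a single large file. */
public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();              // picks up core-site.xml / hdfs-site.xml
        Path target = new Path("/user/demo/smallfiles.seq");   // assumed output path on HDFS

        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(target),
                SequenceFile.Writer.keyClass(Text.class),            // key: original file name
                SequenceFile.Writer.valueClass(BytesWritable.class)  // value: raw file contents
        );
        try {
            File[] smallFiles = new File("local-small-files").listFiles();  // assumed local source dir
            if (smallFiles == null) throw new IllegalStateException("source directory not found");
            for (File f : smallFiles) {
                byte[] data = Files.readAllBytes(f.toPath());
                writer.append(new Text(f.getName()), new BytesWritable(data));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
```

Reading a particular small file back then becomes a key lookup in the SequenceFile rather than a separate NameNode entry per file.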
Hadoop, a distributed processing framework for big data, is now widely used for multimedia processing. However, when processing video data from the Hadoop Distributed File System (HDFS), unnecessary network traffic is generated due to an inefficient HDFS block slicing policy for the picture frames in video files. We propose a new block replication policy to solve this problem and compare the newly proposed HDFS with the original HDFS via extensive experiments. The proposed HDFS reduces network traffic and increases locality between processing cores and file locations.
ISBN (print): 9781450356299
The massive growth in the volume of data and the demand for big data utilisation have led to an increasing prevalence of Hadoop Distributed File System (HDFS) solutions. However, the performance of Hadoop, and indeed of HDFS, has some limitations and remains an open problem in the research community. The ultimate goal of our research is to develop an adaptive replication system; this paper presents the first phase of the work: an investigation into the replication factor used in HDFS to determine whether increasing the replication factor for in-demand data can improve the performance of the system. We constructed a physical Hadoop cluster for our experimental environment, using TestDFSIO and both real-world and synthetic data sets (NOAA and TPC-H) with Hive to validate our proposal. The results show that increasing the replication factor of the 'hot' data increases the availability and locality of the data and thus decreases the job execution time.
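Raising the replication factor of a specific 'hot' file can be done through the standard HDFS client API, as sketched below. The file path and the target factor are placeholders; the paper's own mechanism for detecting hot data is not shown here.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Increases the replication factor of an in-demand file (illustrative sketch). */
public class HotFileReplicator {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // reads core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {
            Path hotFile = new Path("/warehouse/tpch/lineitem");  // assumed 'hot' data set
            short current = fs.getFileStatus(hotFile).getReplication();
            short target = 5;                              // assumed new factor; the default is usually 3
            if (current < target) {
                // Schedules extra replicas; the NameNode creates them in the background.
                boolean accepted = fs.setReplication(hotFile, target);
                System.out.println("Replication change accepted: " + accepted);
            }
        }
    }
}
```

The same change can be made from the command line with `hdfs dfs -setrep`.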
ISBN (print): 9781538672327
The Hadoop Distributed File System (HDFS) is the storage of choice when it comes to large-scale distributed systems. In addition to being efficient and scalable, HDFS provides high throughput and reliability through the replication of data. Recent work exploits this replication feature by dynamically varying the replication factor of in-demand data as a means of increasing data locality and achieving a performance improvement. However, to the best of our knowledge, no study has examined the consequences of varying the replication factor. In particular, our work is the first to show that although HDFS copes well with increasing the replication factor, it experiences problems when decreasing it. This leads to unbalanced data, hot spots, and performance degradation. To address this problem, we propose a new workload-aware balanced replica deletion algorithm. We also show that our algorithm successfully maintains the data balance and achieves up to a 48% improvement in execution time compared to HDFS, while creating an overhead of only 1.69% on average.
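The paper's exact selection criteria are not given in the abstract, but the core idea of balanced replica deletion can be illustrated with a simple heuristic: when the replication factor is lowered, drop the replicas held by the most utilised DataNodes. The `DataNodeInfo` type and the utilisation figures below are assumptions for the sketch, not the authors' workload-aware algorithm.

```java
import java.util.Comparator;
import java.util.List;

/** Simplified heuristic for choosing which replicas to delete when lowering the replication factor. */
public class BalancedReplicaDeletion {

    /** Assumed view of a DataNode holding one replica of the block. */
    public record DataNodeInfo(String host, long usedBytes, long capacityBytes) {
        double utilisation() { return (double) usedBytes / capacityBytes; }
    }

    /**
     * Pick replicas to remove so that the most utilised nodes are relieved first,
     * keeping disk usage across the cluster balanced after the deletion.
     */
    public static List<DataNodeInfo> selectForDeletion(List<DataNodeInfo> replicaHolders, int toRemove) {
        return replicaHolders.stream()
                .sorted(Comparator.comparingDouble(DataNodeInfo::utilisation).reversed())
                .limit(toRemove)
                .toList();
    }

    public static void main(String[] args) {
        List<DataNodeInfo> holders = List.of(
                new DataNodeInfo("dn1", 800, 1000),
                new DataNodeInfo("dn2", 300, 1000),
                new DataNodeInfo("dn3", 550, 1000),
                new DataNodeInfo("dn4", 900, 1000));
        // Replication factor drops from 4 to 2, so two replicas must go: dn4 and dn1 are chosen.
        System.out.println(selectForDeletion(holders, 2));
    }
}
```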
ISBN (print): 9781538619599
All the machines are required to be under a common administrator and to be able to communicate securely. To communicate securely, the Advanced Encryption Standard (AES) algorithm is used to protect the data in each cluster: encryption is performed before write operations and decryption after read operations. Key-based encryption and decryption thus secure the Hadoop Distributed File System. The existing system depends on a single NameNode to manage almost all operations on every data block in the file system; as a result, it can become a bottleneck resource and a single point of failure. To overcome this, a load rebalancing algorithm based on a distributed hash table (DHT) is used for the Hadoop Distributed File System. The proposed load rebalancing algorithm is compared against a centralized approach used in a production system and a competing distributed solution presented in the literature. The storage nodes are structured as a network based on a distributed hash table, so discovering a file chunk simply amounts to a rapid key lookup in the DHT, given that a unique handle (or identifier) is assigned to each file chunk.
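As a sketch of the encrypt-before-write step, the client-side write path could look like the following. The paper does not specify its cipher mode or key management, so AES/GCM, the throwaway key, and the paths here are assumptions made only for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Encrypts data with AES-GCM on the client before it is written to HDFS (illustrative sketch). */
public class EncryptedHdfsWriter {
    public static void main(String[] args) throws Exception {
        // Assumed key handling: a fresh key per run; a real deployment would use a key management service.
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));

        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream hdfsOut = fs.create(new Path("/secure/data.enc"));  // assumed path
             CipherOutputStream encrypted = new CipherOutputStream(hdfsOut, cipher)) {
            hdfsOut.write(iv);                                   // store the IV in clear ahead of the ciphertext
            encrypted.write("sensitive record".getBytes(StandardCharsets.UTF_8));
        }
        // The read path reverses these steps: read the IV, init the cipher in DECRYPT_MODE,
        // and wrap the FSDataInputStream in a CipherInputStream.
    }
}
```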
ISBN (print): 9781509024032
Today, the Hadoop Distributed File System (HDFS) is widely used to provide scalable and fault-tolerant storage of large volumes of data. One of the key issues that affect the performance of HDFS is the placement of data replicas. Although the current HDFS replica placement policy can achieve both fault tolerance and read/write efficiency, it cannot evenly distribute replicas across the cluster nodes and has to rely on a load balancing utility to balance replica distributions. In this paper, we present a new replica placement policy for HDFS, which generates replica distributions that are not only perfectly even but also meet all HDFS replica placement requirements.
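The abstract does not describe the placement rules themselves, so the sketch below shows only the basic idea of even placement: assign each new replica to the least-loaded DataNode that does not already hold a copy of the block. Rack awareness and the other HDFS placement requirements are deliberately left out, and all names here are assumptions rather than the paper's policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy placement policy: keeps the per-node replica counts as even as possible. */
public class EvenReplicaPlacement {
    private final Map<String, Integer> replicasPerNode = new HashMap<>();

    public EvenReplicaPlacement(List<String> dataNodes) {
        dataNodes.forEach(n -> replicasPerNode.put(n, 0));
    }

    /** Choose target nodes for one block, never placing two replicas on the same node. */
    public List<String> place(int replicationFactor) {
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            String target = replicasPerNode.entrySet().stream()
                    .filter(e -> !chosen.contains(e.getKey()))   // one replica per node
                    .min(Map.Entry.comparingByValue())           // least-loaded node first
                    .orElseThrow()
                    .getKey();
            chosen.add(target);
            replicasPerNode.merge(target, 1, Integer::sum);
        }
        return chosen;
    }

    public static void main(String[] args) {
        EvenReplicaPlacement policy = new EvenReplicaPlacement(List.of("dn1", "dn2", "dn3", "dn4"));
        for (int block = 0; block < 4; block++) {
            System.out.println("block " + block + " -> " + policy.place(3));
        }
    }
}
```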
ISBN (print): 9781479999422
The Hadoop Distributed File System (HDFS) was developed to store huge volumes of data. Files are divided into blocks, and the replicated blocks are then stored on many DataNodes in a distributed manner. Although this makes HDFS fault tolerant, the random nature of the default block placement strategy may lead to load imbalance among the DataNodes. Moreover, the built-in load-balancing tool, Balancer, may reduce performance and consume a lot of network resources. Therefore, in this paper we consider all the situations that may influence the load-balancing state and propose a new load-balancing algorithm. In the proposed algorithm, a new role named BalanceNode is introduced to help match heavily loaded and lightly loaded DataNodes, so that the lightly loaded nodes can take over part of the load from the heavily loaded ones. The simulation results show that our algorithm achieves a better load-balancing state in HDFS than two existing algorithms.
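The matching step performed by the BalanceNode is not detailed in the abstract; one simple way to sketch it is to pair the DataNodes furthest above the mean utilisation with those furthest below it. Everything below (the `Node` type, the 10% threshold, the example figures) is an assumption of the sketch, not the paper's algorithm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

/** Toy matcher: pairs heavily loaded DataNodes with lightly loaded ones around the mean utilisation. */
public class BalanceNodeMatcher {

    public record Node(String host, double utilisation) {}
    public record Pair(Node source, Node target) {}

    /** Pair nodes above the mean by more than the threshold with nodes below it by more than the threshold. */
    public static List<Pair> match(List<Node> nodes, double threshold) {
        double mean = nodes.stream().mapToDouble(Node::utilisation).average().orElse(0);

        Deque<Node> heavy = new ArrayDeque<>(nodes.stream()
                .filter(n -> n.utilisation() > mean + threshold)
                .sorted(Comparator.comparingDouble(Node::utilisation).reversed())
                .toList());
        Deque<Node> light = new ArrayDeque<>(nodes.stream()
                .filter(n -> n.utilisation() < mean - threshold)
                .sorted(Comparator.comparingDouble(Node::utilisation))
                .toList());

        List<Pair> pairs = new ArrayList<>();
        while (!heavy.isEmpty() && !light.isEmpty()) {
            pairs.add(new Pair(heavy.pollFirst(), light.pollFirst()));  // move blocks heavy -> light
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(new Node("dn1", 0.92), new Node("dn2", 0.35),
                                     new Node("dn3", 0.60), new Node("dn4", 0.15));
        match(cluster, 0.10).forEach(System.out::println);   // pairs dn1 with dn4; dn2 and dn3 stay put
    }
}
```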
Cloud computing is composed of a large number of distributed computation and storage resources to facilitate the management of distributed and shared data resources. It is a great challenge to ensure efficient access, via data replication, to such huge and widely distributed data in the cloud. To address this need, we propose an Efficient Data Access Scheme (EDAS) of data replication for the Hadoop Distributed File System (HDFS), which adaptively selects the replica of a data file from among the service nodes. HDFS is an open-source, cloud-based storage platform designed to be deployed on low-cost commodity hardware; in HDFS, data are distributed and replicated across a cluster of commodity machines. EDAS supports the access-node decision for replica data, so that users obtain quick access from adaptively chosen service nodes according to the load of those nodes. To provide high performance of replication access and achieve load balance among the service nodes, the proposed EDAS algorithm is implemented based on historical data access records from the metadata of HDFS and an anti-blocking probability selection method.
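The abstract does not define the anti-blocking probability selection method, so the sketch below only captures the general idea of load-aware replica access: a service node is chosen with probability inversely proportional to its current load. The `ServiceNode` record and the load values are assumptions made for the illustration.

```java
import java.util.List;
import java.util.Random;

/** Toy load-aware replica selection: a node is picked with probability inversely proportional to its load. */
public class LoadAwareReplicaSelector {

    public record ServiceNode(String host, double load) {}   // load in (0, 1], e.g. recent request rate

    private final Random random = new Random();

    public ServiceNode select(List<ServiceNode> candidates) {
        // Weight each candidate by the inverse of its load, so lightly loaded nodes are preferred.
        double[] weights = candidates.stream().mapToDouble(n -> 1.0 / Math.max(n.load(), 0.01)).toArray();
        double total = 0;
        for (double w : weights) total += w;

        double r = random.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r <= 0) return candidates.get(i);
        }
        return candidates.get(candidates.size() - 1);   // guard against floating-point rounding
    }

    public static void main(String[] args) {
        LoadAwareReplicaSelector selector = new LoadAwareReplicaSelector();
        List<ServiceNode> replicas = List.of(new ServiceNode("dn1", 0.9),
                                             new ServiceNode("dn2", 0.3),
                                             new ServiceNode("dn3", 0.5));
        for (int i = 0; i < 5; i++) System.out.println(selector.select(replicas));
    }
}
```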
The MapReduce programming model, in which the data nodes perform both the data storing and the computation, was introduced for big data processing. We therefore need to understand the different resource requirements of data storing and computation tasks and schedule them efficiently over multi-core processors. In particular, providing high-performance data storing has become more critical because of the continuously increasing volume of data uploaded to distributed file systems and database servers. However, analyzing the performance characteristics of the processes that store upstream data is very intricate, because both network and disk input/output (I/O) are heavily involved in their operations. In this paper, we analyze the impact of core affinity on both network and disk I/O performance and propose a novel approach to dynamic core affinity for high-throughput file upload. We consider the dynamic changes in processor load and the intensiveness of the file upload at run time, and accordingly decide the core affinity for service threads, with the objective of maximizing parallelism, data locality, and resource efficiency. We apply the dynamic core affinity to the Hadoop Distributed File System (HDFS). Measurement results show that our implementation improves the file upload throughput of end applications by more than 30% compared with the default HDFS, and provides better scalability. (C) 2014 Elsevier B.V. All rights reserved.
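The affinity decision itself can be sketched separately from the platform-specific pinning mechanism: Java offers no portable thread-pinning API, so the `pin` step below is left as a stub (on Linux it would go through something like sched_setaffinity via JNI/JNA). The load accounting and the 128 MB upload size are assumptions for the illustration, not the paper's design.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Sketch of a dynamic core-affinity decision for upload service threads. */
public class DynamicCoreAffinity {
    private final int cores = Runtime.getRuntime().availableProcessors();
    // Rough per-core load counters, updated as work is assigned (an assumption of the sketch;
    // a real system would read utilisation from the OS).
    private final AtomicLongArray coreLoad = new AtomicLongArray(cores);

    /** Pick the least-loaded core for a new upload service thread and record the assignment. */
    public int chooseCore(long expectedBytes) {
        int best = 0;
        for (int c = 1; c < cores; c++) {
            if (coreLoad.get(c) < coreLoad.get(best)) best = c;
        }
        coreLoad.addAndGet(best, expectedBytes);
        return best;
    }

    /**
     * Placeholder: actually binding a thread to {@code core} needs a platform-specific call
     * (e.g. sched_setaffinity through JNI/JNA on Linux); it is intentionally left unimplemented here.
     */
    public void pin(Thread t, int core) {
        System.out.printf("would pin %s to core %d%n", t.getName(), core);
    }

    public static void main(String[] args) {
        DynamicCoreAffinity affinity = new DynamicCoreAffinity();
        for (int i = 0; i < 4; i++) {
            Thread worker = new Thread(() -> { /* receive data and write blocks */ }, "upload-" + i);
            affinity.pin(worker, affinity.chooseCore(128L * 1024 * 1024));  // assume a 128 MB upload
            worker.start();
        }
    }
}
```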