Cluster computing has become an indispensable part of data processing as the volume of data produced by sources such as online social media, IoT devices, mobile data, sensor data, and black-box data grows exponentially. A distributed filesystem defines methods to distribute, read, and delete files among cluster computing nodes. Popular distributed filesystems such as the Google File System and the Hadoop Distributed File System store metadata centrally. This creates a single point of failure and raises the need for backup and alternative solutions to recover the metadata when the metadata server fails. Moreover, the name node server is typically built on expensive, reliable hardware; for small and medium clusters, maintaining such a server is not cost-effective. Cheap commodity hardware may substitute for the name node, but it is prone to hardware failure. This paper proposes a novel distributed filesystem that distributes files over a cluster of machines connected in a peer-to-peer network. Its most significant feature is the capability to distribute the metadata through distributed consensus, using hash values. Although the distributed metadata is publicly visible, the methodology ensures that it is immutable and irrefutable. As part of the in-depth research, the proposed filesystem has been successfully tested on the Google Cloud Platform. The basic operations (read, write, and delete) on the distributed filesystem with distributed metadata are also compared with those of the Hadoop Distributed File System in terms of distribution time on the same cluster setup. The novel distributed filesystem yields better results than the existing methodologies.
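The abstract states that metadata is distributed publicly yet remains immutable and irrefutable thanks to hash values. The paper does not expose its exact scheme here, but the idea of hash-verified, chained metadata records can be sketched as follows (the record fields and chaining scheme are illustrative assumptions, not the authors' actual design):

```python
import hashlib
import json

def metadata_record(filename, block_locations, prev_hash):
    """Build a tamper-evident metadata entry: its identity is the SHA-256 hash
    of its contents, chained to the previous entry's hash. (Illustrative
    fields; the paper's real record layout may differ.)"""
    body = {"file": filename, "blocks": block_locations, "prev": prev_hash}
    payload = json.dumps(body, sort_keys=True).encode()
    return {"hash": hashlib.sha256(payload).hexdigest(), **body}

def verify(record):
    """Any peer can recompute the hash to confirm the record is unaltered,
    even though the metadata itself is publicly visible."""
    body = {k: record[k] for k in ("file", "blocks", "prev")}
    payload = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest() == record["hash"]

genesis = metadata_record("a.txt", ["node1", "node3"], "0" * 64)
entry = metadata_record("b.txt", ["node2"], genesis["hash"])
print(verify(entry))            # an untampered record verifies
entry["blocks"] = ["node9"]     # a tampered copy...
print(verify(entry))            # ...fails verification
```

Because each record embeds the previous record's hash, altering any entry invalidates every later one, which is what makes the public metadata irrefutable in this sketch.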
ISBN:
(Print) 9781479930708
Big Data has arrived with great haste as a key enabler of the social business; it offers an opportunity to create extraordinary business advantage and better service delivery, and it is bringing a positive change to the decision-making processes of business organizations. Along with these offerings, Big Data brings several issues and challenges related to Big Data management, processing, and analysis. These challenges are commonly summarized as the 3Vs: Volume (large amounts of data), Velocity (data arriving at high speed), and Variety (data coming from heterogeneous sources). In the definition of Big Data, "big" refers to a dataset that grows so much that it becomes difficult to manage with existing data management concepts and tools. MapReduce plays a very significant role in processing Big Data. This paper gives a brief overview of Big Data and its related issues and emphasizes the role of MapReduce in Big Data processing: MapReduce is elastically scalable, efficient, and fault tolerant for analyzing large datasets. The paper highlights the features of MapReduce in comparison with other design models, which make it a popular tool for processing large-scale data. An analysis of the performance factors of MapReduce shows that eliminating their inverse effects through optimization improves MapReduce's performance.
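The abstract describes MapReduce's design model in the abstract: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. A minimal single-process sketch of that model (not Hadoop's actual API; the function names and example data are invented for illustration) looks like this:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy MapReduce skeleton: map each record to (key, value) pairs,
    shuffle/group by key, then reduce each key's list of values."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase
            shuffled[key].append(value)     # shuffle/group phase
    return {key: reducer(key, values)       # reduce phase
            for key, values in shuffled.items()}

# Hypothetical example: total sales per region from log records.
logs = [("east", 10), ("west", 5), ("east", 7)]
totals = map_reduce(logs,
                    mapper=lambda r: [(r[0], r[1])],
                    reducer=lambda k, vs: sum(vs))
print(totals)  # {'east': 17, 'west': 5}
```

In a real deployment the map and reduce calls run in parallel across the cluster and the shuffle moves data over the network, which is where the scalability and fault tolerance the abstract mentions come from.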
ISBN:
(Print) 9781479949892
Cloud computing is an emerging computing model in which tasks are allocated to software, a combination of connections, and services accessed over a network; this network of servers is collectively known as the cloud. Instead of operating their own data centers, users can rent computing power and storage capacity from a service provider and pay only for what they use. Cloud storage delivers data storage as a service; data stored in the cloud must support both data access and heterogeneity. Advances in cloud computing allow large numbers of images and other data to be stored throughout the world. This paper proposes indexing and metadata management schemes that help access distributed data with reduced latency, and the metadata management can be extended to large-scale filesystem applications. When designing the metadata, the storage location of the metadata and its attributes is important for efficient retrieval of the data, and indexes are used to quickly locate data without searching every location in storage. Based on these two models, data can be fetched easily and the search time for retrieving the appropriate data is reduced.
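The abstract's central claim is that an index over metadata locations lets lookups avoid scanning every storage location. A minimal in-memory sketch of that idea (the class, field names, and example values are assumptions for illustration, not the paper's design):

```python
class MetadataIndex:
    """Toy index mapping a file's logical name to the storage location of its
    metadata, so a lookup is a dictionary access rather than a full scan."""

    def __init__(self):
        self._index = {}     # name -> storage location of the metadata
        self._metadata = {}  # storage location -> attribute record

    def put(self, name, location, attributes):
        self._index[name] = location
        self._metadata[location] = attributes

    def get(self, name):
        location = self._index.get(name)  # O(1) lookup, no search over storage
        return None if location is None else self._metadata[location]

idx = MetadataIndex()
idx.put("photo.jpg", "node7/meta/42", {"size": 2048, "owner": "alice"})
print(idx.get("photo.jpg"))  # {'size': 2048, 'owner': 'alice'}
```

In a distributed setting the two dictionaries would themselves be partitioned across servers, but the design point is the same: placing the metadata location in an index is what reduces retrieval latency.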
ISBN:
(Print) 9781450399937
Data is being created all around us in this day and age. Data generation rates are so alarming that there is a need for simple and cost-effective data storage, as well as a push to implement recovery processes. Furthermore, the relationship between insight and assets must be examined in large datasets, which can lead to good decision-making and business strategies. The goal of this paper is to create an algorithm for a MapReduce application, including a word count application that describes how to manage large amounts of data using multiple mappers and how to distribute the work using the map() function. The input key/value pairs and output key/value pairs of the execution are defined. The map phase receives the user input and repeatedly emits and collects intermediate key/value pairs; the library then groups all the intermediate values by key and passes them to the reduce function. The reduce function accepts an intermediate key together with its associated set of values and merges those values to form a smaller set. As a result, for each intermediate key and its associated values, the reducer produces only one value pair or zero value pairs.
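The map/reduce contract the abstract walks through (map emits intermediate (word, 1) pairs, the library groups them by key, reduce sums each group) can be sketched in a few lines. This is an illustrative single-process version of the classic word count, not the paper's implementation; the sample lines are invented:

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    """map(): emit an intermediate (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def reducer(word, counts):
    """reduce(): merge all values for one intermediate key into a single sum."""
    return word, sum(counts)

lines = ["big data needs map reduce", "map reduce counts big data"]

grouped = defaultdict(list)  # shuffle: group intermediate values by key
for word, one in chain.from_iterable(mapper(l) for l in lines):
    grouped[word].append(one)

counts = dict(reducer(w, c) for w, c in grouped.items())
print(counts["big"], counts["map"])  # 2 2
```

Each reducer call here produces exactly one output pair per key, matching the "one value pair or zero value pairs" behavior the abstract describes.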