Authors:
Cao, Yunpeng; Wang, Haifeng
Linyi University, College of Information Science & Engineering, Middle Shuangling Rd, Linyi 276000, Shandong, People's Republic of China
Linda Institute, Shandong Provincial Key Laboratory of Network-Based Intelligent Computing, Middle Shuangling Rd, Linyi 276000, Shandong, People's Republic of China
MapReduce is a typical computing model for the processing and analysis of big data. A MapReduce job produces a large amount of intermediate data after the map phase. In the Shuffle process of the MapReduce computing model, this massive intermediate data causes heavy communication across rack switches, which degrades the performance of heterogeneous cluster computing. In order to optimise the intermediate-data communication performance of map-intensive jobs, features are extracted from the pre-running scheduling information of MapReduce jobs, and job classification is realised by machine learning. Jobs with active intermediate-data communication are mapped into a single rack to preserve the communication locality of intermediate data, while jobs with inactive communication are deployed to nodes sorted by computing performance. The experimental results show that the proposed communication optimisation scheme has a good effect on Shuffle-intensive jobs, reaching an improvement of 4%-5%. With larger amounts of input data, the communication optimisation scheme is robust and adapts to heterogeneous clusters. In the multi-user application scene, intermediate-data communication can be reduced by 4.1%.
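A minimal sketch of the placement policy this abstract describes, with a simple ratio-based classifier standing in for the paper's learned model (the feature names, the 0.5 threshold, and both function names are illustrative assumptions, not the authors' actual method):

```python
# Hypothetical sketch: classify a job by its pre-run shuffle activity,
# then pick a placement strategy. Not the paper's actual classifier.

def classify_job(map_output_bytes: int, input_bytes: int,
                 threshold: float = 0.5) -> str:
    """Label a job by the ratio of intermediate (map output) data to input data."""
    ratio = map_output_bytes / input_bytes if input_bytes else 0.0
    return "shuffle-intensive" if ratio >= threshold else "shuffle-light"

def place_job(label: str, racks: list, nodes_by_perf: list) -> list:
    """Shuffle-intensive jobs stay inside one rack to localise intermediate
    traffic; shuffle-light jobs go to the fastest nodes regardless of rack."""
    if label == "shuffle-intensive":
        return racks[0]                        # keep all tasks on a single rack
    return nodes_by_perf[:len(racks[0])]       # pick the best-performing nodes
```

For example, a job whose map output is 80% the size of its input would be labelled shuffle-intensive and confined to one rack, while a lightly shuffling job would be spread over the fastest nodes.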
Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or clouds. The size of big data increases at a pace that is faster than the increase in the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks, which often require running complex analytical algorithms on extremely big data sets. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to the memory limit, and limited analytical algorithms, because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to conquer these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.
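To ground the discussion, here is a single-process word-count sketch of the MapReduce programming model the abstract refers to: the map, shuffle, and reduce phases that a real framework would distribute across a cluster (the sample records are made up for illustration):

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Map: emit (key, value) pairs from one input record.
    for word in record.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key; on a cluster this is the phase
    # that moves intermediate data between nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for one key.
    return (key, sum(values))

records = ["big data big clusters", "data analysis"]
pairs = chain.from_iterable(map_phase(r) for r in records)
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# result == {"big": 2, "data": 2, "clusters": 1, "analysis": 1}
```

The model's limitation noted above follows from this shape: an algorithm that cannot be expressed as independent map steps plus a keyed aggregation does not fit the pattern.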
Currently, the contradiction between traditional English teaching and the need for dynamic learning is becoming increasingly prominent. The organic integration of new teaching aids with traditional English teaching is therefore conducive to a new concept design and to innovation in English teaching. On this basis, a large-scale English teaching corpus was established by combining MapReduce with the Apache Hadoop framework. At the same time, information-theoretic methods based on the MapReduce computing model are proposed and applied to the parallelism analysis of big data, effectively solving the practical problem that certain datasets are difficult to scale. Secondly, this article adopts the Salsa20 algorithm, chosen to suit the construction characteristics of Apache Hadoop. By exploiting the parallel characteristics of this algorithm, the Hadoop workload can be parallelised, improving the system's data-processing ability and ensuring the security of data in MapReduce operation mode without sacrificing processing efficiency. In addition, the comprehensive and systematic use of large-scale corpora to establish an English teaching system can effectively alleviate some of the problems that arise in English teaching and greatly enhance the teaching effectiveness of English courses.
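The reason a stream cipher such as Salsa20 parallelises well is that each block of keystream is generated independently from a key and a counter, so data chunks can be encrypted concurrently. The sketch below shows only that chunk-parallel pattern; a SHA-256-derived keystream stands in for the real Salsa20 core, and it is NOT a secure cipher:

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

# Chunk-parallel stream encryption pattern. The keystream function below
# is a stand-in for Salsa20's block function, for illustration only.

def keystream(key: bytes, chunk_index: int, length: int) -> bytes:
    out = b""
    block = 0
    while len(out) < length:
        out += hashlib.sha256(key + chunk_index.to_bytes(8, "big")
                              + block.to_bytes(8, "big")).digest()
        block += 1
    return out[:length]

def encrypt_chunk(args):
    key, index, chunk = args
    ks = keystream(key, index, len(chunk))
    return bytes(a ^ b for a, b in zip(chunk, ks))  # XOR with keystream

def parallel_encrypt(key: bytes, data: bytes, chunk_size: int = 16) -> bytes:
    # Each chunk depends only on (key, chunk index), so the pool can
    # process chunks concurrently and the results are joined in order.
    n = (len(data) + chunk_size - 1) // chunk_size
    chunks = [(key, i, data[i * chunk_size:(i + 1) * chunk_size])
              for i in range(n)]
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(encrypt_chunk, chunks))
```

Because encryption is a keystream XOR, applying `parallel_encrypt` twice with the same key recovers the plaintext, which mirrors how Salsa20 decryption works.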
Based on an analysis of the architecture of the Internet of Things service platform and the key technologies of cloud computing, a massive sensing-information processing scheme based on the Internet of Things service platform is proposed. The scheme first proposes a system architecture model that can satisfy massive sensor-information processing in an open platform environment, and designs the system's functional unit modules. By combining these functional units, service configurability can be realised for thousands of services and tenants. Then, the Hadoop open-source framework is used to realise the distributed computing of the system, making full use of the processing advantages of the MapReduce computing model, the HBase distributed database and the HDFS distributed file system in the Hadoop framework, with the Oracle database used as a supplement. Finally, the massive sensor information was deployed and tested, and the effectiveness of the Hadoop processing method was verified by analysing the results of MapReduce parallel computing experiments. The average cache hit rate is 93.1%; such a high hit rate greatly reduces MySQL database I/O and improves system performance.
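The cache-hit figure above can be illustrated with a minimal read-through LRU cache sitting in front of the database. The class name, the capacity, and the plain dict standing in for the MySQL backend are assumptions for this sketch, not the platform's actual components:

```python
from collections import OrderedDict

class ReadThroughCache:
    """Read-through LRU cache: hits are served from memory; each miss
    costs one backend (database) read, so a high hit rate cuts I/O."""

    def __init__(self, backend, capacity=2):
        self.backend = backend        # stands in for the MySQL database
        self.capacity = capacity
        self.store = OrderedDict()    # insertion/recency-ordered entries
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)         # mark as recently used
            return self.store[key]
        self.misses += 1
        value = self.backend[key]               # one database I/O per miss
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)      # evict least recently used
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Reading the same sensor key twice yields one miss and one hit, i.e. a 50% hit rate; in steady state with skewed sensor access the rate climbs toward figures like the 93.1% reported above.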