Authors:
Cao, Yunpeng; Wang, Haifeng
Linyi University, College of Information Science & Engineering, Middle Shuangling Rd, Linyi 276000, Shandong, People's Republic of China
Linda Institute, Shandong Provincial Key Laboratory of Network-Based Intelligent Computing, Middle Shuangling Rd, Linyi 276000, Shandong, People's Republic of China
MapReduce is a typical computing model for the processing and analysis of big data. A MapReduce job produces a large amount of intermediate data after the map phase. In the Shuffle process of the MapReduce computing model, this massive intermediate data causes heavy communication across rack switches, which degrades the performance of heterogeneous cluster computing. In order to optimise the intermediate-data communication performance of map-intensive jobs, features are extracted from the pre-running scheduling information of MapReduce jobs, and job classification is realised by machine learning. Jobs with active intermediate-data communication are mapped into a single rack to preserve the communication locality of intermediate data, while jobs with inactive communication are deployed to nodes sorted by computing performance. The experimental results show that the proposed communication optimisation scheme has a good effect on Shuffle-intensive jobs, reaching an improvement of 4%-5%. With larger amounts of input data, the communication optimisation scheme is robust and adapts to heterogeneous clusters. In the multi-user application scene, intermediate-data communication can be reduced by 4.1%.
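A minimal sketch of the placement policy this abstract describes, with a simple ratio-based classifier standing in for the paper's learned model (the feature names, the 0.5 threshold, and both function names are illustrative assumptions, not the authors' actual method):

```python
# Hypothetical sketch: classify a job by its pre-run shuffle activity,
# then pick a placement strategy. Not the paper's actual classifier.

def classify_job(map_output_bytes: int, input_bytes: int,
                 threshold: float = 0.5) -> str:
    """Label a job by the ratio of intermediate (map output) data to input data."""
    ratio = map_output_bytes / input_bytes if input_bytes else 0.0
    return "shuffle-intensive" if ratio >= threshold else "shuffle-light"

def place_job(label: str, racks: list, nodes_by_perf: list) -> list:
    """Shuffle-intensive jobs stay inside one rack to localise intermediate
    traffic; shuffle-light jobs go to the fastest nodes regardless of rack."""
    if label == "shuffle-intensive":
        return racks[0]                        # keep all tasks on a single rack
    return nodes_by_perf[:len(racks[0])]       # pick the best-performing nodes
```

For example, a job whose map output is 80% the size of its input would be labelled shuffle-intensive and confined to one rack, while a lightly shuffling job would be spread over the fastest nodes.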
Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or clouds. The size of big data increases at a pace that is faster than the increase in the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks, which often require running complex analytical algorithms on extremely big data sets. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to the memory limit, and limited analytical algorithms, because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to conquer these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.
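To ground the discussion, here is a single-process word-count sketch of the MapReduce programming model the abstract refers to: the map, shuffle, and reduce phases that a real framework would distribute across a cluster (the sample records are made up for illustration):

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Map: emit (key, value) pairs from one input record.
    for word in record.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key; on a cluster this is the phase
    # that moves intermediate data between nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for one key.
    return (key, sum(values))

records = ["big data big clusters", "data analysis"]
pairs = chain.from_iterable(map_phase(r) for r in records)
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# result == {"big": 2, "data": 2, "clusters": 1, "analysis": 1}
```

The model's limitation noted above follows from this shape: an algorithm that cannot be expressed as independent map steps plus a keyed aggregation does not fit the pattern.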
Currently, the contradiction between traditional English teaching and the need for dynamic learning is becoming increasingly prominent. The organic integration of new teaching aids with traditional English teaching is therefore conducive to a new concept design and to innovation in English teaching. On this basis, a large-scale English teaching corpus was established by combining MapReduce with the Apache Hadoop framework. At the same time, information-theoretic methods based on the MapReduce computing model are proposed and applied to the parallelism analysis of big data, effectively solving the practical problem that certain datasets are difficult to scale. Secondly, this article adopts the Salsa20 algorithm, chosen to suit the construction characteristics of Apache Hadoop. By exploiting the parallel characteristics of this algorithm, the Hadoop workload can be parallelised, improving the system's data-processing ability and ensuring the security of data in MapReduce operation mode without sacrificing processing efficiency. In addition, the comprehensive and systematic use of large-scale corpora to establish an English teaching system can effectively alleviate some of the problems that arise in English teaching and greatly enhance the teaching effectiveness of English courses.
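The reason a stream cipher such as Salsa20 parallelises well is that each block of keystream is generated independently from a key and a counter, so data chunks can be encrypted concurrently. The sketch below shows only that chunk-parallel pattern; a SHA-256-derived keystream stands in for the real Salsa20 core, and it is NOT a secure cipher:

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

# Chunk-parallel stream encryption pattern. The keystream function below
# is a stand-in for Salsa20's block function, for illustration only.

def keystream(key: bytes, chunk_index: int, length: int) -> bytes:
    out = b""
    block = 0
    while len(out) < length:
        out += hashlib.sha256(key + chunk_index.to_bytes(8, "big")
                              + block.to_bytes(8, "big")).digest()
        block += 1
    return out[:length]

def encrypt_chunk(args):
    key, index, chunk = args
    ks = keystream(key, index, len(chunk))
    return bytes(a ^ b for a, b in zip(chunk, ks))  # XOR with keystream

def parallel_encrypt(key: bytes, data: bytes, chunk_size: int = 16) -> bytes:
    # Each chunk depends only on (key, chunk index), so the pool can
    # process chunks concurrently and the results are joined in order.
    n = (len(data) + chunk_size - 1) // chunk_size
    chunks = [(key, i, data[i * chunk_size:(i + 1) * chunk_size])
              for i in range(n)]
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(encrypt_chunk, chunks))
```

Because encryption is a keystream XOR, applying `parallel_encrypt` twice with the same key recovers the plaintext, which mirrors how Salsa20 decryption works.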
Based on an analysis of the architecture of the Internet of Things service platform and the key technologies of cloud computing, a massive sensing-information processing scheme based on the Internet of Things service platform is proposed. The scheme first proposes a system architecture model that can satisfy massive sensor-information processing in an open platform environment, and designs the system's functional unit modules. By combining these functional units, service configurability can be realised for thousands of services and tenants. Then, the Hadoop open-source framework is used to realise the distributed computing of the system, making full use of the processing advantages of the MapReduce computing model, the HBase distributed database and the HDFS distributed file system in the Hadoop framework, with the Oracle database used as a supplement. Finally, the massive sensor information was deployed and tested, and the effectiveness of the Hadoop processing method was verified by analysing the results of MapReduce parallel computing experiments. The average cache hit rate is 93.1%; such a high hit rate greatly reduces MySQL database I/O and improves system performance.
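The cache-hit figure above can be illustrated with a minimal read-through LRU cache sitting in front of the database. The class name, the capacity, and the plain dict standing in for the MySQL backend are assumptions for this sketch, not the platform's actual components:

```python
from collections import OrderedDict

class ReadThroughCache:
    """Read-through LRU cache: hits are served from memory; each miss
    costs one backend (database) read, so a high hit rate cuts I/O."""

    def __init__(self, backend, capacity=2):
        self.backend = backend        # stands in for the MySQL database
        self.capacity = capacity
        self.store = OrderedDict()    # insertion/recency-ordered entries
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)         # mark as recently used
            return self.store[key]
        self.misses += 1
        value = self.backend[key]               # one database I/O per miss
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)      # evict least recently used
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Reading the same sensor key twice yields one miss and one hit, i.e. a 50% hit rate; in steady state with skewed sensor access the rate climbs toward figures like the 93.1% reported above.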