Search results - Inner Mongolia University Library

iMapReduce: A Distributed Computing Framework for Iterative Computation

JOURNAL OF GRID COMPUTING, 2012, Vol. 10, No. 1, pp. 47-68

Authors: Zhang, Yanfeng; Gao, Qixin; Gao, Lixin; Wang, Cuirong

Affiliations: Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110819, Liaoning, Peoples R China; NE Univ Qinhuangdao, Dept Elect & Informat Engn, Qinhuangdao 066000, Hebei, Peoples R China; Univ Massachusetts Amherst, Dept Elect & Comp Engn, Amherst, MA 01002, USA

Iterative computation is pervasive in many applications such as data mining, web ranking, graph analysis, and online social network analysis. These iterative applications typically involve massive data sets containing millions or billions of data records. This creates a demand for distributed computing frameworks that can process massive data sets on a cluster of machines. MapReduce is an example of such a framework. However, MapReduce lacks built-in support for iterative processes that need to parse data sets repeatedly. Besides specifying MapReduce jobs, users have to write a driver program that submits a series of jobs and performs convergence testing at the client. This paper presents iMapReduce, a distributed framework that supports iterative processing. iMapReduce allows users to specify the iterative computation with separate map and reduce functions, and supports automatic iterative processing within a single job. More importantly, iMapReduce significantly improves the performance of iterative implementations by (1) reducing the overhead of repeatedly creating new MapReduce jobs, (2) eliminating the shuffling of static data, and (3) allowing asynchronous execution of map tasks. We implement an iMapReduce prototype based on Apache Hadoop, and show that iMapReduce can achieve a speedup of up to 5 times over Hadoop for implementing iterative algorithms.

Keywords: Iterative computation; iMapReduce; distributed computing framework; Hadoop

Exploring Dynamic Task Loading in SGX-Based Distributed Computing

IEEE TRANSACTIONS ON SERVICES COMPUTING, 2023, Vol. 16, No. 1, pp. 288-301

Authors: Wu, Pengfei; Ning, Jianting; Luo, Wu; Huang, Xinyi; He, Debiao

Affiliations: Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore; Fujian Normal Univ, Coll Comp & Cyber Secur, Fujian Prov Key Lab Network Secur & Cryptol, Fuzhou 350117, Peoples R China; Chinese Acad Sci, Inst Informat Engn, State Key Lab Informat Secur, Beijing 100093, Peoples R China; Peking Univ, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China; Wuhan Univ, Sch Cyber Sci & Engn, Key Lab Aerosp Informat Secur & Trusted Comp, Wuhan 430072, Peoples R China

Data privacy is one of the most critical concerns in cloud computing, and many privacy-preserving distributed computing systems based on a trusted execution environment (e.g., Intel SGX) have been proposed to protect the user's privacy during cloud-outsourced computation. However, these SGX-based solutions are vulnerable to some traffic analyses, and loading all tasks into the enclave introduces significant overhead from frequent EPC paging. In this article, we propose a T-SGX framework, which keeps a distributed job confidential and preserves system efficiency by dynamically loading an enclave shared object for the task under processing. In T-SGX, all these objects are secretly shared and stored in a verifiably distributed share management system (SMS) outside the TCB. To mitigate the exposure of sensitive information, we present an efficient oblivious transfer (OT) protocol under the Decisional Diffie-Hellman (DDH) assumption for obliviously transmitting desired shares. Detailed security analysis demonstrates that the proposed T-SGX achieves secure distributed computing without privacy leakage to unauthorized parties. Finally, we benchmark the framework in six real-world applications, and the experimental results show that T-SGX significantly outperforms a state-of-the-art solution, with 11.9%-29.7% less overhead when running an SGX-based application.

Keywords: Cloud computing; Protocols; Codes; Task analysis; Servers; Security; Privacy; distributed computing framework; secret sharing; Intel SGX; privacy-preserving

An integrated GIS platform architecture for spatiotemporal big data

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, Vol. 94, pp. 160-172

Authors: Wang, Shaohua; Zhong, Yang; Wang, Erqi

Affiliations: Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, Beijing, Peoples R China; Univ Calif Santa Barbara, Dept Geog, Santa Barbara, CA 93106, USA; Claremont Grad Univ, Claremont, CA, USA; SuperMap Software Co Ltd, Beijing 100015, Peoples R China; Beijing Engn Technol Res Ctr Geog Informat Core S, Beijing 100015, Peoples R China; Natl Adm Surveying Mapping & Geoinformat, Beijing, Peoples R China

With the increase in smart devices, spatiotemporal data has grown exponentially. Dealing with the challenges caused by this increase in data requires a scalable and efficient architecture that can store, query, analyze, and visualize spatiotemporal big data. This paper describes a Cloud-terminal integrated GIS platform architecture designed to meet the requirements of processing and analyzing spatiotemporal big data. A Cloud-terminal integration GIS was developed according to this architecture. Extensive experiments deployed on the internal organization cluster using real-time datasets showed that the SuperMap GIS spatiotemporal big data engine achieved excellent performance. (C) 2018 Elsevier B.V. All rights reserved.

Keywords: Spatiotemporal big data; distributed computing framework; Cloud-terminal integration GIS; SuperMap GIS

Assessing Big Data SQL Frameworks for Analyzing Event Logs

24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)

Authors: Hinkka, Markku; Lehto, Teemu; Heljanko, Keijo

Affiliations: Aalto Univ, Sch Sci, Dept Comp Sci, Aalto, Finland; QPR Software Plc, Helsinki, Finland; Aalto Univ, Helsinki, Finland

ISBN: (print) 9781467387767

Performing Process Mining by analyzing event logs generated by various systems is a very computation- and I/O-intensive task. Distributed computing and Big Data processing frameworks make it possible to distribute all kinds of computation tasks to multiple computers instead of performing the whole task on a single computer. This paper assesses whether contemporary Big Data processing frameworks that support Structured Query Language (SQL) are mature enough to efficiently distribute the computation of two central Process Mining tasks to two dissimilar clusters of computers providing BPM as a service in the cloud. Tests are performed using a novel automatic testing framework detailed in this paper and its supporting materials. As a result, an assessment is made of how well the selected Big Data processing frameworks manage to process and parallelize the analysis work required by Process Mining tasks.

Keywords: automatic business process discovery; distributed computing framework; distributed SQL; event log analysis; Hadoop; Hive; Presto; process mining; Spark

Key Techniques of Distributed Geospatial Information Operations

18th International Conference on Geoinformatics

Authors: Wu, Liang; Chen, Zhanlong; Ma, Lina; Wan, Lin

Affiliations: China Univ Geosci, Fac Informat Engn, Wuhan 430074, Peoples R China

ISBN: (print) 9781424473021

This work aims to improve the efficiency of spatial operations on massive data in a distributed environment and to solve the interaction design problems of the spatial analysis processing module, which is designed to a service agreement with the underlying database, spatial data models, map display, and so on. Because no GIS software currently offers practical distributed computing analysis, we carried out an in-depth study that takes into account the distributed characteristics of spatial data and information. This paper designs a distributed geospatial information operation framework and analyzes the basic characteristics of distributed computing. It discusses the distributed spatial information computing system from the following aspects: spatial computing task decomposition, the distributed spatial data classification method, the shared data replication strategy, the load-based data partitioning strategy, and the caching mechanism of the spatial computing framework. Based on this framework, the authors developed a system for solving practical problems. The proposed distributed computing framework is suited to distributed spatial analysis and solves the key technical problems of a distributed spatial analysis computing framework. It is consistent with "service-oriented" thinking and takes into account the heterogeneity of spatial data sources and distributed spatial computing among different systems on different platforms. Dynamic load scheduling improves on the static data partitioning method and avoids the load imbalance that arises in the static data partitioning phase. It addresses the efficiency of large-scale spatial data operations in complex distributed environments in practical applications. Finally, based on the software, we ran the distributed clipping test of the classic spatial experiments in a test computing environment; a detailed result is given at the end of the article, and it shows that the fram

Keywords: distributed computing framework; spatial data partition; computing load balancing; GIS

Design and research of MOOC teaching system based on TG-C4.5 algorithm

SYSTEMS AND SOFT COMPUTING, 2023, Vol. 5

Authors: Chen, Xinxin

Affiliations: Qiqihar Med Univ, Foreign Language Dept, Qiqihar 161000, Peoples R China

The emergence of Internet information technology has led to the development of MOOC-based online teaching methods. The study uses the traditional C4.5 algorithm for data mining to improve teaching quality and simplifies and quantifies it with the Taylor series and GINI index. The study also considers the uncertainty of data changes and the characteristics of MOOC teaching to design a parallel processing system of the HD-TG-C4.5 algorithm under the framework of the Hadoop platform. The experimental results show that the minimum data classification error of the algorithm is 2%, and the maximum recommendation accuracy of teaching resources is 92.6%. Moreover, the response time and resource search time of this algorithm system are significantly better than traditional algorithms in terms of system debugging. The average login response time is less than 0.87 s, and the success rate of system debugging reaches 90%. The probability value of students mastering teaching resource knowledge points is also above 0.7. The MOOC teaching system based on TG-C4.5 algorithm can effectively mine learner behavior data and reduce the complexity and consumption of C4.5 algorithm. The MOOC teaching system based on TG algorithm can provide technical support for the decision-making information of teaching participants and provide early warning information for predicting learning behavior.

Keywords: Decision tree algorithm; Large scale open online courses; Teaching system; Taylor series; Gini coefficient; distributed computing framework
