ISBN (print): 9781479944415
As data volumes in scientific applications have grown exponentially, new scientific methods are required to analyze and organize the data. MapReduce programming drives Internet services, and those services operate in a cloud environment. Hence, resources must be provisioned efficiently for diverse MapReduce applications. In this paper we present a Hadoop application with map and reduce functions for data transformation.
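A minimal sketch of the generic map/reduce data-transformation pattern this abstract refers to, written in Hadoop Streaming style; the tab-separated record layout and the count-by-key transformation are illustrative assumptions, not the paper's actual job.

```python
#!/usr/bin/env python3
"""Hadoop Streaming-style mapper/reducer sketch (illustrative only)."""
import sys
from itertools import groupby

def mapper(lines):
    # Emit (key, value) pairs; here: key each record by its first field.
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            yield fields[0], fields[1]

def reducer(pairs):
    # Aggregate all values sharing a key, e.g. count them.
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(1 for _ in group)

if __name__ == "__main__":
    pairs = list(mapper(sys.stdin))
    for key, count in reducer(pairs):
        print(f"{key}\t{count}")
```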
ISBN (print): 9783031105364; 9783031105357
Sequential pattern mining algorithms are unsupervised machine learning algorithms that find sequential patterns in data sequences assembled according to a particular order. These algorithms are mostly optimized for data sequences containing more than one element. Hence, we argue that there is a need for algorithms specifically optimized for data sequences that contain only one element. Within the scope of this research, we design and develop a novel algorithm that is optimized for data sets containing single-element data sequences and that detects sequential patterns with high performance. The time and memory requirements of the proposed algorithm are examined experimentally. The results show that the proposed algorithm has low running times while matching the accuracy of comparable algorithms in the literature. The obtained results are promising.
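To make the problem setting concrete: when every itemset in a sequence holds exactly one element, a pattern is just an ordered list of items. The brute-force support counter below illustrates that setting; it is not the paper's optimized algorithm.

```python
"""Toy support counting for single-element sequential patterns."""

def occurs(pattern, sequence):
    """True if `pattern` appears in `sequence` as an ordered subsequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` consumes the iterator

def frequent_pairs(sequences, min_support):
    items = {x for seq in sequences for x in seq}
    counts = {}
    for a in items:
        for b in items:
            n = sum(occurs((a, b), seq) for seq in sequences)
            if n >= min_support:
                counts[(a, b)] = n
    return counts

if __name__ == "__main__":
    data = [list("abcab"), list("acb"), list("abb")]
    print(frequent_pairs(data, min_support=2))  # e.g. ('a', 'b') has support 3
```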
The integration and cross-coordination of big data processing and software-defined networking (SDN) are vital for improving the performance of big data applications. Various approaches for combining big data and SDN have been investigated by both industry and academia. However, empirical evaluations of solutions that combine big data processing and SDN are extremely costly and complicated. To address the problem of effective evaluation of such solutions, we present a new, self-contained simulation tool named BigDataSDNSim that enables the modeling and simulation of the big data management system YARN, its related MapReduce programming model, and SDN-enabled networks in a cloud computing environment. BigDataSDNSim supports cost-effective, easy-to-conduct experimentation in a controllable, repeatable, and configurable manner. The article illustrates the simulation accuracy and correctness of BigDataSDNSim by comparing the behavior and results of a real environment that combines big data processing and SDN with an equivalent simulated environment. Finally, the article presents two use cases of BigDataSDNSim, which exhibit its practicality and features, illustrate the impact of the data replication mechanisms of MapReduce in Hadoop YARN, and show the superiority of SDN over traditional networks in improving the performance of MapReduce applications.
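A back-of-the-envelope model of the kind of comparison such a simulator automates: shuffle-transfer time under a statically routed network versus an SDN-rerouted one. This is not BigDataSDNSim's API; the bandwidth figures and congestion factors are illustrative assumptions only.

```python
"""Toy shuffle-transfer model: SDN rerouting vs. static routing."""

def transfer_time_s(data_gb, link_gbps, effective_fraction):
    # Effective bandwidth degrades on congested shared links.
    return data_gb * 8 / (link_gbps * effective_fraction)

shuffle_gb = 50
traditional = transfer_time_s(shuffle_gb, 10, effective_fraction=0.4)  # congested static path
sdn_enabled = transfer_time_s(shuffle_gb, 10, effective_fraction=0.9)  # flows rerouted around hotspot

print(f"traditional: {traditional:.0f}s, SDN-enabled: {sdn_enabled:.0f}s")
```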
To improve data collection in wireless sensor networks, a data collection system based on a symmetric encryption algorithm is designed around health sensors. The received data are uploaded to the host via RS-232 to obtain the working mode and clock activity. The data acquisition circuit is built with an MSP430 module. The MapReduce programming model is used to carry out data collection, a symmetric encryption algorithm is introduced, and a range-query scheme over encrypted data with privacy protection is designed. Applying the scheme to the node data of the wireless sensor network realizes secure data collection. Experimental results show that the system achieves high efficiency, a large volume of collected data, and high residual energy of sensor network nodes.
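A minimal sketch of just the encrypt-before-send step, using the `cryptography` package's Fernet symmetric cipher. The paper's contribution is a range-query scheme on top of symmetric encryption, which this sketch does not reproduce; the record format is an assumption.

```python
"""Symmetric encryption of a sensor reading before upload (sketch)."""
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice, pre-shared with the sink node
cipher = Fernet(key)

reading = b"node=17 temp=36.6 ts=1718000000"   # hypothetical record layout
token = cipher.encrypt(reading)    # ciphertext sent over the radio / RS-232 link
print(cipher.decrypt(token))       # sink recovers the reading with the same key
```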
In this paper, we propose a novel parallel method for extracting significant information from spectrograms using the MapReduce programming model for an audio-based surveillance system that effectively recognizes critical acoustic events in the surrounding environment. Extracting reliable features from the spectrograms of a big, noisy audio event dataset demands high computational time. Parallelizing the feature extraction with the MapReduce programming model on Hadoop improves the efficiency of the overall system. Acoustic events with real-time background noise from the Mivia Lab audio event dataset are used for the surveillance application. The proposed approach is time efficient and achieves a high average recognition rate of 96.5% for critical acoustic events under different noisy conditions. (C) 2019 Elsevier Inc. All rights reserved.
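A sketch of a streaming-style mapper that turns one audio clip into spectrogram statistics, illustrating the parallelized feature-extraction step. The feature choice (per-band mean log-energy) stands in for the paper's actual features, and the file-list-on-stdin job layout is an assumption.

```python
"""Mapper sketch: audio file path in, spectrogram features out."""
import sys
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def spectrogram_features(path, n_bands=16):
    rate, audio = wavfile.read(path)
    if audio.ndim > 1:                       # mix stereo down to mono
        audio = audio.mean(axis=1)
    _, _, sxx = spectrogram(audio, fs=rate)  # frequency x time power matrix
    bands = np.array_split(np.log1p(sxx), n_bands, axis=0)
    return [float(b.mean()) for b in bands]  # one mean log-energy per band

if __name__ == "__main__":
    for path in (line.strip() for line in sys.stdin):
        feats = spectrogram_features(path)
        print(path + "\t" + ",".join(f"{x:.4f}" for x in feats))
```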
The traditional K-means clustering algorithm consumes a large amount of memory and computing resources when dealing with massive data. It is easily constrained by factors such as the choice of initial center points and abnormal data, and usually cannot achieve effective clustering of large-scale data. To overcome these limitations, we propose a MapReduce parallel optimization method based on an improved K-means clustering algorithm. First, differential evolution theory is introduced to determine the optimal initial clustering centers. Then, based on the influence of samples on clustering results, a weighted Euclidean distance is designed to differentiate the data effectively, reducing the impact of abnormal samples on the clustering results; lowering the negative effect of abnormal data on cluster analysis improves clustering accuracy. Finally, the MapReduce programming model is used to realize parallel clustering. We verify the parallel optimization method on UCI datasets. The experimental results clearly show that the proposed method delivers relatively stable parallel clustering results, runs faster, and effectively saves computation time.
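A minimal sketch of the two ideas the abstract combines: a weighted Euclidean distance and a MapReduce-style assignment/update split for K-means. The uniform weights and the tiny random dataset are illustrative; the paper's differential-evolution center initialization is omitted.

```python
"""Weighted-distance K-means with a map (assign) / reduce (update) split."""
import numpy as np

def weighted_distance(x, c, w):
    # Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - c_j)^2)
    return np.sqrt(np.sum(w * (x - c) ** 2))

def map_assign(points, centers, w):
    # Map phase: emit (nearest-center-index, point) pairs.
    for x in points:
        j = min(range(len(centers)),
                key=lambda k: weighted_distance(x, centers[k], w))
        yield j, x

def reduce_update(pairs, k, dim):
    # Reduce phase: new center = mean of the points assigned to it.
    sums, counts = np.zeros((k, dim)), np.zeros(k)
    for j, x in pairs:
        sums[j] += x
        counts[j] += 1
    return sums / np.maximum(counts, 1)[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(200, 2))
    centers, w = pts[:3].copy(), np.ones(2)
    for _ in range(5):
        centers = reduce_update(map_assign(pts, centers, w), 3, 2)
    print(centers)
```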
MapReduce is a parallel programming model for processing data-intensive applications in a cloud environment. The scheduler greatly influences the performance of the MapReduce model in a heterogeneous cluster environment. The dynamic nature of the cluster environment and of computing workloads affects execution time and computational resource usage in the scheduling process. Further, data locality is essential for reducing total job execution time and cross-rack communication, and for improving throughput. In the present work, a scheduling strategy named efficient locality and replica aware scheduling (ELRAS), integrated with an autonomous replication scheme (ARS), is proposed to enhance data locality and perform consistently in heterogeneous environments. ARS autonomously decides which data object to replicate by considering its popularity, and removes a replica once it becomes idle. The proposed approach is validated in a heterogeneous cluster environment with various realistic applications comprising IO-bound, CPU-bound, and mixed workloads. ELRAS improves throughput by a factor of about 2 compared with the existing FIFO scheduler; it also yields near-optimal data locality, reduces execution time, and utilizes resources effectively. The simplicity of the ELRAS algorithm makes it feasible to adopt for a wide range of applications.
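A toy popularity-driven replication decision in the spirit of ARS: add a replica when a block's access count crosses a threshold, reclaim replicas when it goes idle. The thresholds and the counter-based policy are illustrative assumptions, not the paper's actual scheme.

```python
"""Popularity-based replicate/drop decisions (illustrative sketch)."""
from collections import Counter

class ReplicaManager:
    def __init__(self, replicate_at=10, drop_below=2):
        self.access = Counter()
        self.replicate_at = replicate_at
        self.drop_below = drop_below

    def record_access(self, block):
        self.access[block] += 1

    def decide(self, block):
        hits = self.access[block]
        if hits >= self.replicate_at:
            return "add-replica"      # popular: create another copy
        if hits < self.drop_below:
            return "remove-replica"   # idle: reclaim the space
        return "keep"

mgr = ReplicaManager()
for _ in range(12):
    mgr.record_access("blk_001")
print(mgr.decide("blk_001"), mgr.decide("blk_042"))  # add-replica remove-replica
```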
Traditional parallel algorithms for mining frequent itemsets aim to balance load by partitioning data equally among a group of computing nodes. We start this study by discovering a serious performance problem in existing parallel frequent itemset mining algorithms: given a large dataset, the data partitioning strategies in existing solutions suffer high communication and mining overhead induced by redundant transactions transmitted among computing nodes. We address this problem by developing a data partitioning approach called FiDoop-DP using the MapReduce programming model. The overarching goal of FiDoop-DP is to boost the performance of parallel frequent itemset mining on Hadoop clusters. At the heart of FiDoop-DP is a Voronoi diagram-based data partitioning technique, which exploits correlations among transactions. Incorporating a similarity metric and the locality-sensitive hashing (LSH) technique, FiDoop-DP places highly similar transactions into the same data partition to improve locality without creating an excessive number of redundant transactions. We implement FiDoop-DP on a 24-node Hadoop cluster, driven by a wide range of datasets created by the IBM Quest Market-Basket Synthetic Data Generator. Experimental results reveal that FiDoop-DP reduces network and computing loads by virtue of eliminating redundant transactions on Hadoop nodes, and improves the performance of the existing parallel frequent-pattern scheme by up to 31 percent, with an average of 18 percent.
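A sketch of the locality signal FiDoop-DP exploits: transactions whose MinHash signatures agree on a band are likely similar and can be routed to the same partition. The hash family, signature length, and band size are illustrative; the paper's full Voronoi-diagram partitioning is not reproduced here.

```python
"""MinHash + LSH banding to co-locate similar transactions (sketch)."""
import hashlib

def minhash_signature(transaction, n_hashes=8):
    # One min-hash per seeded hash function over the transaction's items.
    return tuple(
        min(int(hashlib.md5(f"{i}:{item}".encode()).hexdigest(), 16) % 10_000
            for item in transaction)
        for i in range(n_hashes)
    )

def partition_by_band(transactions, band=2):
    # LSH banding: transactions agreeing on the first `band` hash values
    # fall into the same bucket, i.e. the same candidate partition.
    buckets = {}
    for t in transactions:
        key = minhash_signature(t)[:band]
        buckets.setdefault(key, []).append(t)
    return buckets

tx = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"bolt", "nut"}]
for key, group in partition_by_band(tx).items():
    print(key, group)
```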
ISBN (digital): 9783319644684
ISBN (print): 9783319644684; 9783319644677
Mining frequent itemsets in large datasets has received much attention in recent years, relying on MapReduce programming models. Many well-known FIM algorithms have been parallelized in a MapReduce framework, such as Parallel Apriori, Parallel FP-Growth, and Dist-Eclat. However, most papers focus on work partitioning and/or load balancing, and the resulting algorithms are not extensible because they rely on memory assumptions. A challenge in designing parallel FIM algorithms is thus finding ways to guarantee that the data structures used during mining always fit in the local memory of the processing nodes during all computation steps. In this paper, we propose MapFIM, a two-phase approach for frequent itemset mining in very large datasets that relies on both a MapReduce-based distributed Apriori method and a local in-memory method. In our approach, MapReduce is first used to generate, from the input dataset, prefix-projected databases that fit in local memory, benefiting from the Apriori principle. An optimized local in-memory mining process is then launched to generate all frequent itemsets from each prefix-projected database. Performance evaluation shows that MapFIM is more efficient and more extensible than existing MapReduce-based frequent itemset mining approaches.
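A minimal sketch of prefix projection, the core of the first phase: for a frequent item p, the p-projected database keeps only the suffix of each transaction after p, so each projection is small enough to mine locally in memory. Transactions are assumed to be sorted in a fixed item order; the distributed MapReduce wrapping is omitted.

```python
"""Prefix-projected database generation (sketch of MapFIM phase 1's idea)."""
from collections import Counter

def frequent_items(transactions, min_support):
    counts = Counter(item for t in transactions for item in t)
    return {i for i, c in counts.items() if c >= min_support}

def project(transactions, prefix_item):
    # Keep the part of each transaction strictly after `prefix_item`.
    out = []
    for t in transactions:
        if prefix_item in t:
            out.append(t[t.index(prefix_item) + 1:])
    return [suffix for suffix in out if suffix]

tx = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
for p in sorted(frequent_items(tx, min_support=2)):
    print(p, "->", project(tx, p))   # each projection is mined locally
```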
ISBN (print): 9783319465685; 9783319465678
The evaluation of similarity between textual documents has been regarded as a strongly recommended research subject in various domains. A large corpus contains many documents, and most of them must be checked for similarity for validation. In this paper, we propose a new MapReduce algorithm for document similarity measures. We then survey the state of the art of approaches for computing document similarity in order to choose the approach to use in our MapReduce algorithm. We further present how similarity between terms is used in assessing the similarity between documents. Simulation results on the Hadoop framework show that our MapReduce algorithm outperforms classical ones in terms of running time.
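A sketch of the map/reduce flow for term-based document similarity: the map phase inverts (doc, term) into (term, doc), and the reduce phase turns each term's posting list into partial contributions to pairwise scores. The binary cosine weighting is an illustrative choice, not necessarily the measure the paper adopts.

```python
"""Term-based pairwise document similarity via an inverted index (sketch)."""
from collections import Counter, defaultdict
from itertools import combinations
import math

docs = {"d1": "big data mapreduce cloud", "d2": "mapreduce cloud hadoop"}

# Map: invert (doc, term) into term -> set of docs (the posting list).
postings = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        postings[term].add(doc_id)

# Reduce: each shared term adds 1 to the dot product of a doc pair.
dot = Counter()
for term, doc_ids in postings.items():
    for a, b in combinations(sorted(doc_ids), 2):
        dot[(a, b)] += 1

# Binary cosine: dot / (|d_a| * |d_b|) with Euclidean norms of 0/1 vectors.
norm = {d: math.sqrt(len(set(t.split()))) for d, t in docs.items()}
for (a, b), num in dot.items():
    print(a, b, round(num / (norm[a] * norm[b]), 3))   # d1 d2 0.577
```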