Hadoop MapReduce has proven to be an effective computing model for dealing with big data over the last few years. However, one technical challenge facing this framework is how to predict the execution time of an individual ...
ISBN (Print): 9781509035144
Erasure coding has been increasingly used by distributed storage systems to maintain fault tolerance with low storage redundancy. However, how to enhance the performance of degraded reads in erasure-coded storage has been a critical issue. We revisit this problem from two different perspectives that are neglected by existing studies: data placement and encoding rules. To this end, we propose an encoding-aware data placement (EDP) approach that aims to reduce the number of I/Os in degraded reads under a single failure for general XOR-based erasure codes. EDP carefully places sequential data based on the encoding rules of the given erasure code. Trace-driven evaluation results show that, compared to two baseline data placement methods, EDP reduces the amount of data read from the most loaded disk by up to 37.4% and shortens read time by up to 15.4%.
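To make the degraded-read cost concrete, here is a minimal Python sketch of a toy (4+1) XOR-parity stripe. It illustrates the I/O amplification that EDP targets, not the paper's placement algorithm; the stripe layout and block size are assumptions.

```python
from functools import reduce

def bxor(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# A toy (4+1) XOR stripe: data blocks d0..d3 plus parity p = d0^d1^d2^d3.
data = [bytes([i]) * 8 for i in range(4)]
parity = reduce(bxor, data)

def degraded_read(lost_idx, data, parity):
    """Serve a read of a lost data block by XORing all survivors.
    Returns the recovered block and the number of block reads (I/Os)."""
    survivors = [d for i, d in enumerate(data) if i != lost_idx]
    recovered = reduce(bxor, survivors + [parity])
    return recovered, len(survivors) + 1

block, ios = degraded_read(2, data, parity)
assert block == data[2]
print(f"recovered the lost block with {ios} reads instead of 1")
```

For general XOR codes, which surviving blocks a degraded read must fetch is fixed by the encoding rules, which is why placing sequential data with those rules in mind can balance and reduce recovery reads across disks.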
In this paper, we propose a novel model for unsupervised segmentation of the viewer's attention object from natural images based on localizing region-based active contours (LRAC). Firstly, we propose the saliency det...
ISBN (Print): 9789462520677
Against the current social background, existing facilities and tools can no longer meet the needs of big data in storage, scaling, and analysis. In this work, data storage and analysis are carried out under cloud conditions on a Hadoop deployment. Under cloud computing conditions, applications that hold remote data files without being authorized to read their contents can manipulate them without authorization, which produces many security risks for big data. In this paper, according to the different cloud modes and the different stages of Hadoop operation, we analyze a variety of privacy threats in which untrusted parties steal big data and, taking a threat model as an example, explore ways to address these security threats.
In this paper, a Price-learning-based Load Distribution Strategy (PLDS) is first proposed. The PLDS model includes the Smart Power Service, the Utility Company, and History Load Curves, and by considering both the avera...
Human saccade is a dynamic process of information pursuit. There are many methods using either global context or local context cues to model human saccadic scan-paths. In contrast to them, this paper introduces a model for gaze movement control using both global and local cues. To test the performance of this model, an experiment was conducted to collect human eye movement data using an SMI iVIEW X Hi-Speed eye tracker with a sampling rate of 1250 Hz. The experiment used a two-by-four mixed design crossing the location of the targets with four initial positions. We compare the saccadic scan-paths generated by the proposed model against human eye movement data on a face benchmark dataset. Experimental results demonstrate that the scan-paths simulated by the proposed model are similar to human saccades in terms of fixation order, Hausdorff distance, and prediction accuracy for both static fixation locations and dynamic scan-paths.
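Of the metrics named above, the Hausdorff distance between two scan-paths is easy to state concretely. Below is a hedged sketch (fixations as (x, y) points; the variable names and toy paths are illustrative, not from the paper):

```python
import numpy as np

def hausdorff(path_a, path_b):
    """Undirected Hausdorff distance between two fixation sequences."""
    a = np.asarray(path_a, dtype=float)   # shape (m, 2)
    b = np.asarray(path_b, dtype=float)   # shape (n, 2)
    # Pairwise Euclidean distances, shape (m, n).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # For each set, take the farthest point's distance to its nearest
    # neighbor in the other set, then take the larger of the two.
    return max(d.min(axis=1).max(), d.min(axis=0).max())

model_path = [(100, 120), (260, 180), (300, 340)]
human_path = [(110, 125), (250, 170), (320, 330), (400, 200)]
print(hausdorff(model_path, human_path))
```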
Time series clustering is widely applied in various areas. Existing research focuses mainly on distance measures between two time series, such as dynamic time warping (DTW) based methods, edit-distance-based methods, and shapelet-based methods. In this work, we experimentally demonstrate, for the first time, that no single distance measure performs significantly better than others when clustering datasets of time series with spectral clustering. As such, a question arises as to how to choose an appropriate measure for a given dataset of time series. To answer this question, we propose an integration scheme that incorporates multiple distance measures using semi-supervised clustering. Our approach is able to integrate all the measures by extracting valuable underlying information for the clustering. To the best of our knowledge, this work demonstrates for the first time that constraint-based semi-supervised clustering is able to enhance time series clustering by combining multiple distance measures. Tested on clustering a variety of time series datasets, our method outperforms individual measures as well as typical integration approaches.
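As a hedged illustration of combining measures, the sketch below averages Gaussian affinities built from DTW and Euclidean distances and feeds the result to spectral clustering. The semi-supervised, constraint-based step of the paper is not reproduced; the kernel bandwidth and the toy series are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def dtw(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping distance."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def euclid(x, y):
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

def combined_affinity(series, measures):
    """Average Gaussian-kernel affinities over several distance measures."""
    n = len(series)
    A = np.zeros((n, n))
    for dist in measures:
        D = np.array([[dist(a, b) for b in series] for a in series])
        D = D / max(D.max(), 1e-12)   # normalize each measure to [0, 1]
        A += np.exp(-D ** 2 / 0.5)    # bandwidth 0.5 is an arbitrary choice
    return A / len(measures)

series = [np.sin(np.linspace(0, 6, 50)),
          np.sin(np.linspace(0, 6, 50)) + 0.1,
          np.cos(np.linspace(0, 6, 50)),
          np.cos(np.linspace(0, 6, 50)) - 0.1]
labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                            random_state=0).fit_predict(
    combined_affinity(series, [dtw, euclid]))
print(labels)   # the two sine-like and two cosine-like series separate
```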
ISBN (Print): 9781509003914
In the neighborhood rough set model, the majority-rule-based neighborhood classifier (NC) becomes prone to misclassification as the size of the information granules increases. To remedy this deficiency, we propose a neighborhood collaborative classifier (NCC) based on the idea of collaborative representation based classification (CRC). NCC first performs feature selection with the neighborhood rough set, and then, instead of solving the classification problem by the majority rule, NCC solves a similar problem with collaborative representation among the neighbors of each unseen sample. Experiments on UCI data sets demonstrate that: 1) NCC achieves satisfactory performance with larger neighborhood information granules when compared with NC; 2) NCC reduces the size of the dictionary when compared with CRC.
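For reference, here is a hedged sketch of the standard CRC step that NCC builds on: code the query collaboratively over all training samples with ridge regularization, then assign the class with the smallest reconstruction residual. The neighborhood-rough-set feature selection is omitted, and the regularization value and toy data are assumptions.

```python
import numpy as np

def crc_predict(X_train, y_train, x, lam=0.01):
    """Classify x by its per-class collaborative-representation residual."""
    A = X_train.T                           # dictionary, columns = samples
    # Ridge-regularized coding: alpha = (A^T A + lam I)^-1 A^T x
    alpha = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ x)
    best, best_res = None, np.inf
    for c in np.unique(y_train):
        mask = (y_train == c)
        residual = np.linalg.norm(x - A[:, mask] @ alpha[mask])
        if residual < best_res:
            best, best_res = c, residual
    return best

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
print(crc_predict(X, y, rng.normal(3, 1, 5)))   # expected: 1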
Author:
Xiaodong Wu, Faculty of Mathematics and Computer Science, Quanzhou Normal University; Fujian Provincial Key Laboratory of Data Intensive Computing; Key Laboratory of Intelligent Computing and Information Processing, Fujian Province University
ISBN (Print): 9781467383134
The MapReduce parallel and distributed computing framework has been widely applied in both academia and industry. MapReduce applications are divided into two steps: Map and Reduce. The input data is divided into splits that can be processed concurrently, and the number of splits determines the number of map tasks. In this paper, we present a regression-based method to compute the number of Map tasks as well as Reduce tasks such that the performance of the MapReduce application can be improved. Regression analysis is used to predict the execution time of MapReduce applications. Experimental results show that the proposed optimization method can effectively reduce the execution time of the applications.
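A hedged sketch of the regression idea: fit measured runtimes against the task count, then pick the count that minimizes the predicted time. The quadratic-in-log model and the sample measurements below are illustrative assumptions, not the paper's data or model form.

```python
import numpy as np

# Hypothetical profiling runs: (number of map tasks, measured runtime).
map_tasks = np.array([4, 8, 16, 32, 64, 128])
runtime_s = np.array([310, 180, 120, 95, 105, 150])

# Fit runtime as a quadratic in log2(task count), then search candidates.
coeffs = np.polyfit(np.log2(map_tasks), runtime_s, deg=2)
candidates = np.arange(2, 257)
pred = np.polyval(coeffs, np.log2(candidates))
print("suggested map task count:", candidates[pred.argmin()])
```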
ISBN (Print): 9781467372220
With the rapid growth of data volume, knowledge acquisition for big data has become a new challenge. To address this issue, the hierarchical decision table is defined and implemented in this work. The properties of different hierarchical decision tables are discussed under different granularities of the conditional attributes. A novel knowledge acquisition algorithm for big data using MapReduce is proposed. Experimental results demonstrate that the proposed algorithm is able to deal with big data and mine hierarchical decision rules under different granularities.
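As a hedged, local simulation of MapReduce-style rule extraction from a decision table: map emits (condition values at a chosen granularity, decision) pairs, and reduce keeps only granules that decide consistently. The table, the attribute hierarchy, and the consistency criterion are illustrative assumptions, not the paper's algorithm.

```python
from collections import defaultdict

table = [  # (outlook, temperature_celsius, play)
    ("sunny", 31, "no"), ("sunny", 29, "no"),
    ("rain", 18, "yes"), ("rain", 21, "yes"), ("rain", 19, "no"),
]

def coarsen(temp):
    """One level of a hypothetical temperature hierarchy."""
    return "hot" if temp >= 25 else "mild"

def map_phase(rows):
    for outlook, temp, decision in rows:
        yield (outlook, coarsen(temp)), decision

def reduce_phase(pairs):
    groups = defaultdict(set)
    for key, decision in pairs:
        groups[key].add(decision)
    # A rule is acquired only when the granule decides consistently.
    return {k: v.pop() for k, v in groups.items() if len(v) == 1}

print(reduce_phase(map_phase(table)))
# {('sunny', 'hot'): 'no'} -- the ('rain', 'mild') granule is
# inconsistent at this granularity, so no rule is emitted for it.
```

Coarsening the conditional attributes trades rule coverage against consistency, which is the granularity effect the abstract discusses.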