Failure prediction for hard disk drives is a typical and effective approach to improving the reliability of storage systems. In a large-scale data center environment, drives of various brands and models serve diverse applications with different input/output workload patterns, and non-negligible differences exist among the types of drive failure, which makes failure prediction very challenging. Although much effort has been devoted to this problem, prediction accuracy still needs to be improved. In this article, we propose a failure prediction method for hard disk drives based on a part-voting random forest, which differentiates the prediction of failures in a coarse-grained manner. We conduct groups of validation experiments on two real-world datasets, which contain the SMART data of 64,193 drives. The experimental results show that our proposed method achieves better prediction accuracy than state-of-the-art methods.
The traditional KNN query is a stable and accurate algorithm. However, when the sample size is very large, its computational efficiency suffers greatly. Therefore, a parallel MKNN text classification algorithm based on clustering-center text series is proposed. First, the number of similarity calculations is effectively reduced by means of clustering centers: the original large-scale document samples are replaced with a relatively small number of cluster centers, which improves the KNN query process. Second, the MapReduce parallel framework is used, in combination with the features of text classification, to meet the real-time demands of large-scale text classification, overcoming the slow speed of the KNN query process while keeping classification accuracy as high as possible. Finally, experiments comparing text classification accuracy and algorithmic efficiency against a similar single-threaded algorithm show that the proposed algorithm effectively improves classification speed while maintaining sufficient accuracy. (C) 2017 Elsevier B.V. All rights reserved.
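The center-replacement idea in this abstract can be illustrated with a minimal sketch. It assumes that each class is summarized by a single mean vector and that similarity is cosine similarity; the abstract specifies neither, and the MapReduce parallelization is left out. The names `class_centers`, `cosine`, and `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def class_centers(vectors, labels):
    """Summarize each class by one mean vector (an assumed stand-in for cluster centers)."""
    sums = {}
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return [([s / counts[label] for s in acc], label) for label, acc in sums.items()]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, centers, k=1):
    """Vote among the k centers most similar to the query, not among all documents."""
    ranked = sorted(centers, key=lambda c: cosine(query, c[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy term-count vectors; the data is invented for illustration only.
docs = [[3, 0, 1], [4, 1, 0], [0, 3, 2], [1, 4, 3]]
labels = ["sports", "sports", "finance", "finance"]
centers = class_centers(docs, labels)
predicted = knn_classify([5, 1, 1], centers)
```

The query is compared against two centers instead of four documents. On a large corpus, that reduction in similarity computations is the speedup the abstract describes.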
Partitioning a network into an optimal set of clusters is a prominent technique for prolonging the lifetime of energy-constrained wireless sensor networks. Enumeration search cannot find optimal clusters within polynomially bounded time for large-scale networks, since the computational complexity of the problem grows exponentially with the dimension of the network. Optimal cluster configuration in sensor networks is known to be a Non-deterministic Polynomial (NP)-hard optimization problem, and for that reason we apply polynomial-time metaheuristic algorithms to find optimal or near-optimal solutions. In this paper, we present clustering algorithms based on Simulated Annealing (SA) and Particle Swarm Optimization (PSO) to find an optimal set of cluster heads in the network. The optimization problem consists of finding a configuration of clusters such that the communication distance per cluster is minimized while cluster balance and energy efficiency are maintained in the network. The SA and PSO toolboxes are developed in C++ and integrated with the OMNeT++ simulation environment to implement the proposed clustering algorithms. The performance of the algorithms with respect to network lifetime, load balance, and energy efficiency is examined in simulation.
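A minimal sketch of simulated annealing applied to cluster-head selection follows. It is written in Python for brevity, not the C++/OMNeT++ toolboxes the abstract mentions. The objective is assumed to be only the total node-to-nearest-head distance; the balance and energy terms from the abstract are omitted. The cooling schedule and every name here are hypothetical.

```python
import math
import random


def total_distance(nodes, heads):
    """Sum of distances from every node to its nearest cluster head."""
    return sum(min(math.dist(n, nodes[h]) for h in heads) for n in nodes)


def anneal_cluster_heads(nodes, n_heads, steps=2000, t0=10.0, cooling=0.995, seed=0):
    """Search for head indices with low total distance; requires n_heads < len(nodes)."""
    rng = random.Random(seed)
    current = rng.sample(range(len(nodes)), n_heads)
    current_cost = total_distance(nodes, current)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        # Neighbor move: replace one head with a random node that is not a head.
        candidate = list(current)
        non_heads = [i for i in range(len(nodes)) if i not in candidate]
        candidate[rng.randrange(n_heads)] = rng.choice(non_heads)
        cost = total_distance(nodes, candidate)
        # Always accept improvements; accept worse moves with a temperature-dependent probability.
        if cost < current_cost or rng.random() < math.exp((current_cost - cost) / temp):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
        temp *= cooling
    return best, best_cost


# Two invented groups of sensor positions.
nodes = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
best, best_cost = anneal_cluster_heads(nodes, n_heads=2)
```

Occasionally accepting a worse configuration lets the search leave local minima, and the geometric cooling makes such moves rarer as the run progresses.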
Dynamic VAR compensation (DVC) is gaining increasing attention with the growing construction of high-voltage DC systems and renewable power plants in modern power systems. This study proposes a novel zoning-based heuristic planning method to reduce the number of sites and the capacity of DVC installations. A clustering algorithm based on the correlation of the buses' dynamic behaviour and on voltage control ability is developed to partition the system into several control zones. Candidate DVC sites are then determined zone by zone, which narrows the search space and avoids excessive installation of DVCs on adjacent buses. A voltage control index is proposed to find suitable DVC installation locations within each zone, and capacity optimisation is then carried out to reduce investment. The proposed methodology is demonstrated on a modified New England 39-bus test system with wind farm integration. A performance comparison with a conventional planning scheme confirms the advantages of the new heuristic planning scheme in practical applications.
ISBN: (print) 9781467315074
In this paper we propose a robust clustering algorithm for interval data. The proposed method is based on a similarity measure and does not require the number of clusters or initial values to be specified. Several numerical examples demonstrate the effectiveness of the proposed robust clustering algorithm. We then apply the algorithm to a real data set of city temperature intervals, on which the proposed clustering algorithm demonstrates its robustness in practice.
ISBN: (print) 9781467311601
Set Pair Analysis (SPA) is a new methodology for describing and processing uncertain systems, and it has recently been applied in many fields. In this paper, a new approach to remote sensing information extraction, the SPA-based k-means clustering algorithm (SPAKM), is proposed based on the principle of SPA. The basic ideas and steps of SPAKM are discussed. The proposed algorithm can overcome the limitations of the K-means clustering algorithm to a certain extent. Finally, cluster analysis experiments on a LANDSAT TM image are carried out. The results show that the improved K-means clustering algorithm is superior to K-means in classification accuracy for land cover classes of mixed pixels.
Challenges in time series classification have attracted attention in the past decade. Although large amounts of labeled data are often assumed to be available, in reality labeled data may be scarce in many domains. In this paper, we propose an online semi-supervised multi-channel classifier for time series based on the growing neural gas (GNG) learning scheme. The method can handle multi-channel time series with varying dimensions, and it introduces a label prediction strategy to minimize misclassification. It measures the similarity between an input instance and the learned templates using a weighted multi-channel dynamic time warping technique, and it learns the topology of the input data space for each class using the GNG learning algorithm. A comprehensive evaluation is conducted on various datasets, covering gesture recognition, human activity recognition, and human daily-life activity recognition. Experimental results demonstrate good classification performance and indicate that the proposed approach requires only a handful of labeled instances to construct an accurate classification model.
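The similarity measure mentioned above can be sketched as follows. The sketch assumes that each channel is aligned independently with classic dynamic time warping and that the per-channel costs are combined as a weighted sum; the abstract does not give the exact formulation, so this is one plausible reading. The function names and the weights are hypothetical.

```python
def dtw(a, b):
    """Classic dynamic time warping cost between two 1-D sequences (absolute difference)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def weighted_multichannel_dtw(instance, template, weights):
    """Weighted sum of per-channel DTW costs; channels may differ in length."""
    return sum(w * dtw(x, y) for w, x, y in zip(weights, instance, template))


# Two channels per series; the values are invented for illustration.
instance = [[1, 2, 3], [0, 0]]
template = [[1, 2, 2, 3], [1, 1]]
distance = weighted_multichannel_dtw(instance, template, weights=[0.5, 0.5])
```

Because DTW allows one sample to align with several, the first channel matches at zero cost despite the length difference, so only the second channel contributes to the distance.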
AUTOSAR has been developed as the worldwide standard for automotive E/E software systems, allowing electronic components from different suppliers to be used interchangeably. However, as the number of component-based applications in modern automotive embedded systems grows rapidly and the hardware topology becomes increasingly complex, manually deploying such a large number of components in a distributed automotive system depends heavily on the experience of engineers and is therefore time-consuming. Furthermore, resource limitations and scheduling analysis make it harder for developers to find an approximately optimal deployment during system integration. In this paper, we propose a novel method for deploying AUTOSAR components onto ECUs with the following features. First, a clustering algorithm is designed to deploy components automatically with relatively low time complexity. Second, a fitness function is designed to balance the load across ECUs. The goal of our approach is to minimize the communication cost over all runnable entities while meeting all corresponding timing constraints and balancing the load of all ECUs. The experimental results show that our approach is efficient and performs well compared with other existing methods on specific and synthetic data sets.
Noise maps are considered a powerful tool for determining the population's exposure to environmental noise. To make the process of updating noise maps easier, more cost-effective, and more frequent, there is a need for integrated systems that combine real-time measurement and processing to assess the acoustic impact of noise sources. To this end, a dedicated project named Dynamic Acoustic Mapping (DYNAMAP) has been proposed and co-financed in the framework of the Financial Instrument for the Environment (LIFE) 2013 program, with the aim of developing a dynamic noise mapping system capable of detecting and representing in real time the acoustic impact of road infrastructures. Noise maps are updated by scaling the noise levels of pre-calculated noise maps as functions of the differences observed between measured and calculated original grid data. The total map is updated by energetic summation of the single-source levels from the updated noise maps. Given the large number of roads in the city of Milan, obtaining the dynamic acoustic map of this city requires a statistical approach in which roads with similar flow conditions, and thus similar noise trends, are grouped (clustered) together. To obtain these groups (clusters), an extensive measurement campaign was carried out. The maps obtained using this method carry an error that depends on the chosen integration time of the noise levels. Results show that two statistical clusters, differentiated by rush-hour traffic flow, are sufficient and categorize roads better than the road types defined by Italian road regulation. (C) 2016 Elsevier Ltd. All rights reserved.
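The two update steps described above can be sketched as follows. The sketch assumes that the scaling is a uniform decibel offset equal to the measured level minus the calculated level; the abstract says only that the scaling is a function of that difference. The energetic summation uses the standard rule for combining sound levels, L = 10 log10(sum of 10^(L_i/10)). All function names and sample values are hypothetical.

```python
import math


def update_map(base_levels_db, measured_db, calculated_db):
    """Shift a pre-calculated source map by the measured-minus-calculated offset (assumed uniform)."""
    delta = measured_db - calculated_db
    return [level + delta for level in base_levels_db]


def energetic_sum(levels_db):
    """Combine sound levels in dB by summing their energies."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


def total_map(source_maps_db):
    """Combine per-source maps grid point by grid point."""
    return [energetic_sum(point) for point in zip(*source_maps_db)]


# Two invented source maps over the same three grid points.
map_a = update_map([50.0, 55.0, 60.0], measured_db=62.0, calculated_db=60.0)
map_b = update_map([52.0, 57.0, 62.0], measured_db=58.0, calculated_db=58.0)
combined = total_map([map_a, map_b])
```

Two equal sources add about 3 dB under this rule, not a doubling of the decibel value, which is why the per-source maps are summed in the energy domain.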
To enhance the safety and efficiency of civil aviation, special attention should be paid to pilots' physical and mental health. Existing work has used video monitoring and social network mining to find potential anomalies in pilots' daily lives. However, video monitoring raises privacy problems, and social network mining is computationally complex. To address these problems, we propose a novel pilot anomaly detection method using step sensors. The key idea is that pilots' step information reflects their daily behavior and is also influenced by the behavior of their social networks; if a pilot's step count differs greatly from his historical step counts or from the step counts of his social network, this is probably an anomaly. We therefore use step sensors to collect pilots' step information and a clustering method to detect anomalies. Experiments are conducted with 65 pilot candidates, divided into two social groups, whose step information we collect over 50 days. Using the proposed anomaly detection method, outliers are successfully detected for further analysis. The method also avoids privacy problems and is highly efficient.
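The key idea above, comparing a step count with the pilot's own history and with the social group, can be sketched as follows. The sketch swaps in a plain z-score test in place of the clustering method that the abstract mentions but does not detail. The threshold of 3 standard deviations, the function names, and the sample data are all hypothetical.

```python
from statistics import mean, pstdev


def z_score(value, reference):
    """Number of standard deviations between value and the mean of reference."""
    spread = pstdev(reference)
    return 0.0 if spread == 0 else (value - mean(reference)) / spread


def is_anomalous(today, history, group_today, threshold=3.0):
    """Flag a count that deviates strongly from the pilot's history or from the group."""
    return (abs(z_score(today, history)) > threshold
            or abs(z_score(today, group_today)) > threshold)


# Invented daily step counts for one pilot and for peers in the same social group.
history = [8000, 8200, 7900, 8100, 8000]
group_today = [7800, 8300, 8100, 7900]
flag_low_day = is_anomalous(1200, history, group_today)
flag_normal_day = is_anomalous(8050, history, group_today)
```

Checking both references follows the abstract's reasoning: a day that is unusual for the individual or unusual relative to the group is worth a closer look.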