ISBN: 9781538619964 (print)
The number of online purchases is increasing constantly. Companies have recognized the related opportunities and are making increasing use of online channels. To acquire potential customers, companies often try to gain a better understanding of them through web analytics. Web log files are among the most useful sources: they provide an abundance of important information about user behavior on a website, such as the navigation path or access time. Mining this so-called clickstream data as comprehensively as possible has become an important task for predicting the behavior of online customers, optimizing webpages, and giving personalized recommendations. As the number of customers constantly rises, the generated log files also grow in both size and quantity. Thus, for certain companies, the technologies currently in use are no longer sufficient. In this work, a comprehensive workflow is proposed that uses a clustering algorithm in a Hadoop ecosystem to investigate user interest patterns. The complete workflow is demonstrated on an application scenario from one of the largest business-to-business (B2B) electronic commerce websites in Germany. Furthermore, an experimental evaluation is applied to verify the applicability and efficiency of the algorithm used, along with the associated framework.
Unauthorized activity on a network is called a network intrusion, and a device or software application that monitors network parameters in order to detect such an intrusion is called a network intrusion detection system (NIDS). With the sharp rise in malicious activity on the internet, it is extremely important for an NIDS to identify any kind of malicious activity on the network quickly and correctly. Moreover, the system must avoid raising false alarms, in which normal usage is flagged as malicious. This paper proposes the use of two machine learning classification algorithms, XGBoost and AdaBoost, with and without clustering, to train a model for NIDS. The models are trained and tested on the NSL-KDD dataset, and the results are an improvement over previous work on intrusion detection using the same dataset.
ISBN: 9781538611043 (print)
Structural health monitoring (SHM) involves the development of strategies to assess the condition of instrumented engineering structures. One of the most critical applications of SHM systems is civil infrastructure. For this application, it is particularly important that SHM systems be inexpensive and easy to deploy, since the maintenance of infrastructure is often inadequately funded. Wireless sensor networks (WSN) can be very useful toward this end. We present an efficient WSN-based SHM algorithm for detecting, localizing, and monitoring the progression of damage in infrastructure applications. The algorithm utilizes a novel vibration-based pattern matching technique that is very well suited for low-power WSN nodes. During a training phase, a body of reference patterns is formed from vibrations observed at sensor nodes distributed throughout the structure. During the operational phase, observed patterns are compared to the reference patterns to determine if a match exists. Through the use of an innovative distributed algorithm, a time complexity of O(logN) is achieved for the matching process. If a match does not exist, potential damage is indicated and the reference pattern closest to the observed pattern is determined using Euclidean distance. The difference between the two patterns indicates the sensor nodes at which potential damage exists. Clusters are then formed around these sensor nodes in order to monitor the progression of local damage. Simulations are performed in MATLAB for a typical bridge deployment in order to determine the degree of overlapping that occurs as clusters are generated in response to potential damage. The simulations indicate that overlapping increases gracefully as the number of nodes experiencing damage increases.
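The fallback step described in the abstract above (find the reference pattern closest to the observed pattern by Euclidean distance, then use the difference to point at suspect sensor nodes) can be illustrated with a minimal Python sketch. It is not the authors' distributed O(log N) matching algorithm: it uses a plain linear scan, and the function name `nearest_reference`, the one-value-per-node pattern layout, and the `threshold` parameter are assumptions made only for illustration.

```python
import math

def nearest_reference(observed, references, threshold):
    """Find the closest reference pattern and the nodes that deviate from it.

    observed   : list of floats, one value per sensor node (assumed layout)
    references : list of reference patterns with the same layout
    threshold  : hypothetical per-node deviation above which a node is flagged
    """
    best_idx, best_dist = None, math.inf
    # Linear scan over all reference patterns (not the paper's distributed search).
    for i, ref in enumerate(references):
        d = math.dist(observed, ref)  # Euclidean distance
        if d < best_dist:
            best_idx, best_dist = i, d
    closest = references[best_idx]
    # Nodes whose observed value differs from the closest reference by more than threshold.
    suspect_nodes = [
        node for node, (o, r) in enumerate(zip(observed, closest))
        if abs(o - r) > threshold
    ]
    return best_idx, best_dist, suspect_nodes
```

In this sketch, the returned node indices would be the seeds around which monitoring clusters are formed; how the clusters themselves are built is not specified in the abstract and is left out.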
As data mining has attracted a significant amount of research attention, many clustering algorithms have been proposed in the past decades. However, most existing clustering methods either have a high computational cost or are not suitable for discovering clusters with non-convex shapes. In this paper, an efficient clustering algorithm, CHSMST, is proposed, which combines clustering based on hyper surface (CHS) with a minimum spanning tree. In the first step, CHSMST applies CHS to obtain initial clusters quickly. Thereafter, a minimum spanning tree is introduced to handle locally dense data, which is hard for CHS to deal with. The experiments show that CHSMST can discover clusters of arbitrary shape. Moreover, CHSMST is insensitive to the order of the input samples, and the run time of the algorithm increases moderately as the dataset grows.
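The abstract does not say how CHSMST uses the minimum spanning tree, so the following Python sketch shows only the generic idea of MST-based clustering: build a Euclidean minimum spanning tree, drop edges longer than a cutoff, and treat the remaining connected components as clusters. The function name `mst_clusters` and the `cut_length` parameter are hypothetical, and the CHS step is not reproduced.

```python
import math

def mst_clusters(points, cut_length):
    """Generic MST clustering sketch: cut MST edges longer than cut_length."""
    n = len(points)
    if n == 0:
        return []
    # Prim's algorithm on the complete Euclidean graph, O(n^2).
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight connecting each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Union-find over the MST edges that survive the cut.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for a, b, w in edges:
        if w <= cut_length:
            root[find(a)] = find(b)
    # Each point is labelled with the representative of its component.
    return [find(i) for i in range(n)]
```

Because the tree follows chains of nearby points, clusters found this way are not restricted to convex shapes, which is the property the abstract emphasizes.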
Fuzzy clustering is superior to crisp clustering when the boundaries among the clusters are vague and ambiguous. However, the main limitation of both fuzzy and crisp clustering algorithms is their sensitivity to the number of potential clusters and/or their initial positions. Moreover, the comprehensibility of the obtained clusters is not addressed, so in data-mining applications the discovered knowledge is not understandable to human users. To overcome these restrictions, a novel fuzzy rule-based clustering algorithm (FRBC) is proposed in this paper. Like fuzzy rule-based classifiers, FRBC employs a supervised classification approach to perform unsupervised cluster analysis. It tries to explore the potential clusters in the data patterns automatically and to identify them with interpretable fuzzy rules. Simultaneous classification of the data patterns with these fuzzy rules can reveal the actual boundaries of the clusters. To illustrate the capability of FRBC to explore clusters in data, experimental results on several benchmark datasets are obtained and compared with those of other fuzzy clustering algorithms. The clusters specified by the fuzzy rules are human-understandable and have acceptable accuracy.
ISBN: 9781509046584 (print)
Objective: For surface crack detection on low-voltage current transformers, traditional methods cannot effectively distinguish cracks from scratches; to address this problem, a crack detection method is proposed based on geometrical features and moment *** An extraction algorithm by osmosis obtains the target area from the gray image; according to the different texture features of cracks and scratches, geometric features and invariant moments are used to determine the characteristic parameter thresholds, and finally a clustering algorithm is used to determine the threshold that separates the cracks and scratches to be *** Tests show that the method can effectively distinguish cracks from scratches and solves the noise problem in low-voltage current transformer crack *** Compared with traditional methods for detecting cracks and scratches, the method proposed in this paper is invariant to rotation, translation, and image size, and it can also be used to detect cracks in images captured in a moving state.
The cooperative relay network exploits the space diversity gain by allowing cooperation among users to improve transmission quality. An important issue is identifying the cluster-head (or relay node) and the members that are to cooperate with it. The cluster-head consumes more battery power than an ordinary node because it has extra responsibilities, namely ensuring the cooperation of its members' transmissions; as a result, the cluster-head has a lower throughput than average. Since users join and leave the clusters from time to time, the network topology changes and the network may not be stable. How to balance fairness among users against network stability is therefore a very interesting topic. This paper proposes an adaptive weighted clustering algorithm (AWCA), in which weight factors are introduced to adaptively control both stability and fairness according to the number of arriving users. It is shown that when the number of arriving users is large, AWCA has a lifetime longer than FWCA and similar to SWCA, and that when the number of arriving users is small, AWCA provides fairness higher than SWCA and close to FWCA.
A hybrid intelligent model comprising a modified fuzzy min-max (FMM) clustering neural network and a modified clustering tree (CT) is developed. A review of clustering models with rule extraction capabilities is presented. The hybrid FMM-CT model is explained. We first use several benchmark problems to illustrate the cluster evolution patterns from the proposed modifications in FMM. Then, we employ a case study with real data related to power quality monitoring to assess the usefulness of FMM-CT. The results are compared with those from other clustering models. More importantly, we extract explanatory rules from FMM-CT to justify its predictions. The empirical findings indicate the usefulness of the proposed model in tackling data clustering and power quality monitoring problems under different environments.
Clustering plays an important role in discovering underlying patterns of data points according to their similarities. Many advanced algorithms have difficulty dealing with variable clusters. In this paper, we propose a simple but effective clustering algorithm, CLUB. First, CLUB finds initial clusters based on mutual k nearest neighbours. Next, taking the initial clusters as input, it identifies the density backbones of the clusters based on k nearest neighbours. Then, it yields the final clusters by assigning each unlabelled point to the cluster to which that point's nearest higher-density neighbour belongs. To demonstrate the performance of CLUB comprehensively, we benchmark it against six baselines, comprising three classical and three state-of-the-art methods, on nine two-dimensional datasets of various sizes containing clusters with various shapes and densities, as well as on seven widely used multi-dimensional datasets. In addition, we use the Olivetti Face dataset to illustrate the effectiveness of our method on face recognition. Experimental results indicate that CLUB outperforms the six compared algorithms in most cases. (C) 2016 Elsevier Ltd. All rights reserved.
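The final assignment step in the abstract above (each unlabelled point joins the cluster of its nearest higher-density neighbour) can be sketched in Python as follows. This is an illustration, not the published CLUB implementation: the density values are taken as given, whereas the paper derives them from k nearest neighbours, and the function name `assign_by_higher_density` and the use of `None` for unlabelled points are assumptions.

```python
import math

def assign_by_higher_density(points, density, labels):
    """Label each unlabelled point with the label of its nearest higher-density neighbour.

    points  : list of coordinate tuples
    density : list of floats, one density estimate per point (assumed to be precomputed)
    labels  : list of cluster labels, with None marking unlabelled points
    """
    result = list(labels)
    # Visit points from densest to sparsest so that every higher-density
    # neighbour has already been processed when a point is reached.
    order = sorted(range(len(points)), key=lambda i: density[i], reverse=True)
    for pos, i in enumerate(order):
        if result[i] is not None:
            continue
        higher = [j for j in order[:pos] if density[j] > density[i]]
        if not higher:
            continue  # no denser point exists; the point stays unlabelled
        nearest = min(higher, key=lambda j: math.dist(points[i], points[j]))
        result[i] = result[nearest]
    return result
```

Processing in decreasing density order lets a label propagate along a chain of unlabelled points in a single pass.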
The K-means clustering algorithm was employed to process signal waveforms from TlBr detectors. The signal waveforms were classified by their shapes, which reflect the charge collection process in the detector. The classified signal waveforms were processed individually to suppress the pulse height variation caused by charge collection loss. With 500 clusters, the energy resolution obtained for a Cs-137 spectrum measured with a 0.5 mm thick TlBr detector was 1.3% FWHM. (C) 2011 Elsevier B.V. All rights reserved.
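As a rough illustration of grouping waveforms by shape, the sketch below runs plain K-means (Lloyd's algorithm) on waveforms stored as equal-length lists of samples. Seeding the centroids with the first k waveforms, the fixed iteration count, and the function name `kmeans_waveforms` are assumptions for this example. The abstract does not describe the authors' initialization, any waveform preprocessing, or the per-cluster pulse height correction, so none of those are reproduced here.

```python
import math

def kmeans_waveforms(waveforms, k, iterations=20):
    """Cluster equal-length waveforms with Lloyd's algorithm.

    Returns (labels, centroids). Centroids are seeded with the first k
    waveforms, a simplification chosen to keep the example deterministic.
    """
    centroids = [list(w) for w in waveforms[:k]]
    labels = [0] * len(waveforms)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(w, centroids[c]))
            for w in waveforms
        ]
        # Update step: each centroid becomes the sample-wise mean of its members.
        for c in range(k):
            members = [w for w, lab in zip(waveforms, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each resulting cluster collects waveforms of similar shape, which is what would allow a separate correction to be applied per cluster.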