The use of clustering for developing a description of a software system's architecture is fairly recent. Thus there is a need to evaluate various clustering algorithms and identify the ones which are expected to g...
This paper is oriented into the text document retrieval area. The aim of the paper is to compare two soft document clustering methods by using neural networks, the modification of KMART and the nonlinear Hebbian neura...
In densely deployed wireless sensor networks, spatial data correlations arise when multiple spatially proximal sensor nodes observe the same phenomenon or event. These correlations offer significant potential for developing efficient strategies that reduce energy consumption. In this paper, spatial data correlations are exploited to group sensor nodes into clusters with high data aggregation efficiency. We define the problem of selecting the set of cluster heads as the weighted connected dominating set problem. We then develop a set of centralized and distributed algorithms to select the cluster heads. Simulation results demonstrate the effectiveness and efficiency of the designed algorithms.
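The weighted connected dominating set formulation above can be illustrated with a small greedy heuristic. This is only a sketch under stated assumptions, not the centralized or distributed algorithms of the paper: the function name `greedy_cluster_heads`, the adjacency-dictionary input, and the "newly covered nodes per unit weight" selection rule are all hypothetical choices made for illustration.

```python
def greedy_cluster_heads(adj, weight):
    """Greedy heuristic for a weighted connected dominating set (illustrative only).

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    weight: dict mapping each node to a positive cost of serving as cluster head.
    Returns a list of heads such that every node is a head or adjacent to one,
    and every head after the first is adjacent to an earlier head.
    """
    def gain(node, covered):
        # Number of not-yet-covered nodes this node would cover, per unit weight.
        return len(({node} | adj[node]) - covered) / weight[node]

    covered = set()
    first = max(sorted(adj), key=lambda n: gain(n, covered))
    heads = [first]
    covered |= {first} | adj[first]

    while len(covered) < len(adj):
        # Only already-covered nodes may become heads; each of them is adjacent
        # to an existing head, so the head set stays connected.
        candidates = [n for n in sorted(covered) if n not in heads]
        if not candidates:
            raise ValueError("graph is not connected")
        best = max(candidates, key=lambda n: gain(n, covered))
        if gain(best, covered) == 0:
            raise ValueError("graph is not connected")
        heads.append(best)
        covered |= {best} | adj[best]
    return heads


# Usage: a five-node path 1-2-3-4-5 with unit weights.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(greedy_cluster_heads(path, {n: 1.0 for n in path}))
```

In a sensor-network setting the weight could encode, for example, residual energy or aggregation cost, but which quantity the paper uses is not stated in the abstract.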
We consider efficient communication schemes based on both network-supported and application-level multicast techniques for content-based publication-subscription systems. We show that the communication costs depend heavily on the network configuration and on the distribution of publications and subscriptions. We devise new algorithms and adapt existing partitional data clustering algorithms. These algorithms can be used to determine multicast groups with as much commonality as possible, based on the totality of subscribers' interests. They perform well in the context of highly heterogeneous subscriptions, and they also scale well. An efficiency of 60% to 80% with respect to the ideal solution can be achieved with a small number of multicast groups (fewer than 100 in our experiments). Some of the same concepts can be applied to match publications to subscribers in real time, and to determine dynamically whether to unicast, multicast, or broadcast information about events over the network to the matched subscribers. We demonstrate the quality of our algorithms via simulation experiments.
ISBN: 9798350372939 (digital); 9798350372946 (print)
In this paper, we address the problem of audio editing detection. We consider one type of editing: the insertion of falsified fragments into an audio recording. To detect such insertions, we propose an algorithm based on clustering of the audio background noise. For the proposed algorithm, we examine various methods for background noise preprocessing, various neural feature extractors, and various clustering algorithms. Based on the experiments, we select the combination of preprocessing method, feature extraction model, and clustering algorithm that gives the best clustering metrics. The combination of a ResNet feature extractor and agglomerative clustering yielded the best results.
ISBN: 9781728196565 (digital); 9781728196572 (print)
This paper investigates how clustering algorithms and Recency, Frequency, and Monetary value (RFM) analysis can be applied to online transactions to inform strategies based on customer purchasing behavior. Along with performing RFM analysis on the retail dataset, we used clustering algorithms such as Mean-shift, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), agglomerative clustering, and K-Means. By comparing these clustering algorithms, we identified valuable customer groups based on RFM values.
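As a minimal sketch of the RFM step described above, the snippet below derives recency, frequency, and monetary values per customer from a list of transactions. The function name `rfm_table`, the tuple layout of a transaction, and the sample data are assumptions made for illustration; the paper's dataset and preprocessing are not described in the abstract. The resulting three values per customer are the kind of feature vector that a clustering algorithm such as K-Means could then group.

```python
from datetime import date


def rfm_table(transactions, today):
    """Compute RFM values per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
                  (a hypothetical layout chosen for this sketch).
    today: reference date used to measure recency in days.
    """
    table = {}
    for customer, day, amount in transactions:
        last, freq, money = table.get(customer, (day, 0, 0.0))
        # Track the most recent purchase, the purchase count, and total spend.
        table[customer] = (max(last, day), freq + 1, money + amount)
    return {
        customer: {
            "recency": (today - last).days,
            "frequency": freq,
            "monetary": money,
        }
        for customer, (last, freq, money) in table.items()
    }


# Usage with made-up transactions.
sample = [
    ("a", date(2024, 1, 1), 10.0),
    ("a", date(2024, 1, 10), 5.0),
    ("b", date(2023, 12, 1), 100.0),
]
print(rfm_table(sample, today=date(2024, 1, 31)))
```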
A perceptual image hash function maps an image to a short binary string, based on an image's appearance to the human eye. Perceptual image hashing is useful in image databases, watermarking, and authentication. In this paper, we decouple image hashing into feature extraction (intermediate hash) followed by data clustering (final hash). For any perceptually significant feature extractor, we propose a polynomial-time heuristic clustering algorithm that automatically determines the final hash length needed to satisfy a specified distortion. We prove that the decision version of our clustering problem is NP-complete. Based on the proposed algorithm, we develop two variations to facilitate perceptual robustness vs. fragility trade-offs. We test the proposed algorithms against Stirmark attacks.
ISBN: 9798331520861 (digital); 9798331520878 (print)
With the rapid growth of geographic data generated by various sensors and end devices, new opportunities for research and practical use arise across many application domains. However, effective utilization of this data often requires dividing geospatial space into smaller, manageable regions. An important challenge is to ensure that these regions of nearby data points are also balanced in terms of data size distribution (e.g., population density or resource allocation), which creates a two-objective optimization problem. The contributions of this paper are twofold. First, we propose a balance-driven partitioning algorithm, a coordinate-descent-based algorithm that uses a dynamic programming technique. Second, we present a clustering-centric algorithm that extends the classic k-means algorithm with an imbalance-penalized function, so that geographic data are clustered not only by geographic location but also so that the per-cluster total sizes are balanced. Finally, to evaluate the efficiency of the proposed algorithms, we conducted experiments on a trace geographic dataset and compared the results with those of existing clustering algorithms. Our results demonstrate that the proposed algorithms not only achieve competitive clustering quality but also exhibit better data-size balance.
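The idea of an imbalance-penalized k-means can be sketched as a modified assignment step. The abstract does not give the penalty function, so the form used here (squared distance plus `lam` times the data size already assigned to a cluster) is purely an assumption, as are the function name `penalized_assign` and its parameters.

```python
def penalized_assign(points, sizes, centers, lam):
    """One assignment pass of a size-penalized k-means variant (illustrative).

    points:  list of (x, y) coordinates.
    sizes:   data size carried by each point (e.g. population), same length.
    centers: list of (x, y) cluster centres.
    lam:     weight of the imbalance penalty; lam = 0 gives the plain
             nearest-centre assignment of classic k-means.
    Returns (labels, load), where load[k] is the total size assigned to centre k.
    """
    load = [0.0] * len(centers)
    labels = []
    for (x, y), size in zip(points, sizes):
        # Cost = squared distance + penalty growing with the cluster's current load.
        costs = [
            (x - cx) ** 2 + (y - cy) ** 2 + lam * load[k]
            for k, (cx, cy) in enumerate(centers)
        ]
        k = min(range(len(centers)), key=costs.__getitem__)
        labels.append(k)
        load[k] += size
    return labels, load


# Usage: four points near the origin and one far away, unit sizes.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
ctr = [(0, 0), (10, 10)]
print(penalized_assign(pts, [1] * 5, ctr, lam=0.0))
print(penalized_assign(pts, [1] * 5, ctr, lam=100.0))
```

With `lam = 0` the loads are as uneven as the geometry dictates; a larger `lam` trades some geographic compactness for more even per-cluster totals. Because the pass is sequential, the result depends on the order of the points, which a fuller implementation would need to address.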
ISBN: 9781424458721; 9781424458745 (print)
Topology control is one of the most important aspects of Wireless Sensor Networks (WSNs), which are a current hotspot of research and application. This paper first summarizes the characteristics of typical WSNs in comparison with traditional wireless networks. It then surveys recent representative clustering algorithms in this area by summarizing their characteristics and application areas, noting their limitations, and pointing out future trends in clustering algorithms for WSNs.
In this paper, we analyze some clustering algorithms that have been widely employed in the past to support the comprehension of Web applications. To this end, we have defined an approach to identify static pages that are duplicated or cloned at the content level. This approach is based on a process that first computes the dissimilarity between Web pages using latent semantic indexing, a well-known information retrieval technique, and then groups similar pages using clustering algorithms. We consider five instances of this process, each based on three variants of the agglomerative hierarchical clustering algorithm, a divisive clustering algorithm, the k-means partitional clustering algorithm, and a widely employed partitional competitive clustering algorithm, namely Winner Takes All. To assess the proposed approach, we have used the static pages of three Web applications and one static Web site.