Context. The census of open clusters in the Milky Way is in a never-before seen state of flux. Recent works have reported hundreds of new open clusters thanks to the incredible astrometric quality of the Gaia satellit...
Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce,...
Packet-pair bandwidth probing in wired-cum-wireless network paths was tested and analyzed in a C++ simulation environment using link models verified alongside Opnet results. Some major differences were noted between these results and those of pure wired scenarios investigated in earlier work. Attempts were made to use a dynamic Gaussian-mix algorithm to identify data clusters within the bandwidth distribution.
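As background for the abstract above: packet-pair probing commonly estimates bottleneck bandwidth as the probe packet size divided by the time gap (dispersion) between two back-to-back packets at the receiver. The following is a minimal sketch of that arithmetic only. The function name and the sample dispersions are hypothetical, and it does not reproduce the simulation or the Gaussian-mix analysis described in the abstract.

```python
def packet_pair_estimate(packet_size_bytes, dispersion_seconds):
    """Bandwidth estimate in bits per second from one packet pair."""
    if dispersion_seconds <= 0:
        raise ValueError("dispersion must be positive")
    return packet_size_bytes * 8 / dispersion_seconds


# Invented receiver-side dispersions (seconds) for 1500-byte probe packets.
dispersions = [0.0012, 0.0012, 0.0024, 0.0012]
estimates = [packet_pair_estimate(1500, d) for d in dispersions]

for d, est in zip(dispersions, estimates):
    print(f"dispersion {d:.4f} s -> {est / 1e6:.2f} Mbit/s")
```

Repeated probes give a distribution of such estimates, which is the kind of data in which one would look for clusters.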
Data analysis is an emerging research field that relies on methods and techniques for drawing insights from data sets. The main objective of this research is to analyze students' academic performance in relation to their health status, including nutritious food intake, hygienic lifestyle, and frequency of health issues. The datasets were obtained by questionnaire, and the analysis was carried out initially with clustering algorithms: the K-means algorithm, hierarchical clustering, and the EM method. In the second phase, genetic search was performed and the outputs were generated. Statistical output for the important attributes is presented using the Orange software. The experimental setup was also run with the Weka data mining tool on the student dataset, which has 113 instances and 93 attributes. The findings were that the K-means algorithm outperformed the EM method and hierarchical clustering. The genetic search method predicted correlated attributes for the selected class attribute. The statistical data analysis shows that nutrition and health issues of female students have an impact on the academic performance of students.
The amount and diversity of data produced and processed have increased dramatically in parallel with improvements in technology. Unfortunately, the data produced usually lack labels, which would make classification and information building easier. This has raised the importance of data clustering for building information. In this work, the K-Means, spectral clustering, and Girvan-Newman algorithms are studied and compared on the Breast Cancer Wisconsin Data Set (BCWDS).
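To make the first of the compared algorithms concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The toy points and the choice of initial centres are invented for illustration; this is not the experimental setup of the paper and does not use the BCWDS data.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; `centroids` holds the initial centres."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centre
        if new_centroids == centroids:
            break  # converged: centres no longer move
        centroids = new_centroids
    return labels, centroids


# Invented 2-D toy data with two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

The result depends on the initial centres, which is one reason comparisons such as the one in the abstract usually repeat runs or fix the initialization.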
Clustering is the identification of similar data from rough, scaled, or transformed data and its grouping into clusters. Clusters show the symmetry and asymmetry of data and its relations. In this paper, three fuzzy clustering algorithms and two conventional clustering algorithms are compared. The analysis is conducted on four datasets, three of which are gene expression datasets. Clustering performance is evaluated using both internal and external validation measures, and an attempt is made to find the optimal number of clusters. This analysis provides an effective way of selecting a suitable algorithm for a particular dataset among different hard and soft clustering approaches.
ISBN (digital): 9781728149707
ISBN (print): 9781728149714
The constant growth in urbanization causes significant social and economic transformations in urban areas. Areas where crime rates are above the normal level are known as crime hot-spots. The increase in urban population poses challenges for management, services, and safety from criminal activities. It is important to monitor criminal activities, and for law enforcement agencies, providing much-needed public safety is an increasingly complex task. New technologies can help these agencies analyze and understand crime trends and patterns with respect to their geographic locations. This paper uses Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to find spatio-temporal crime hot-spots by clustering, and the results show that this technique outperforms others.
In this paper, we investigate the problem of quality analysis of clustering results using semantic annotations given by experts. We propose a novel approach to constructing an evaluation measure, based on the Minimum Description Length (MDL) principle. The proposed measure, called SEE (Semantic Evaluation by Exploration), improves on existing evaluation methods such as the Rand Index and Normalized Mutual Information, and fixes some of the weaknesses of the original methods. We illustrate the proposed evaluation method on the freely accessible biomedical research articles from PubMed Central (PMC). Many articles from PubMed Central are annotated by experts using the Medical Subject Headings (MeSH) thesaurus. This paper is part of the research on designing and developing a dialog-based semantic search engine for the SONCA system, which is part of the SYNAT project. We compare different semantic techniques for search result clustering using the proposed measure.
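For reference, the two baseline measures named in the abstract can be sketched in a few lines of Python. This is not the SEE measure itself, whose definition is not given here. The function names and label lists are illustrative, and the NMI below uses geometric-mean normalization, which is one of several conventions in use.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree (same vs. different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_a, labels_b):
    """Mutual information divided by sqrt(H(A) * H(B)) (assumed convention)."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0 or h_b == 0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mi / math.sqrt(h_a * h_b)


# Invented example: expert annotations versus a clustering result.
expert = [0, 0, 1, 1, 2, 2]
predicted = [1, 1, 0, 0, 2, 2]  # same grouping, different cluster names
print(rand_index(expert, predicted))
print(normalized_mutual_information(expert, predicted))
```

Both measures ignore the names of the clusters and compare only the groupings, so a relabeled but otherwise identical clustering scores as a perfect match.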
ISBN (print): 9781509033591
Among the power system corrective controls, defensive islanding is considered the last resort to secure the system from severe cascading contingencies. The primary motive of defensive islanding is to limit the affected areas, maintain the stability of the resulting subsystems, and reduce the total loss of load in the system. Slow-coherency-based islanding can be applied successfully to defensive islanding. In this paper, two partitioning methods are proposed: a K-means clustering algorithm and a fuzzy relational eigenvector centrality-based clustering algorithm. The proposed methods use data measured by phasor measurement units to determine the islands to be used in defensive islanding. The methods are demonstrated on the 16-generator, 68-bus power system, and their performance is discussed by comparing their results.
We present a method for evaluating the suitability of different string dissimilarity measures and clustering algorithms for EST clustering, one of the main techniques used in transcriptome projects. The method comprises generating simulated ESTs with user-specified parameters, and then evaluating the quality of clusterings produced when different dissimilarity measures and different clustering algorithms are used. We implemented two tools to do this: ESTSim (EST simulator), which generates simulated EST sequences from mRNAs/cDNAs using user-specified parameters, and ECLEST (evaluator for clusterings of ESTs), which computes and evaluates a clustering of a set of input ESTs, where the dissimilarity measure, the clustering algorithm, and the clustering validity index can be specified independently. We demonstrate the method on a sample of 699 cDNAs, generating approximately 16,000 simulated ESTs. We conducted two experiments and derived statistically significant results from this study comparing subword-based dissimilarity measures to alignment-based ones.
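The distinction between subword-based and alignment-based dissimilarity can be illustrated with a small Python sketch. The abstract does not say which specific measures were compared, so the two functions below are assumptions chosen for illustration: a squared difference of k-mer counts (a d2-style subword measure) and the Levenshtein edit distance (a simple alignment-based measure). They are not taken from ESTSim or ECLEST.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping subword of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def subword_dissimilarity(seq_a, seq_b, k=3):
    """Sum of squared differences between the k-mer counts of two sequences."""
    counts_a, counts_b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    words = set(counts_a) | set(counts_b)
    return sum((counts_a[w] - counts_b[w]) ** 2 for w in words)


def edit_distance(seq_a, seq_b):
    """Levenshtein distance by dynamic programming, one row at a time."""
    prev = list(range(len(seq_b) + 1))
    for i, ch_a in enumerate(seq_a, start=1):
        curr = [i]
        for j, ch_b in enumerate(seq_b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# Invented short sequences; real ESTs are hundreds of bases long.
a, b = "ACGTACGT", "ACGTTCGT"
print(subword_dissimilarity(a, b), edit_distance(a, b))
```

The subword measure needs only one pass over each sequence to count k-mers, while the edit distance fills a table whose size is the product of the two sequence lengths, which is the usual cost trade-off between the two families.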