ISBN: 9781538673362 (print)
Recommender systems are becoming an essential part of every industry, and the travel and tourism segment is no exception. Given the exponential increase in social media usage and the huge volume of data generated through this channel, social media can be considered a vital source of input data for modern recommender systems. This in turn has created a need for efficient and effective mechanisms for contextualized information retrieval. Traditional recommender systems adopt collaborative filtering techniques to deal with social context. However, these techniques turn out to be computationally intensive and therefore less scalable when the internet and social media are the input channel. A possible solution is to adopt clustering techniques to limit the data considered in the recommendation process. In the tourism context, travelers can be clustered into different interest groups based on social media interactions such as reviews, forums, blogs and feedback. This experimental analysis compares key clustering algorithms by applying them to social media datasets from the travel and tourism context, with the aim of finding an optimal option for the tourism domain.
ISBN: 9781467384186 (print)
The identification of topics in social networks has become an important research task in event detection, particularly when global communities are affected. Text processing techniques and machine learning algorithms have been used extensively to solve this problem. In this paper we compare three clustering algorithms, k-means, k-medoids and NMF (Non-negative Matrix Factorization), for detecting topics in textual messages obtained from Twitter. The algorithms were applied to a database composed of tweets, with hashtags related to the recent corruption scandal involving FIFA (Fédération Internationale de Football Association) as the initial context. The results suggest that NMF performs best, since it provides clusters that are easier to interpret.
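As a rough illustration of the simplest of the three algorithms, the sketch below runs k-means (Lloyd's algorithm) on bag-of-words vectors. The vocabulary, the four toy "tweets" and the initial centers are made up for this example and are not taken from the paper's data; the paper's preprocessing and its k-medoids and NMF setups are not reproduced here.

```python
def kmeans(vectors, centers, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centers."""
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest center
        # (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        centers = new_centers
    return labels, centers


# Hypothetical vocabulary: fifa, corruption, arrest, goal, match, score
tweets = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
labels, centers = kmeans(tweets, centers=[tweets[0], tweets[2]])
```

Each resulting center can be read as a crude "topic": the terms with the largest weights in a center are the ones that characterize that cluster.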
ISBN: 9781424453306 (print)
Expressed sequence tags (ESTs) are short single-pass sequence reads derived from cDNA libraries. They have been used for gene discovery, detection of splice variants, gene expression studies and transcriptome analysis. Clustering of ESTs is a vital step before they can be processed further. Many EST clustering algorithms are currently available, and they fall into two broad approaches: alignment-based and alignment-free. The former is reliable but inefficient in terms of running time, while the latter is gaining popularity and is under rapid development because of its higher speed and acceptable results. In this paper, we propose a taxonomy for sequence comparison algorithms and another taxonomy for EST clustering algorithms. In addition, we highlight the peculiarities of recently introduced alignment-free EST clustering algorithms by focusing on their features, distance measures, advantages and disadvantages.
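To make the alignment-free idea concrete, the sketch below compares two sequences by their k-mer (word) counts instead of aligning them. This is only one possible word-count distance, chosen here for illustration; the function names and the default word length `k=3` are assumptions, and the abstract does not say which measures the surveyed algorithms use.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping word of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(seq_a, seq_b, k=3):
    """Squared Euclidean distance between the two k-mer count vectors."""
    counts_a, counts_b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    # A Counter returns 0 for words that do not occur in a sequence.
    return sum((counts_a[w] - counts_b[w]) ** 2
               for w in set(counts_a) | set(counts_b))


same = kmer_distance("ACGTACGT", "ACGTACGT")
different = kmer_distance("AAAA", "CCCC", k=2)
```

Counting words takes time linear in the sequence length, which is the usual reason alignment-free comparison is faster than computing an alignment.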
ISBN: 9781424438655 (print)
With the development of sequencing technologies, more and more protein sequences remain uncharacterized. Clustering protein sequences into homologous groups can help to annotate them. In recent years, many clustering algorithms have been proposed for analyzing protein sequences. A comparative study of these algorithms is therefore needed to help biologists choose a suitable clustering algorithm for their tasks. In this work, we present a comparative experiment on three clustering algorithms: BlastClust, Spectral clustering and TribeMCL. We conducted two types of experiment for each algorithm: (1) an experiment with default parameters and (2) an experiment with parameter tuning. The evaluation results show that TribeMCL outperforms the other methods. BlastClust is extremely dependent on the choice of parameter values.
ISBN: 9780780394889 (print)
Many cluster validity measures have been proposed so far, and it is recognized that no universally best measure exists. In this paper we propose kernelized validity measures, where a kernel means the kernel function used in support vector machines. Two measures are considered. The first is the sum of the traces of the fuzzy covariances within clusters. We consider the trace rather than the determinant because the calculation of the determinant becomes ill-posed when kernelized, while the trace is sound and easily computed. The second is a kernelized version of the Xie-Beni measure. These two measures are applied to determining the number of clusters with nonlinear boundaries generated by kernelized clustering algorithms. Another application of the measures is the evaluation of the robustness of different algorithms with respect to variations of initial values and changes of a parameter.
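For orientation, the sketch below computes the ordinary (non-kernelized) Xie-Beni index in the input space: fuzzy within-cluster compactness divided by the number of points times the smallest squared distance between centers, with lower values indicating a better partition. It is not the kernelized measure proposed in the paper; the function names, the fuzzifier default `m=2.0` and the toy data are assumptions for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def xie_beni(points, centers, memberships, m=2.0):
    """memberships[i][k] is the degree to which points[k] belongs to cluster i."""
    compactness = sum(
        memberships[i][k] ** m * sq_dist(points[k], centers[i])
        for i in range(len(centers))
        for k in range(len(points))
    )
    separation = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (len(points) * separation)


points = [[0.0], [1.0], [9.0], [10.0]]
crisp = [[1, 1, 0, 0], [0, 0, 1, 1]]  # hard memberships, for simplicity
good = xie_beni(points, [[0.5], [9.5]], crisp)
bad = xie_beni(points, [[4.0], [6.0]], crisp)
```

Evaluating the index for several candidate numbers of clusters and picking the minimum is the usual way such a measure is used to choose the number of clusters.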
ISBN: 9781457721977 (print)
This paper investigates the accuracy of GNSS localization in constrained environments. The ultimate goal is to provide a first confidence index for the accuracy of the position given by the GNSS. We propose to exploit the complementarity between GNSS signals and developments in image processing to count the satellites that are in a direct reception state. The approach uses a vehicle equipped with a GPS-RTK receiver and an upward-facing camera to capture images and, after repositioning, to count the satellites with direct signals, i.e. those located in the sky region of the image, and those with blocked or reflected signals, i.e. those located in the not-sky region. The proposed approach is based on an optimal clustering applied to simplified images. More precisely, the acquired image is simplified using a geodesic reconstruction with an optimal contrast parameter. A clustering step then classifies the regions into two classes (sky and not-sky). To this end, a set of unsupervised clustering algorithms (KMlocal, Fuzzy C-means, Fisher and Statistical Region Merging) and supervised classifiers (Bayes, K-Nearest Neighbor and Support Vector Machine) are compared in order to identify the best classifier in terms of classification rate and processing time. Experimental results are shown for one hundred images taken under different acquisition conditions (illumination changes, clouds, sun, tunnels, etc.).
ISBN: 9783642137990 (print)
Distance functions are a fundamental ingredient of classification and clustering procedures, and this also holds true in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained the status of de facto standards thanks to a considerable amount of experimental validation. For microarray data, the issue of which distance function "works best" has been investigated, but no final conclusion has been reached. The aim of this paper is to shed further light on that issue. We present an experimental study involving several distances, assessing (a) their intrinsic separation ability and (b) their predictive power when used in conjunction with clustering algorithms. The experiments were carried out on six benchmark microarray datasets, for each of which the "gold solution" is known. We used both Hierarchical and K-means clustering algorithms, with external validation criteria as evaluation tools. From the methodological point of view, the main result of this study is a ranking of those measures in terms of their intrinsic and clustering abilities, which also highlights the correlations between the two. Pragmatically, the outcomes of the experiments indicate that Minkowski, cosine and Pearson correlation distances seem to be the best choice for microarray data analysis.
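The three distance families singled out above can be written down in a few lines. The sketch below uses the common textbook definitions (Minkowski distance of order p, one minus cosine similarity, one minus Pearson correlation); the exact variants and parameters used in the study are not stated in the abstract, so the function names and the two toy expression profiles are assumptions for illustration.

```python
import math


def minkowski(x, y, p=2):
    """Minkowski distance of order p (p=2 is Euclidean, p=1 is Manhattan)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def pearson_distance(x, y):
    """One minus the Pearson correlation, i.e. cosine distance after centering."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine_distance([a - mean_x for a in x], [b - mean_y for b in y])


# Two made-up expression profiles with the same shape but different scale.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [2.0, 4.0, 6.0, 8.0]
euclid = minkowski(profile_a, profile_b)
manhattan = minkowski(profile_a, profile_b, p=1)
cos_d = cosine_distance(profile_a, profile_b)
pear_d = pearson_distance(profile_a, profile_b)
```

The toy profiles show why the choice matters: the two profiles are far apart under Minkowski distances but essentially identical under the cosine and Pearson distances, which ignore scale.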
ISBN: 9781728117393 (print)
In this paper, two clustering algorithms, DBSCAN and CLARA, were applied to the pedological database of Montenegro. Both algorithms cluster data based on their density distribution. DBSCAN enables the discovery of clusters of arbitrary shape without domain knowledge. CLARA, on the other hand, forms clusters of approximately equal size and shape for databases with uniformly spaced data. The database used is composed of chemical and mechanical-physical parameters of soil samples. There are neither clear transitions between different soil types nor large differences in the values of their parameters at the boundary points of the clusters. Thus, CLARA proved to be better for clustering pedological data, which is confirmed by means of simulations. The results obtained by CLARA are comparable with those obtained from the expert analysis of soil in Montenegro.
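To show how DBSCAN finds clusters without being told their number or shape, here is a minimal, unoptimized sketch of the standard algorithm. The parameter names `eps` and `min_pts` follow common usage, and the toy points are made up; none of this reflects the settings or data used in the paper.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbors(i):
        # Indices of all points within distance eps of points[i], itself included.
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Because clusters grow only through dense neighborhoods, data without clear density gaps between groups tends to merge into one cluster or fall into noise, which is consistent with the abstract's observation about soil data.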
ISBN: 9781479918058 (print)
Type designers and historians studying the typefaces and fonts used in historical documents can usually rely only on available printed material. The original wooden or metal cast fonts have mostly disappeared. In this paper we address the creation of character templates from printed documents. Images of characters scanned from Renaissance-era documents are segmented, then clustered. A template is created from each resulting cluster of similar-looking characters. For subsequent typeface analysis tools to operate, the template should reduce the noise present in the individual instances by using information from the set of samples, but the samples must be homogeneous enough not to introduce further noise into the process. This paper evaluates the efficiency of several clustering algorithms and their associated parameters through cluster validity statistics and the appearance of the resulting template image. Clustering algorithms that form tight clusters produce templates that highlight details, even though fewer samples are available, while algorithms that form larger clusters better capture the global shape of the characters.
ISBN: 9788993215151 (print)
Clustering, the grouping of similar objects, is one of the fundamental tasks in data analysis and data mining. It is applied in a wide range of areas: image segmentation, marketing, fraud prevention, forecasting, text analysis and many others. Today, clustering is often the first step in analyzing data. Once groups of similar objects have been identified, other methods are applied and a separate model is built for each group. The clustering task has been formulated, in one form or another, in scientific areas such as statistics, pattern recognition, optimization and machine learning. Hence the variety of synonyms for the concept of a cluster: class, taxon, condensation. At present, the number of methods for splitting a set of objects into clusters is quite large: several dozen algorithms and even more modifications of them. However, we are interested in a committee analysis of clustering algorithms that are based on the selection of centroids.