In this paper, a dynamic niching clustering algorithm based on individual-connectedness (DNIC) is proposed for unsupervised classification with no prior knowledge. It aims to automatically evolve the optimal number of clusters as well as the cluster centers of the data set, based on the proposed adaptive compact k-distance neighborhood algorithm. More specifically, with the adaptive selection of the number of nearest neighbors and the individual-connectedness algorithm, DNIC typically produces several sets of connected individuals, each of which forms an independent niche. In practice, each set of connected individuals corresponds to a homogeneous cluster, which theoretically ensures the separability of an arbitrary data set. An application of the DNIC clustering algorithm to color image segmentation is also provided. Experimental results demonstrate that the DNIC clustering algorithm has high performance and flexibility.
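The core idea of linking individuals through their nearest neighbors and treating each connected set as a niche can be illustrated with a minimal sketch. The version below uses a fixed neighbor count k and a symmetrized k-NN graph; DNIC's adaptive, compact k-distance neighborhood selection is not reproduced, so k and the graph construction are illustrative assumptions.

```python
# Minimal sketch: clusters as connected components of a k-NN graph.
# A fixed k stands in for DNIC's adaptive neighbor selection (assumption).
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree

def knn_connected_clusters(X, k=5):
    """Link each point to its k nearest neighbors; each connected
    component of the resulting graph is returned as one cluster."""
    n = X.shape[0]
    tree = cKDTree(X)
    _, idx = tree.query(X, k=k + 1)   # k+1: a point's nearest neighbor is itself
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()         # drop the self-match
    graph = csr_matrix((np.ones(n * k), (rows, cols)), shape=(n, n))
    # symmetrize so that connectedness does not depend on edge direction
    n_clusters, labels = connected_components(graph + graph.T, directed=False)
    return n_clusters, labels
```

Because the number of connected components falls out of the graph rather than being fixed in advance, the number of clusters is discovered automatically, mirroring the niche-formation behavior the abstract describes.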
ISBN (print): 9781509032068
Recognizing computer users' handedness provides important clues for profiling computer criminals in digital forensic analysis. Existing technologies for handedness recognition have two main problems that limit their applicability to digital crimes: they can be intrusive, and they require costly equipment. Our solution is to infer users' handedness by analyzing keystroke-typing behavior. Field studies are first conducted to gather users' keystroke-typing data during their interaction with computers. Timing features are extracted to characterize users' typing rhythms, and the correlation between keystroke features and handedness is analyzed. Classification techniques are then developed for handedness recognition. Experimental results show that handedness can be efficiently and accurately inferred from users' keystroke-typing behavior, with a recognition rate, measured by the Area Under the ROC Curve (AUC), of 87.75%. To our knowledge, this is the first work that infers users' handedness from their keystroke-typing biometrics during interaction with computers, without dedicated and explicit actions that require attention from users.
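The pipeline of extracting timing features and scoring a classifier by AUC can be sketched as follows, assuming keystroke logs of (key, press_time, release_time) tuples. Dwell and flight times are standard keystroke-dynamics features; the paper's exact feature set and classifier are not specified here, so both choices are assumptions.

```python
# Hedged sketch of the timing-feature pipeline described in the abstract.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def timing_features(events):
    """events: list of (key, press_t, release_t), ordered by press time."""
    dwell = [r - p for _, p, r in events]              # how long each key is held
    flight = [events[i + 1][1] - events[i][2]          # release-to-next-press gap
              for i in range(len(events) - 1)]
    return [np.mean(dwell), np.std(dwell), np.mean(flight), np.std(flight)]

def handedness_auc(sessions, labels):
    """sessions: one keystroke log per typing session;
    labels: 1 = left-handed, 0 = right-handed (hypothetical encoding)."""
    X = np.array([timing_features(s) for s in sessions])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.3, random_state=0, stratify=labels)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```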
Collective measurements on identically prepared quantum systems can extract more information than local measurements, thereby enhancing information-processing efficiency. Although this nonclassical phenomenon has been...
ISBN (print): 9781509013296
Recommender systems have been widely used to deal with information overload by suggesting relevant items that match users' personal interests. One of the most popular recommendation techniques is matrix factorization (MF). The inner products of the learned latent factors of users and items estimate users' preferences for items with high accuracy, but ranking preferences over all items is time-consuming. Hashing-based fast-search technologies have therefore been exploited in recommender systems. However, most previous approaches consist of two stages, continuous latent factor learning and binary quantization, and they do not handle well the change in inner products caused by quantization. To this end, in this paper we propose a constraint-free preference-preserving hashing method that quantizes both the norm and the similarity components of the dot product. We also design an algorithm to optimize the bit length used for norm quantization. The performance of our method is evaluated on three real-world datasets. The results confirm that the proposed model improves recommendation performance by 11%-15% compared with state-of-the-art hashing approaches.
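The decomposition being quantized is u·v = ‖u‖‖v‖cos θ: direction and magnitude can be coded separately. The sketch below uses sign bits for direction, a small uniform codebook for norms, and a SimHash-style cosine estimate; these are illustrative assumptions, not the paper's constraint-free formulation.

```python
# Hedged sketch: quantize both parts of the dot product separately.
import numpy as np

def quantize(V, norm_bits=4):
    """Split each latent vector into a binary direction code and a
    quantized norm level over a shared uniform codebook (assumption)."""
    norms = np.linalg.norm(V, axis=1)
    codebook = np.linspace(norms.min(), norms.max(), 2 ** norm_bits)
    levels = np.argmin(np.abs(norms[:, None] - codebook[None, :]), axis=1)
    signs = V >= 0                        # one bit per latent dimension
    return signs, levels, codebook

def approx_dot(s_u, l_u, s_v, l_v, codebook):
    """Estimate u.v from the codes: quantized norms times a cosine
    recovered from the Hamming distance between the sign codes."""
    d = s_u.shape[0]
    cos_est = np.cos(np.pi * np.count_nonzero(s_u != s_v) / d)
    return codebook[l_u] * codebook[l_v] * cos_est
```

Deciding how many bits to spend on the codebook is exactly the bit-length optimization the abstract mentions: more norm bits reduce quantization error but lengthen the code.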
Patch-level features are essential for achieving good performance in computer vision tasks. Besides well-known pre-defined patch-level descriptors such as the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG), the kernel descriptor (KD) method [1] offers a new way to "grow up" features from a match kernel defined over image patch pairs using kernel principal component analysis (KPCA), and yields impressive results. In this paper, we present the efficient kernel descriptor (EKD) and the efficient hierarchical kernel descriptor (EHKD), which are built upon incomplete Cholesky decomposition. EKD automatically selects a small number of pivot features for generating patch-level features to achieve better computational efficiency. EHKD recursively applies EKD to form image-level features layer by layer. Perhaps due to parsimony, we find, surprisingly, that EKD and EHKD achieve competitive results on several public datasets compared with other state-of-the-art methods, at improved efficiency over KD.
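The abstract's key computational tool, incomplete Cholesky decomposition with greedy pivot selection, can be sketched directly. The kernel function and target rank below are placeholders; how EKD defines its match kernel over patches is not reproduced here.

```python
# Hedged sketch of greedy pivot selection via incomplete Cholesky;
# `kernel` is any positive-definite kernel and `rank` the pivot budget.
import numpy as np

def incomplete_cholesky(X, kernel, rank, tol=1e-6):
    """Pick up to `rank` pivot samples so that G @ G.T approximates the
    kernel matrix K[i, j] = kernel(X[i], X[j]); returns (G, pivots)."""
    n = len(X)
    G = np.zeros((n, rank))
    diag = np.array([kernel(x, x) for x in X])     # residual diagonal of K
    pivots = []
    for j in range(rank):
        i = int(np.argmax(diag))                   # largest residual -> next pivot
        if diag[i] < tol:                          # K is already well approximated
            return G[:, :j], pivots
        pivots.append(i)
        k_i = np.array([kernel(x, X[i]) for x in X])          # i-th column of K
        G[:, j] = (k_i - G[:, :j] @ G[i, :j]) / np.sqrt(diag[i])
        diag = np.maximum(diag - G[:, j] ** 2, 0.0)
    return G, pivots
```

The selected pivots play the role of EKD's small set of pivot features: the rows of G are low-dimensional patch features whose inner products approximate the original match kernel.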
Data sparsity is a long-standing challenge for recommender systems based on collaborative filtering. A promising solution for this problem is multi-context recommendation, i.e., leveraging users' explicit or impli...
ISBN (print): 9781509003914
Canonical correlation analysis (CCA) is a popular technique for finding the correlation between two sets of variables. However, CCA faces the small-sample-size problem when dealing with high-dimensional data. Several approaches have been proposed to overcome this issue, but the resulting transformation matrices fail to extract shared structures among data samples. In this paper, we propose trace-norm regularized CCA (SRCCA), which not only tackles the small-sample-size problem but also uncovers the underlying structures between target classes. Specifically, our formulation characterizes the intrinsic dimensionality of the transformation matrix owing to the appealing properties of the trace norm. Evaluations on public data sets demonstrate the effectiveness of our algorithm.
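The trace norm (the sum of singular values) promotes low-rank transformation matrices, and its proximal operator is singular value soft-thresholding, the core step in most trace-norm regularized solvers. The sketch below shows only this operator; whether SRCCA is optimized with exactly such a proximal scheme is an assumption, and the full objective is not reproduced.

```python
# Singular value thresholding: the proximal operator of tau * ||W||_*.
# Typically alternated with a gradient step on the smooth loss term.
import numpy as np

def svt(W, tau):
    """Soft-threshold the singular values of W, shrinking small ones
    to zero and thereby driving W toward low rank."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt
```

Shrinking singular values to zero is what caps the rank, i.e., the "intrinsic dimensionality" of the transformation matrix mentioned in the abstract.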
ISBN (print): 9781509035144
Erasure coding has been increasingly used by distributed storage systems to maintain fault tolerance with low storage redundancy. However, how to improve the performance of degraded reads in erasure-coded storage remains a critical issue. We revisit this problem from two perspectives neglected by existing studies: data placement and encoding rules. To this end, we propose an encoding-aware data placement (EDP) approach that aims to reduce the number of I/Os incurred by degraded reads under a single failure for general XOR-based erasure codes. EDP carefully places sequential data according to the encoding rules of the given erasure code. Trace-driven evaluation results show that, compared with two baseline data placement methods, EDP reduces the data read from the most loaded disk by up to 37.4% and shortens read time by up to 15.4%.
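Why placement changes degraded-read cost can be seen in a small simulation. The sketch below models stripes abstractly: losing a chunk forces reads of all surviving chunks in its stripe, which holds for single-parity XOR stripes. The stripe-to-disk mapping and this all-survivors recovery rule are simplifying assumptions, not EDP's actual algorithm.

```python
# Toy model of per-disk degraded-read load under a given data placement;
# single-parity XOR recovery is assumed (read all survivors in a stripe).
import numpy as np

def degraded_read_load(placement, failed_disk, n_disks):
    """placement: list of stripes, each a list of disk ids holding the
    stripe's chunks. Returns per-disk read counts needed to serve all
    chunks that lived on the failed disk."""
    reads = np.zeros(n_disks, dtype=int)
    for stripe in placement:
        if failed_disk in stripe:
            for d in stripe:
                if d != failed_disk:
                    reads[d] += 1
    return reads

# Hypothetical example: six stripes round-robined over four disks.
round_robin = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1], [0, 1, 2], [1, 2, 3]]
print(degraded_read_load(round_robin, failed_disk=0, n_disks=4))
```

Different placements shift this load vector, and its maximum entry is the "most loaded disk" metric the evaluation reports.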