Incomplete multi-view clustering (IMC) has attracted widespread attention because it can fuse multi-view information when some view samples are unobserved. Recent work has shown that clustering performance in the subspace can be improved by preserving the clustering structure of each view, but the inconsistent clustering structure caused by incomplete graphs is seldom considered, which restricts clustering performance. Motivated by the clustering interpretation of orthogonal non-negative matrix factorization, we employ it to unify the clustering structure of the data and propose a new model called Incomplete Graph-regularized Orthogonal Non-negative Matrix Factorization (IGONMF). In IGONMF, a reproduced representation is developed, based on which a set of incomplete graphs is used to take full advantage of the geometric structure of the data. Orthogonality is further employed to alleviate the problem of inconsistent clustering structure. We also design an effective iterative updating algorithm to solve the proposed model and analyze its convergence and computational cost. Finally, experimental results on several real-world datasets indicate that our method is superior to related state-of-the-art methods.
Partitioning a given data-set into subsets based on similarity among the data is called clustering. Clustering is a major task in data mining and machine learning, with many applications such as text retrieval, pattern recognition, and web mining. Here, we briefly review some clustering-related problems (k-means, normalized k-cut, orthogonal non-negative matrix factorization (ONMF), and isoperimetry) and describe their connections. We formulate the relaxed mean version of the isoperimetry problem as an optimization problem with non-negativity and orthogonality constraints. We first use a gradient-based optimization algorithm to solve this kind of problem, and then apply a post-processing technique to extract a solution of the clustering problem. We also propose a simplified approach to improve the solution of the 2-dimensional clustering problem, using the N-nearest neighbor graph. Inspired by this technique, we apply a multilevel method for clustering a given data-set that reduces the size of the problem by grouping a number of similar vertices. The number is determined from two values, namely the maximum and the average of the edge weights of the vertices connected to a selected vertex. In addition, using the connections between ONMF and k-means and between k-means and the isoperimetry problem, we propose an algorithm to solve the ONMF problem. A comparative performance analysis shows that our approach outperforms other related methods in terms of misclassification error rate and Rand index, on both benchmark and randomly generated problems as well as hard synthetic data-sets.
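The connection between ONMF and k-means mentioned above can be illustrated with a small numerical check. This is a minimal sketch, not the algorithm of the paper: it assumes the data matrix X stores one sample per column, and the helper names (`indicator_from_labels`, `kmeans_objective`, `onmf_objective`) are hypothetical. A non-negative matrix G with orthonormal columns has exactly one positive entry per row when built from cluster labels, and the residual of X against X G Gᵀ then equals the k-means within-cluster sum of squares.

```python
import numpy as np

def indicator_from_labels(labels, k):
    """Normalized cluster-indicator matrix G (n x k) with G.T @ G = I."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    G = np.zeros((n, k))
    G[np.arange(n), labels] = 1.0          # one-hot assignment per sample
    counts = G.sum(axis=0)                 # cluster sizes
    if np.any(counts == 0):
        raise ValueError("every cluster needs at least one sample")
    return G / np.sqrt(counts)             # scale columns to unit length

def kmeans_objective(X, labels, k):
    """Within-cluster sum of squared distances to the cluster means."""
    labels = np.asarray(labels)
    total = 0.0
    for c in range(k):
        pts = X[:, labels == c]
        total += ((pts - pts.mean(axis=1, keepdims=True)) ** 2).sum()
    return total

def onmf_objective(X, G):
    """Residual ||X - F G^T||_F^2 with F = X G (the best F for a fixed G)."""
    return np.linalg.norm(X - X @ G @ G.T, "fro") ** 2

# Toy data (assumed for illustration): 3 features, 8 samples, 3 clusters.
rng = np.random.default_rng(0)
X = rng.random((3, 8))
labels = np.array([0, 0, 1, 1, 1, 2, 2, 0])
G = indicator_from_labels(labels, 3)

print("G^T G = I:", np.allclose(G.T @ G, np.eye(3)))
print("objectives agree:",
      np.isclose(onmf_objective(X, G), kmeans_objective(X, labels, 3)))
```

Multiplying X by G Gᵀ replaces every sample with the mean of its cluster, which is why the two objectives coincide for any hard assignment.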
ISBN: (print) 9789082797060
Data clustering is a key problem in data science and machine learning. In this paper, we consider orthogonal non-negative matrix factorization (ONMF) for scaled data clustering. The non-convex orthogonality constraint of ONMF raises a great challenge from an optimization viewpoint. We study a convex-constrained transformation of ONMF that allows us to control the approximation accuracy and problem difficulty through a parameter. We then apply a homotopy strategy in which we trace the solution path of a sequence of the aforementioned transformed problems, gradually moving from easy problems to near-ONMF problems. Intuitively, doing so may allow us to avoid local minima. Numerical results show that our homotopy method yields competitive clustering performance in synthetic data experiments and in a real-data hyperspectral clustering experiment.
Background and objective: Valvular heart disease (VHD) is associated with elevated mortality rates. Although transthoracic echocardiography (TTE) is the gold standard detection tool, phonocardiography (PCG) could be an alternative, as it is a cost-effective and noninvasive method for cardiac auscultation. Many researchers have dedicated their efforts to improving the decision-making process and developing robust and precise approaches to assist physicians in providing reliable diagnoses of VHD. Methods: This research proposes a novel approach for the detection of anomalous valvular heart sounds from PCG signals. The proposed approach combines orthogonal non-negative matrix factorization (ONMF) and convolutional neural network (CNN) architectures in a three-stage cascade. The aim of the proposal is to improve the learning process by identifying the optimal ONMF temporal or spectral patterns for accurate detection. In the first stage, the time-frequency representation of the input PCG signal is computed. Next, band-pass filtering is performed to locate the spectral range that is most relevant for the presence of such cardiac abnormalities. In the second stage, the temporal and spectral cardiac structures are extracted using the ONMF approach. In the third stage, these structures are fed into the CNN architecture to detect abnormal heart sounds. Results: Several state-of-the-art CNN architectures, such as LeNet5, AlexNet, ResNet50, VGG16 and GoogLeNet, have been evaluated to determine the effectiveness of using ONMF temporal features for VHD detection. The results reveal that the integration of ONMF temporal features with a CNN classifier significantly improves VHD detection. Specifically, the proposed approach achieves an accuracy improvement of approximately 45% when ONMF spectral features are used and 35% when time-frequency features from the short-time Fourier transform (STFT) spectrogram are used. Additionally, feeding ONMF temporal features into low-compl
Semi-supervised multi-view learning has recently achieved appealing performance by exploiting the consensus relation between samples. However, in addition to the relation between samples, the relation between samples and their assemble centroid is also important to the learning. In this paper, we propose a novel model based on orthogonal non-negative matrix factorization that explores the consensus relations both between samples and between samples and their assemble centroid. Since this model uses more consensus information to guide the multi-view learning, it can lead to better performance. We also theoretically derive a proposition on the equivalence between partial orthogonality and full orthogonality. Based on this proposition, the orthogonality constraint and the label constraint are implemented simultaneously in the proposed model. Experimental evaluations on five real-world datasets show that our approach outperforms state-of-the-art methods, with an average improvement of 6% in terms of the ARI.
An approach to user identification based on deviations in users' topic trends when working with text information is presented. The approach involves topic analysis of a user's past trends (behavior) in working with text content of various categories (including confidential ones) and a forecast of their future behavior. Topic analysis of a user's activity means determining the principal topics of their text content and calculating the respective weights at given instants. Deviations of the user's behavior from the forecast are used to identify the user. Within this approach, an original time series forecasting method based on orthogonal non-negative matrix factorization (ONMF) is proposed. Note that ONMF has not previously been used to solve time series forecasting problems. Experiments on real-world corporate email drawn from the Enron data set showed the proposed user identification approach to be applicable.
The non-negative matrix factorization (NMF) model with an additional orthogonality constraint on one of the factor matrices, called orthogonal NMF (ONMF), has been found to be a promising clustering model that can outperform classical K-means. However, solving the ONMF model is a challenging optimization problem: the coupling of the orthogonality and non-negativity constraints introduces a mixed combinatorial aspect, because the correct status of each variable (positive or zero) must be determined. Most existing methods deal with the orthogonality constraint directly in its original form via various optimization techniques, but they do not scale to large problems. In this paper, we propose a new ONMF-based clustering formulation that equivalently transforms the orthogonality constraint into a set of norm-based non-convex equality constraints. We then apply a non-convex penalty (NCP) approach that adds them to the objective as penalty terms, leading to a problem that is efficiently solvable. One smooth and one non-smooth penalty formulation are studied. We establish theoretical conditions under which the penalized problems provide feasible stationary solutions to the ONMF-based clustering problem, and we propose efficient algorithms for solving the penalized problems of the two NCP methods. Experimental results on both synthetic and real datasets show that the proposed NCP methods are computationally efficient and either match or outperform existing K-means and ONMF-based methods in terms of clustering performance.
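For orientation, the ONMF model and the idea of a norm-based reformulation can be written out as follows. This is a sketch under assumed notation (data matrix X of size M x N, factors W of size M x K and H of size K x N, columns h_j of H); the norm identity shown is one standard example of such a condition and is not necessarily the exact constraint or penalty used in the paper.

```latex
% ONMF with the orthogonality constraint placed on H (assumed notation)
\min_{W \ge 0,\; H \ge 0} \; \|X - WH\|_F^2
\quad \text{s.t.} \quad HH^\top = I_K .

% Non-negative, mutually orthogonal rows cannot share a support, so every
% column h_j of H has at most one nonzero entry. For any h \ge 0,
\|h\|_1^2 - \|h\|_2^2 \;=\; 2 \sum_{i < k} h_i h_k \;\ge\; 0 ,
% with equality if and only if h has at most one nonzero entry.

% An illustrative penalized objective built from this identity
% (\rho > 0 is a penalty weight):
\min_{W \ge 0,\; H \ge 0} \; \|X - WH\|_F^2
+ \rho \sum_{j=1}^{N} \bigl( \|h_j\|_1^2 - \|h_j\|_2^2 \bigr) .
```

The single nonzero entry in each column of H is what gives the model its clustering reading: its position assigns sample j to one of the K clusters.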