Traditional clustering is a form of unsupervised learning that is widely used in practical applications. However, real data often come with some prior information: the labels of some samples, or the relationships between some pairs of samples, are known. Clustering methods that use this information are called semi-supervised clustering. Pairwise constraints, comprising must-link constraints and cannot-link constraints, are a commonly used kind of prior information. Compared with unsupervised clustering algorithms, semi-supervised clustering algorithms achieve better clustering performance thanks to the guidance of prior information. Nonnegative matrix factorization (NMF) is an efficient clustering method, but it is unsupervised and cannot take advantage of pairwise constraints. To this end, by combining pairwise constraints with the NMF framework, a semi-supervised nonnegative matrix factorization with pairwise constraints (SNMFPC) is proposed in this paper. SNMFPC requires that the low-dimensional representations satisfy these constraints; that is, a pair of must-link samples should be close to each other, and a pair of cannot-link samples should be as distant from each other as possible. Experiments on several data sets, with comparisons against several semi-supervised methods, verify the validity of the proposed method.
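The abstract does not give the SNMFPC objective, so the following is only a minimal sketch of one plausible way to score pairwise constraints on low-dimensional representations: squared distances between must-link pairs are penalized and squared distances between cannot-link pairs are rewarded. The function name `pairwise_penalty` and the exact form of the penalty are assumptions made for illustration, not the paper's formulation.

```python
import numpy as np

def pairwise_penalty(H, must_link, cannot_link):
    """Hypothetical constraint score on representations H (n_samples x k).

    Lower is better: must-link pairs should be close (small distance),
    cannot-link pairs should be far apart (large distance).
    """
    H = np.asarray(H, dtype=float)
    # Sum of squared distances over must-link pairs (to be minimized).
    ml = sum(np.sum((H[i] - H[j]) ** 2) for i, j in must_link)
    # Sum of squared distances over cannot-link pairs (to be maximized).
    cl = sum(np.sum((H[i] - H[j]) ** 2) for i, j in cannot_link)
    return float(ml - cl)

# Three samples in a 2-D representation space.
H = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0]])
score = pairwise_penalty(H, must_link=[(0, 1)], cannot_link=[(0, 2)])
```

In a semi-supervised factorization, a term of this kind would typically be added to the reconstruction error with a trade-off weight; how SNMFPC does this is not stated in the abstract.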
Sparse nonnegative matrix factorization (SNMF) is a fundamental unsupervised representation learning technique; it extracts low-dimensional features of a data set and lends itself to a clustering interpretation. However, the model and algorithm of SNMF have some shortcomings. In this work, we develop a clustering method by improving the SNMF model and its acceleration algorithm based on the alternating direction method of multipliers (ADMM). A novel, fast, closed-form iterative solution is proposed for SNMF with implicit sparse constraints, namely the L-1 and L-2 norms of the coefficient and basis matrices, respectively. A low-dimensional feature space is also obtained as a result of the closed-form iteration formats of each sub-problem produced by variable splitting. In addition, the convergence points of the presented iterative algorithms are stationary points of the model. Finally, numerical experiments show that the improved algorithm is comparable to state-of-the-art methods in data clustering. (C) 2022 Elsevier B.V. All rights reserved.
In this paper, a non-negative matrix factorization feature expansion (NMFFE) approach is proposed to overcome the feature-sparsity issue when expanding the features of short texts. First, we take the internal relationships of short texts and words into account when segmenting words from texts and constructing their relationship matrix. Second, we use the dual regularization non-negative matrix tri-factorization (DNMTF) algorithm to obtain the word clustering indicator matrix, from which the feature space is derived by dimensionality reduction. Third, closely related words are selected from the feature space and added to the short text to address the sparsity issue. The experimental results show that the short-text classification accuracy of our NMFFE algorithm increased by 25.77%, 10.89%, and 1.79% on three data sets (Web snippets, Twitter sports, and AGnews, respectively) compared with the Word2Vec algorithm and the Char-CNN algorithm. This indicates that the NMFFE algorithm is better than the BOW algorithm and the Char-CNN algorithm in terms of classification accuracy and algorithm robustness.
Hyperspectral unmixing addressing spectral variability remains an important challenge. In this field, unmixing methods do not exploit the possible availability of some spectral information that corresponds to known spectra of some pure materials present in an acquired scene. In this work, a hyperspectral unmixing method, which considers not only the spectral variability phenomenon but also exploits one or more available known pure material spectra, is proposed. Such a combination, initially proposed here, constitutes the originality of the conducted work that distinguishes it from other investigations in the hyperspectral unmixing topic. The proposed method, based on an informed nonnegative matrix factorization technique, employs a partial structured additively-tuned linear mixing model that deals with spectral variability. Experimental results, based on real data, show that the designed informed algorithm, which addresses spectral variability, yields very satisfactory results and outperforms tested literature approaches. Thus, such an unmixing algorithm may be used for automatically detecting and mapping, using hyperspectral data, materials of interest whose spectra are known while dealing with their spectral variability. (C) 2022 Society of Photo-Optical Instrumentation Engineers (SPIE)
Hottopixx, proposed by Bittorf et al. at NIPS 2012, is an algorithm for solving non-negative matrix factorization (NMF) problems under the separability assumption. Separable NMFs have important applications, such as topic extraction from documents and unmixing of hyperspectral images. In such applications, the robustness of the algorithm to noise is the key to success. Hottopixx has been shown to be robust to noise, and its robustness can be further enhanced through postprocessing. However, there is a drawback. Hottopixx and its postprocessing require us to estimate the noise level involved in the matrix we want to factorize before running, since they use it as part of the input data. The noise-level estimation is not an easy task. In this paper, we overcome this drawback. We present a refinement of Hottopixx and its postprocessing that runs without prior knowledge of the noise level. We show that the refinement has almost the same robustness to noise as the original algorithm.
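To make the separability assumption concrete, the sketch below uses the successive projection algorithm (SPA), a different and simpler separable-NMF method than the LP-based Hottopixx. It assumes a noiseless matrix X = WH in which the columns of W appear among the columns of X and every other column is a convex combination of them. The function name `spa_select_columns` is hypothetical, and this is not the refinement described in the abstract.

```python
import numpy as np

def spa_select_columns(X, r):
    """Greedy SPA sketch: return indices of r columns of X that act as the basis.

    Under separability (noiseless case), the column with the largest norm is
    always one of the basis columns; after selecting it, the data are projected
    onto its orthogonal complement and the step is repeated.
    """
    R = np.array(X, dtype=float)           # residual, updated at every step
    picked = []
    for _ in range(r):
        norms = np.sum(R * R, axis=0)      # squared column norms of the residual
        j = int(np.argmax(norms))
        if norms[j] == 0.0:                # nothing left to explain
            break
        picked.append(j)
        u = R[:, j] / np.sqrt(norms[j])    # unit vector along the chosen column
        R = R - np.outer(u, u @ R)         # remove that direction from all columns
    return picked

# Synthetic separable data: the first 3 columns of X are the basis W itself.
rng = np.random.default_rng(0)
W = rng.random((5, 3))
mixtures = rng.dirichlet(np.ones(3), size=6).T   # 3 x 6, each column sums to 1
X = W @ np.hstack([np.eye(3), mixtures])
basis_columns = spa_select_columns(X, 3)
```

With noise, plain SPA can pick wrong columns, which is why noise-robust approaches such as Hottopixx and its postprocessing matter in practice.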
Nonnegative matrix factorization is widely used in recommendation systems. To reduce the cost of producing recommendations for newly added samples, incremental nonnegative matrix factorization and its variants have been extensively studied in recommendation systems. However, their recommendation performance is inadequate for particular applications in terms of data sparsity and sample diversity. In this paper, we propose a new incremental recommendation algorithm that improves incremental nonnegative matrix factorization based on three-way decision, called Three-way Decision Recommendations Based on Incremental Non-negative Matrix Factorization (3WD-INMF), in which the concepts of positive, negative, and boundary regions are employed to update the features of newly arriving samples. Finally, experiments on six public data sets demonstrate that the error induced by 3WD-INMF decreases as new samples are added and that the method delivers state-of-the-art performance compared with existing recommendation algorithms. The results indicate that our method is more reasonable and efficient because it leverages the idea of three-way decision in the recommendation decision process.
In the real world, one object is usually described via multiple views or modalities. Many existing multiview clustering methods fuse the information of multiple views by learning a consensus representation. However, the feature learned in this manner is usually redundant and neglects the distinctions among the different views. To address this issue, a method named nonredundancy regularization based nonnegative matrix factorization with manifold learning (NRRNMF-ML) is proposed in this paper. A novel nonredundancy regularizer defined with the Hilbert-Schmidt Independence Criterion (HSIC) is incorporated in the objective function of the proposed method. By minimizing this term, the redundant information among the multiple views can be effectively reduced and the distinct contributions of the different views can be encouraged. To further exploit the manifold structure information of the data, a manifold regularizer is also constructed and included in the objective function. An iterative optimization strategy is designed to solve the problem; the corresponding proof is presented theoretically and supported experimentally in this paper. Experimental results on five multiview data sets, compared with several representative multiview clustering methods, demonstrate the effectiveness of the proposed method.
Nonnegative matrix factorization (NMF) has become a popular method for establishing a low-dimensional approximation of a two-mode (nonnegative) data matrix and, in some instances, for also establishing partitions of the objects associated with the two modes of the matrix. Although similar to singular value decomposition, NMF, as its name implies, requires nonnegative elements for the factors, which assures a 'sum of the parts' fit to the data. There are a variety of alternative objective functions and heuristic methods for NMF. Using both simulated and real-world two-mode data, we demonstrate that a multiple restart (multistart) heuristic for NMF commonly fails to produce optimal objective function values. A new variable neighborhood search (VNS) heuristic is shown to outperform the multistart approach with respect to both solution quality and computation time. Although the objective function value improvements associated with VNS are often small (on a percentage basis), such improvements can sometimes lead to differences in the partitions obtained for the row and column objects. Application to two well-known microarray datasets is used to support the merits of the proposed VNS heuristic.
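As an illustration of the multistart baseline mentioned above, the sketch below runs NMF from several random initializations and keeps the run with the lowest squared Frobenius error. It assumes the standard multiplicative update rules for the least-squares objective; the abstract does not say which objective or update scheme the authors used, and the function names `nmf_multiplicative` and `nmf_multistart` are hypothetical. The VNS heuristic itself is not shown.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, seed=0, eps=1e-10):
    """One NMF run, X ~ W @ H with W, H >= 0, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors nonnegative; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    loss = float(np.linalg.norm(X - W @ H, "fro") ** 2)
    return W, H, loss

def nmf_multistart(X, rank, n_starts=10, **kwargs):
    """Run NMF from several seeds; return the best run and all objective values."""
    runs = [nmf_multiplicative(X, rank, seed=s, **kwargs) for s in range(n_starts)]
    best = min(runs, key=lambda run: run[2])
    return best, [run[2] for run in runs]

# Small nonnegative test matrix.
X = np.random.default_rng(42).random((20, 15))
(best_W, best_H, best_loss), all_losses = nmf_multistart(X, rank=3, n_starts=5)
```

Because the NMF objective is nonconvex, the values in `all_losses` generally differ across seeds, which is the behavior the multistart heuristic tries to exploit and that a neighborhood search aims to improve on.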
Circular RNAs (circRNAs) are a class of structurally stable endogenous noncoding RNA molecules. A growing number of studies indicate that circRNAs play vital roles in human diseases. However, validating disease-related circRNAs in vivo is costly and time-consuming. A reliable and effective computational method to identify circRNA-disease associations deserves further study. In this study, we propose a computational method called RNMFLP that combines robust nonnegative matrix factorization (RNMF) and the label propagation (LP) algorithm to predict circRNA-disease associations. First, to reduce the impact of false negative data, the original circRNA-disease adjacency matrix is updated by matrix multiplication using the integrated circRNA similarity and the disease similarity information. Subsequently, the RNMF algorithm is used to obtain the restricted latent space to capture potential circRNA-disease pairs from the association matrix. Finally, the LP algorithm is used to predict more accurate circRNA-disease associations from the integrated circRNA similarity network and the integrated disease similarity network, respectively. Fivefold cross-validation on four datasets shows that RNMFLP is superior to state-of-the-art methods. In addition, case studies on lung cancer, hepatocellular carcinoma, and colorectal cancer further demonstrate the reliability of our method in discovering disease-related circRNAs.
Fast-developing single-cell technologies create unprecedented opportunities to reveal cell heterogeneity and diversity. Accurate classification of single cells is a critical prerequisite for recovering the mechanisms of heterogeneity. However, currently available scRNA-seq profiles have high dimensionality, sparsity, and noise, which pose challenges for existing clustering methods in grouping cells that belong to the same subpopulation based on transcriptomic profiles. Although many computational methods have been proposed, developing novel and effective computational methods to accurately identify cell types remains a considerable challenge. We present a new computational framework to identify cell types by integrating low-rank representation (LRR) and nonnegative matrix factorization (NMF); this framework is named NMFLRR. The LRR captures the global properties of the original data by using nuclear norms, and a locality constrained graph regularization term is introduced to characterize the data's local geometric information. The similarity matrix and low-dimensional features of the data can be obtained simultaneously by applying the alternating direction method of multipliers (ADMM) algorithm to handle each variable alternately in an iterative way. We finally obtain the predicted cell types by using a spectral algorithm based on the optimized similarity matrix. Nine real scRNA-seq datasets were used to test the performance of NMFLRR and fifteen other competitive methods, and the accuracy and robustness of the simulation results suggest that NMFLRR is a promising algorithm for the classification of single cells. The simulation code is freely available at: https://***/wzhangwhu/NMFLRR_code.