Driven by the visions of Internet of Things and 5G communications, the edge computing systems integrate computing, storage and network resources at the edge of the network to provide computing infrastructure, enabli...
ISBN (digital): 9781728109626
ISBN (print): 9781728109633
Text clustering has been widely used in many Natural Language Processing (NLP) applications such as text summarization and news recommendation. However, most current algorithms need a predefined number of clusters, which is difficult to obtain. Moreover, multi-label clustering is useful in many applications, but related work is rarely available. Although several studies have attempted to solve these two problems, there is a need for methods that can solve both simultaneously. Therefore, we propose a new text clustering algorithm called Word2Cluster. Word2Cluster can automatically generate an adaptive number of clusters and supports multi-label clustering. To test the performance of Word2Cluster, we build a Chinese text dataset, Hotline, based on real-world applications. To evaluate the clustering results better, we propose an improved evaluation method based on basic accuracy, precision and recall for multi-label text clustering. Experimental results on a Chinese text dataset (Hotline) and a public English text dataset (Reuters) demonstrate that our algorithm achieves a better F1-measure and runs faster than the state-of-the-art baselines.
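As a point of reference for the evaluation mentioned above, the basic precision, recall and F1 for multi-label assignments can be sketched as follows. This is only the plain sample-averaged baseline, not the improved evaluation method the abstract proposes, which is not described here. The function name `multilabel_prf` and the toy labels are hypothetical.

```python
def multilabel_prf(true_labels, pred_labels):
    """Sample-averaged precision, recall and F1 for multi-label assignments.

    Assumption: each document has a set of true labels and a set of predicted
    cluster labels that have already been mapped to the same label space.
    """
    if not true_labels or len(true_labels) != len(pred_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    precisions, recalls = [], []
    for truth, pred in zip(true_labels, pred_labels):
        truth, pred = set(truth), set(pred)
        overlap = len(truth & pred)
        # A document with no predicted (or no true) labels contributes 0.
        precisions.append(overlap / len(pred) if pred else 0.0)
        recalls.append(overlap / len(truth) if truth else 0.0)
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: three documents with one or two labels each.
truth = [{"water", "billing"}, {"traffic"}, {"noise", "traffic"}]
pred = [{"water"}, {"traffic", "noise"}, {"noise", "traffic"}]
p, r, f = multilabel_prf(truth, pred)
```

Averaging per document rather than per label keeps documents with many labels from dominating the score; other averaging schemes are equally common.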
International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multicenter study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and post-processing (66%). The “typical” lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.
Linear discriminative analysis (LDA) is an effective feature extraction method for hyperspectral image (HSI) classification. Most of the existing LDA-related methods are based on spectral features, ignoring spatial information. Recently, a matrix discriminative analysis (MDA) model has been proposed to incorporate the spatial information into the LDA. However, due to sensor interferers, calibration errors, and other issues, HSIs can be noisy. These corrupted data easily degrade the performance of the MDA. In this paper, a robust MDA (RMDA) model is proposed to address this important issue. Specifically, based on the prior knowledge that the pixels in a small spatial neighborhood of the HSI lie in a low-rank subspace, a denoising model is first employed to recover the intrinsic components from the noisy HSI. Then, the MDA model is used to extract discriminative spatial-spectral features from the recovered components. Besides, different HSIs exhibit different spatial contextual structures, and even a single HSI may contain both large and small homogeneous regions simultaneously. To sufficiently describe these multiscale spatial structures, a multiscale RMDA model is further proposed. Experiments have been conducted using three widely used HSIs, and the obtained results show that the proposed method allows for a significant improvement in the classification performance when compared to other LDA-based methods.
A news system needs news classification and personalized recommendation to improve users' efficiency and interest and to enhance their experience. This paper constructs an automatic news classification and recommendation system using natural language processing, text classification, and a collaborative filtering algorithm. Published news content is first automatically word-segmented and used for model training to determine which category each news item belongs to. Users can also manually modify the classification so that later classification can be updated and improved. After that, the similarity between users is calculated by collaborative filtering, and the users most similar to the user receiving recommendations are selected. News seen by those users is recommended to the users placed in the same group. This paper uses the news corpus of Fudan University's text classification research center as experimental data, and text classification accuracy is tested on this corpus. The experimental results show that the system serves news users well and achieves effective personalized classification and recommendation of news.
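The recommendation step described above can be illustrated with a minimal user-based collaborative filtering sketch. It assumes binary reading histories and cosine similarity; the abstract does not state which similarity measure the system uses, and the function names, `top_k` parameter and toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sets of read article ids (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(target, history, top_k=2):
    """Recommend unread articles seen by the top_k users most similar to target."""
    seen = history[target]
    # Rank every other user by similarity to the target user.
    ranked = sorted(
        ((cosine_similarity(seen, items), user)
         for user, items in history.items() if user != target),
        reverse=True,
    )
    scores = {}
    for sim, user in ranked[:top_k]:
        if sim <= 0:
            continue  # ignore users with no overlap
        for item in history[user] - seen:
            # Weight each candidate article by the similarity of its reader.
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical reading histories: user -> set of article ids.
history = {
    "alice": {"n1", "n2", "n3"},
    "bob": {"n2", "n3", "n4"},
    "carol": {"n1", "n5"},
    "dave": {"n6"},
}
result = recommend("alice", history)
```

Here `bob` shares two articles with `alice` and `carol` shares one, so the article only `bob` has read is ranked ahead of the one only `carol` has read.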
The Deep Convolutional Neural Networks (CNNs) have obtained a great success for pattern recognition, such as recognizing the texts in images. But existing CNNs based frameworks still have several drawbacks: 1) the tra...
In this paper, we extend the popular dictionary pair learning (DPL) into the scenario of twin-projective latent flexible DPL under a structured twin-incoherence. Technically, a novel framework called Twin-Projective Latent Flexible DPL (TP-DPL) is proposed, which minimizes the twin-incoherence constrained flexibly-relaxed reconstruction error to avoid the possible over-fitting issue and produce accurate reconstruction. In this setting, TP-DPL integrates the twin-incoherence based latent flexible DPL and the joint embedding of codes as well as salient features by twin-projection into a unified model in an adaptive neighborhood-preserving manner. Therefore, TP-DPL can unify the procedures of salient feature representation and classification. The twin-incoherence constraint on coefficients and features can explicitly ensure high intra-class compactness and inter-class separation over them. TP-DPL also integrates the adaptive weighting to preserve local neighborhood of both coefficients and salient features within each class explicitly. For efficiency, TP-DPL selects the Frobenius-norm and abandons the costly l0/l1-norm for group sparse representation. Another byproduct is that TP-DPL can directly apply the class-specific twin-projective reconstruction residual to compute the label of data. Extensive results on public databases show that TP-DPL can deliver the state-of-the-art performance.
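The idea of labeling a sample by its class-specific reconstruction residual can be sketched with the generic dictionary pair learning rule: assign the class whose analysis/synthesis dictionary pair reconstructs the sample best. This is a simplified illustration of that generic rule, not TP-DPL's twin-projective residual, whose exact form is not given in the abstract. All names and the toy dictionaries are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def residual(x, synthesis, analysis):
    """Squared reconstruction error ||x - D (P x)||^2 for one class."""
    codes = matvec(analysis, x)          # P x: coding coefficients
    recon = matvec(synthesis, codes)     # D (P x): reconstruction
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))


def classify(x, dictionaries):
    """Return the label whose dictionary pair gives the smallest residual."""
    return min(dictionaries, key=lambda label: residual(x, *dictionaries[label]))


# Hypothetical learned pairs (D_k, P_k): each class spans one coordinate axis.
dictionaries = {
    "class_a": ([[1.0], [0.0]], [[1.0, 0.0]]),
    "class_b": ([[0.0], [1.0]], [[0.0, 1.0]]),
}
label = classify([0.9, 0.1], dictionaries)
```

Because classification only needs matrix-vector products with already learned dictionaries, no sparse coding step is required at test time, which is consistent with the efficiency argument in the abstract.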
The performance of video saliency estimation techniques has achieved significant advances along with the rapid development of Convolutional Neural Networks (CNNs). However, devices like cameras and drones may have l...
In recent years, reversible data hiding in encrypted images (RDHEI) that embeds additional data into the encrypted image content has received more and more attention. In previous RDHEI methods, there is no one conside...
In many real-world applications, learning a classifier with false-positive rate under a specified tolerance is appealing. Existing approaches either introduce prior knowledge dependent label cost or tune parameters ba...