ISBN: 9781467387965 (print)
Semantic search is a set of techniques that improve search accuracy by understanding the user's search intent. Semantic search engines usually require an ontology and semantic metadata to analyze user queries. However, building an ontology and semantic metadata for large amounts of data is time-consuming and costly. To resolve this problem, we propose a novel semantic search method that requires neither ontologies nor semantic metadata, relying instead on a semantically enriched text model. Through extensive experiments on the OHSUMED document collection and SCOPUS library data, we show that the proposed method improves users' search satisfaction.
ISBN: 9781479986965 (print)
The word-text matrix is commonly used as a text representation model in text classification and text clustering. However, its high dimensionality and sparsity limit its expressive power. To improve it, the authors mine word-word and text-text relations and integrate these semantic relationships into the word-text matrix. Classification experiments show that the new representation models effectively improve text classification accuracy and represent text information better.
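The idea of folding word-word relations into a word-text matrix can be illustrated with a minimal sketch. The abstract does not say how the relations are mined or integrated, so everything here is an assumption: the relation is taken to be the cosine similarity between word rows, the enriched matrix is the product of that similarity matrix with the original one, and the names `word_text_matrix`, `cosine`, and `enrich` are hypothetical. Text-text relations are not shown.

```python
import math

def word_text_matrix(texts, vocab):
    """Rows are words, columns are texts; entries are raw term counts."""
    return [[text.count(word) for text in texts] for word in vocab]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def enrich(matrix):
    """Assumed integration step: X' = S X, where S[i][k] is the cosine
    similarity between word rows i and k of the original matrix X."""
    sim = [[cosine(ri, rk) for rk in matrix] for ri in matrix]
    n_words, n_texts = len(matrix), len(matrix[0])
    return [
        [sum(sim[i][k] * matrix[k][j] for k in range(n_words)) for j in range(n_texts)]
        for i in range(n_words)
    ]

# Toy corpus of tokenised texts (illustrative data, not from the paper).
texts = [["car", "engine"], ["car", "automobile"], ["automobile", "engine"], ["banana"]]
vocab = ["car", "automobile", "engine", "banana"]

X = word_text_matrix(texts, vocab)
X_enriched = enrich(X)
```

In this toy corpus, "car" never occurs in the third text, so its original entry there is zero. After enrichment the entry becomes positive because "car" co-occurs with "automobile" and "engine" elsewhere, while the unrelated fourth text stays at zero. This is one way the sparsity of the original matrix could be reduced.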
ISBN: 9788993215090 (print)
In this paper we present a novel approach based on an efficient text representation that employs semantic relations between words. We apply singular value decomposition to the co-occurrence matrix to overcome its noise and sparseness. This yields a refined co-occurrence matrix in which relations between words can be measured as distances. We use these distances as correction factors for the bag-of-words text representation; in other words, we transform the text representation vectors by including relations between words. To validate our representation model, we apply it to a binary classification task and study how the model improves the classification of documents relevant to a given domain (topic). For this purpose, we implement a Support Vector Machine and classify documents from the Reuters-21578 collection. The results of our experiments demonstrate the superiority of our model.
In this paper, we first introduce a new text representation method to convert a textual document into a tensor space model named textCuboid, which can preserve the various meanings of polysemous words. Based upon the new model, w...
Text clustering is an important method for effectively organising, summarising, and navigating text information. However, in the absence of labels, the text data to be clustered cannot be used to train a deep-learning-based text representation model. To address this problem, a text clustering algorithm based on deep representation learning is proposed, using transfer-learning domain adaptation and parameter updates during the clustering iterations. First, source domain data is used to pre-train the deep learning classification model, which initialises the model parameters. Then, a domain discriminator is added to the model to predict which domain each input sample comes from. If the discriminator cannot distinguish which domain the data belongs to, a common feature space of the two domains has been obtained and the domain adaptation problem is solved. Finally, the text feature vectors produced by the model are clustered with the MCSKM++ algorithm. The algorithm not only resolves the model pre-training problem in unsupervised clustering, but also clusters well under the transfer problem caused by different numbers of domain labels. Experiments suggest that the clustering accuracy of the algorithm is superior to that of other similar algorithms.
ISBN: 9781849194716 (print)
For a system that detects hot topics about emergency events, an overall technical framework is established, and the key issues in the system's four components are described together with solution strategies. Based on the content and structure of news reports and the distribution of report sources, a text clipping method and a modified feature weighting model are proposed, building on the VSM text representation model and the TF-IDF formula. News reports about an earthquake emergency event serve as the data source for evaluating the model. Experimental results indicate that the headline, the lead, and relevant feature parameters obtained by clipping the main body of a news report can serve as the sample set of hot topics to be identified. Furthermore, compared with the classical model, the modified feature weighting model is more efficient in execution and more adaptive in its text representation capability.
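A rough sketch of how text clipping and position-aware weighting might sit on top of TF-IDF follows. The abstract does not give the modified formula, so this is an assumption-laden illustration rather than the authors' model: the boost constants, the `clip` helper, and the `position_weighted_tfidf` function are all hypothetical, and the report fields (`headline`, `lead`, `body`) are an assumed data layout.

```python
import math
from collections import Counter

# Hypothetical position boosts; the abstract does not state actual values.
HEADLINE_BOOST = 3.0
LEAD_BOOST = 1.5

def clip(report):
    """Text clipping: keep only the headline and lead tokens, drop the body."""
    return report["headline"], report["lead"]

def position_weighted_tfidf(reports):
    """Return one {term: weight} vector per report, using boosted term
    frequency on the clipped text multiplied by the classical IDF."""
    n = len(reports)
    clipped = [clip(r) for r in reports]

    # Document frequency is counted on the clipped text only.
    df = Counter()
    for headline, lead in clipped:
        df.update(set(headline) | set(lead))

    vectors = []
    for headline, lead in clipped:
        tf = Counter()
        for term in headline:
            tf[term] += HEADLINE_BOOST
        for term in lead:
            tf[term] += LEAD_BOOST
        total = sum(tf.values())
        vectors.append(
            {term: (w / total) * math.log(n / df[term]) for term, w in tf.items()}
        )
    return vectors

# Illustrative tokenised reports (made-up data).
reports = [
    {"headline": ["earthquake", "rescue"], "lead": ["earthquake", "damage"], "body": ["weather"]},
    {"headline": ["earthquake", "election"], "lead": ["earthquake", "vote"], "body": ["rescue"]},
]
vectors = position_weighted_tfidf(reports)
```

With these two reports, "earthquake" appears in every clipped report and so receives zero weight, "rescue" in the first headline outweighs "damage" in the first lead, and body-only terms such as "weather" never enter the vector. Working on the clipped text also means fewer tokens are processed per report, which is consistent with the efficiency gain the abstract reports.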
ISBN: 9781622761234 (print)
For a system that detects hot topics about emergency events, an overall technical framework is established, and the key issues in the system's four components are described together with solution strategies. Based on the content and structure of news reports and the distribution of report sources, a text clipping method and a modified feature weighting model are proposed, building on the VSM text representation model and the TF-IDF formula. News reports about an earthquake emergency event serve as the data source for evaluating the model. Experimental results indicate that the headline, the lead, and relevant feature parameters obtained by clipping the main body of a news report can serve as the sample set of hot topics to be identified. Furthermore, compared with the classical model, the modified feature weighting model is more efficient in execution and more adaptive in its text representation capability.