ISBN: (print) 9789811026720; 9789811026713
Deep learning models have shown clear advantages in feature representation and document retrieval. However, such models take only the most frequent words as input when learning latent features, which inevitably discards much useful information contained in documents, especially high-dimensional ones. We introduce a novel method based on word-vector clustering that produces low-dimensional semantic vectors of documents, which serve as the input to a deep learning model and improve the feature representation at its output layer. First, word vectors, a compact and distributed representation of words, are obtained by training a neural network language model with word2vec. Then, we present a modified word-vector clustering method based on locality-sensitive hashing and affinity propagation, which adapts and scales better to large, high-dimensional data. Afterwards, each document is represented by the set of cluster centers and fed to the deep learning model. Experimental results show that the proposed method improves the feature representation ability of the deep learning model and outperforms traditional methods on the document retrieval task.
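The clustering and document-representation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: the word vectors are hand-written toy values rather than word2vec output, the hashing uses random-hyperplane locality-sensitive hashing, the affinity propagation step is replaced by simply averaging the vectors in each hash bucket, and a document is summarized as the normalized count of its words per cluster. All function names are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_planes, seed=0):
    """Draw random hyperplane normals for sign-based LSH (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def lsh_signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def bucket_words(word_vectors, planes):
    """Group words whose vectors share the same LSH signature."""
    buckets = defaultdict(list)
    for word, vec in word_vectors.items():
        buckets[lsh_signature(vec, planes)].append(word)
    return dict(buckets)


def cluster_centers(buckets, word_vectors):
    """Use each bucket's mean vector as its center.

    This is a stand-in for affinity propagation, not that algorithm.
    """
    centers, word_to_cluster = [], {}
    for idx, (_, words) in enumerate(sorted(buckets.items())):
        dim = len(word_vectors[words[0]])
        centers.append(
            [sum(word_vectors[w][d] for w in words) / len(words) for d in range(dim)]
        )
        for w in words:
            word_to_cluster[w] = idx
    return centers, word_to_cluster


def document_vector(tokens, word_to_cluster, n_clusters):
    """Normalized count of a document's words per cluster; unknown words are skipped."""
    counts = [0.0] * n_clusters
    for tok in tokens:
        if tok in word_to_cluster:
            counts[word_to_cluster[tok]] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy 3-dimensional "word vectors" (illustrative values, not trained).
word_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [-0.1, 0.9, 0.3],
    "truck": [-0.2, 0.8, 0.4],
    "stock": [0.0, -0.3, 0.9],
}

planes = make_hyperplanes(dim=3, n_planes=4, seed=0)
buckets = bucket_words(word_vectors, planes)
centers, word_to_cluster = cluster_centers(buckets, word_vectors)
doc_vec = document_vector(["cat", "dog", "car", "unknown"], word_to_cluster, len(centers))
```

The resulting `doc_vec` has one entry per cluster, so its length depends on the number of occupied hash buckets rather than on vocabulary size. A vector of this kind is what would then be passed to the deep learning model as input.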