Classifying Chinese text is an active but difficult topic in natural language processing. In this work, we aim to quickly recommend relevant documents to users who search by keyword in a massive literature database. We take the "Wanfang data Knowledge Service Platform Journal Document User Behavior Log data" as our research object and build a text classifier based on deep learning techniques. We first clean the user browsing behavior logs and user download behavior logs and extract data from them. Next, we apply Chinese word segmentation and keyword extraction to the document titles, and construct a keyword data set and a document keyword data set based on document names. Finally, we use a traditional deep learning model and a convolutional neural network model to build text classifiers for training and classification; these serve as the model that recommends literature for users' search keywords. Experimental results show that the constructed model can effectively classify and recommend documents from users' search keywords, and can extract keywords from document names during model construction. The proposed model, TEXTCNN V2, achieves an accuracy, precision, and F1 score of 0.9132, 0.9220, and 0.9154, respectively.
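To make the reported metrics concrete, the following is a minimal sketch of how accuracy, precision, and F1 score might be computed for a multi-class classifier. The abstract does not state the averaging scheme, so macro averaging is an assumption here, and the function name `classification_report` and the toy labels are hypothetical and not taken from the work.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the source does not say which
    averaging scheme was used for the reported scores.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels, for illustration only.
report = classification_report(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

With a different averaging scheme (micro or weighted), the precision and F1 values would generally differ from the macro figures, which is why the scheme should be stated alongside reported scores.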