Short text classification methods have made significant progress and are widely applied to text data such as Twitter and Weibo posts. However, extremely short Chinese texts such as tax invoice data differ from traditional short texts: they lack contextual semantic information, have sparse features, and are extremely short. Existing short text classification methods struggle to achieve satisfactory performance on these texts. To address these problems, this paper proposes a text classification method based on bidirectional semantic extension for extremely short texts such as Chinese tax invoice data. Specifically, first, a Chinese knowledge graph is introduced to extend the semantics of both the texts and the label data (the bidirectional extension), which expands the extremely short texts and eases the feature sparseness problem; second, hash vectorization is used to avoid the semantic problems caused by the lack of contextual information. Experimental results on a real tax invoice dataset demonstrate the effectiveness of the proposed method.
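The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the knowledge graph, the expansion rule, or the hashing scheme. Here a small dictionary (`TOY_KG`, hypothetical) stands in for the Chinese knowledge graph, and "hash vectorization" is assumed to mean the standard feature-hashing trick applied to character bigrams. All function names and the vector dimension are illustrative choices.

```python
import hashlib

# Hypothetical stand-in for a Chinese knowledge graph: entity -> related concepts.
TOY_KG = {
    "钢笔": ["文具", "书写工具"],
    "大米": ["粮食", "谷物"],
}

def expand(text, kg):
    """Append the related concepts of every KG entity that occurs in the text."""
    extra = [c for entity, concepts in kg.items() if entity in text for c in concepts]
    return " ".join([text] + extra)

def char_ngrams(text, n=2):
    """Character n-grams of each whitespace-separated token (no word segmentation)."""
    grams = []
    for token in text.split():
        if len(token) < n:
            grams.append(token)
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams

def hash_vectorize(text, dim=64, n=2):
    """Feature hashing: map each n-gram to a bucket and count occurrences."""
    vec = [0.0] * dim
    for gram in char_ngrams(text, n):
        # md5 is used only for a hash that is stable across runs;
        # Python's built-in hash() is salted per process.
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec

# Expand an invoice-like item name, then vectorize the expanded string.
expanded_text = expand("钢笔", TOY_KG)
text_vector = hash_vectorize(expanded_text)
```

Under the same assumptions, a label name could be passed through `expand` and `hash_vectorize` in the same way, so that texts and labels are both extended before a classifier compares or consumes them.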