ISBN: 9781538619377 (print)
With the development of the Internet, data is growing at an explosive rate. In the era of big data, the social value of information is realized only when people make use of it. Within vast amounts of data, keywords serve as a relatively concise summary of a document and provide an efficient means of information management. Keyword extraction technology (KET) helps people obtain information accurately and quickly, so it is widely used in information management systems. Building on recent research into keyword extraction methods, this paper studies the classic TF-IDF and TextRank algorithms, improves TextRank based on the idea of TF-IDF, and designs the procedure of the improved algorithm. Experiments demonstrate the keyword extraction accuracy of the improved TextRank algorithm.
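As a point of reference for the abstract above, the following is a minimal sketch of plain (unimproved) TextRank keyword ranking over a word co-occurrence graph. The abstract does not say how the TF-IDF idea is folded into TextRank, so that step is not shown. The function name `textrank_keywords`, the window size, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank tokens with plain TextRank on an undirected co-occurrence graph.

    Hypothetical helper: baseline TextRank only, not the paper's improved variant.
    """
    # Link every pair of distinct tokens that co-occur within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the unweighted TextRank update a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k], scores
```

In practice the token list would first be filtered (for example, by part of speech or a stop-word list); that preprocessing is omitted here. A TF-IDF-informed variant could, for instance, weight edges or initial scores by term weights, but whether the paper does so is not stated in the abstract.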
ISBN: 9781509064946 (print)
Today, with the rapid increase in internet use, thousands of resources can be reached on any topic of interest. However, determining which of these sources are useful is difficult and time-consuming. Automatic document summarization is a dimension reduction process that retains the important parts of the text. In this study, the TextRank algorithm, a graph-based summarization approach, is used with four different similarity methods, and the effect of these methods on the automatically generated summaries is examined. Among the similarity methods, the Levenshtein method was the most successful, with a ROUGE-1 score of 0.506.
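The approach described above can be sketched roughly as follows: sentences become graph nodes, a similarity function supplies edge weights, and weighted TextRank scores the sentences. This is an illustrative sketch only. The abstract does not state how Levenshtein distance is turned into a similarity, so the normalization by the longer string's length, the character-level comparison, and the names `similarity` and `rank_sentences` are assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution, or free match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map edit distance into [0, 1]; this normalization is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Weighted TextRank over a fully connected sentence graph."""
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from others in proportion to edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

A summary would then be formed by taking the highest-scoring sentences in their original order. Swapping `similarity` for another measure is the only change needed to compare similarity methods in this sketch.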
ISBN: 9781509013258 (print)
We propose a feature word selection method for classifying recommended shops using Yelp customer reviews. TextRank keywords are extracted from the customer reviews to construct sorted positive and negative keyword lists based on each keyword's summed TextRank scores. The top-K keywords are then aggregated iteratively by multiples of K to construct the positive and negative keyword frequency lists. The negative keyword frequency list is then subtracted from the positive keyword frequency list, and the resulting list is standardized to generate the final positive and negative keyword lists. The performance of our feature selection method is evaluated using Naive Bayes classifiers, and the binary classification accuracy of the selected feature words is 77.94%, which is better than the baseline chi-squared (χ²) feature word selection.
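The subtraction and standardization step described above might look something like the sketch below. It covers only that final step and takes the two keyword frequency lists as given. The abstract does not define "standardized", so the z-score used here is an assumption, as are the function names `polarity_scores` and `split_keywords` and the sample counts in the usage lines.

```python
from collections import Counter
from statistics import mean, pstdev


def polarity_scores(positive_counts, negative_counts):
    """Subtract negative from positive keyword frequencies, then z-score.

    Assumption: "standardized" is read as a population z-score.
    """
    words = sorted(set(positive_counts) | set(negative_counts))
    diff = {w: positive_counts.get(w, 0) - negative_counts.get(w, 0) for w in words}
    mu = mean(diff.values())
    sigma = pstdev(diff.values())
    if sigma == 0:
        # All differences are identical, so no keyword leans either way.
        return {w: 0.0 for w in words}
    return {w: (d - mu) / sigma for w, d in diff.items()}


def split_keywords(scores):
    """Split standardized scores into positive and negative keyword lists."""
    positive = [w for w, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 0]
    negative = [w for w, s in sorted(scores.items(), key=lambda kv: kv[1]) if s < 0]
    return positive, negative


# Hypothetical frequency lists, for illustration only.
pos = Counter({"great": 5, "friendly": 3, "slow": 1})
neg = Counter({"slow": 4, "rude": 3, "great": 1})
positive_words, negative_words = split_keywords(polarity_scores(pos, neg))
```

The resulting lists could then serve as the feature vocabulary for a Naive Bayes classifier, as the abstract describes.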
ISBN: 9783319181172; 9783319181165 (print)
We propose an unsupervised model to extract two types of summaries (positive and negative) per document based on sentiment polarity. Our model builds a weighted polar digraph from the text, then evolves recursively until some desired properties converge. It can be seen as an enhanced variant of TextRank-type algorithms, which work with non-polar text graphs. Each positive, negative, and objective opinion has some impact on the others if they are semantically related or placed close together in the document. Our experiments cover several interesting scenarios. In the case of a single-author news article, we notice a significant overlap between the anti-summary (focusing on negatively polarized sentences) and the summary. For a transcript of a debate or a talk show, an anti-summary represents the disagreement of the participants on the stated topic(s), whereas the summary becomes the collection of positive feedback. In this case, the anti-summary tends to be disjoint from the regular summary. Overall, our experiments show that our model can be used with TextRank to enhance the quality of the extractive summarization process.