Automatic text summarization compresses a text into a summary while retaining its important information, so that users can grasp the key content by reading the summary alone. In the research literature, extractive summarization is widely used and is one of the main families of summarization methods. However, existing extractive methods still have problems: the initial cluster centers are not chosen carefully, and the resulting summaries are highly redundant for articles with complex sentences. To address these problems, this paper proposes an automatic text summarization method based on an improved TextRank algorithm and K-Means clustering. The method combines an improved BM25 model with the TextRank algorithm: it calculates the BM25 similarity between sentences and obtains the TextRank (TR) score of each sentence. The TR scores are then used to select the initial cluster centers, based on a similarity-difference judgment and a maximum judgment. The final summary is obtained by combining the cluster scores and sentence scores. Experimental results on the DUC2004 dataset show that the proposed method achieves better ROUGE-1, ROUGE-2 and ROUGE-L scores than the comparison algorithms Lead-3, TextRank and MBM25EMB. In conclusion, the proposed method improves the accuracy of automatic text summarization and reduces redundancy.
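The sentence-scoring step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's method: it uses the standard BM25 formula (the paper's "improved" BM25 is not specified in the abstract), it omits the K-Means stage entirely, and the function names `bm25_similarity` and `textrank_scores`, as well as the defaults `k1`, `b`, `damping` and `iterations`, are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_similarity(query, doc, idf, avg_len, k1=1.5, b=0.75):
    """Standard BM25 score of token list `doc` against the words of `query`."""
    tf = Counter(doc)
    score = 0.0
    for word in set(query):
        f = tf.get(word, 0)
        if f == 0:
            continue
        # Term-frequency saturation with document-length normalization.
        norm = f + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf[word] * f * (k1 + 1) / norm
    return score


def textrank_scores(sentences, damping=0.85, iterations=50):
    """TextRank scores for tokenized sentences over a BM25 similarity graph."""
    n = len(sentences)
    avg_len = sum(len(s) for s in sentences) / n
    df = Counter(w for s in sentences for w in set(s))
    # Smoothed IDF that stays positive even for very common words.
    idf = {w: math.log(1 + (n - c + 0.5) / (c + 0.5)) for w, c in df.items()}
    # weights[i][j]: BM25 score of sentence j when sentence i is the query.
    weights = [
        [0.0 if i == j else
         bm25_similarity(sentences[i], sentences[j], idf, avg_len)
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # Each sentence i passes its score on in proportion to edge weight.
            incoming = sum(
                scores[i] * weights[i][j] / out_sum[i]
                for i in range(n) if out_sum[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


sentences = [
    ["cat", "sat", "mat"],
    ["cat", "ate", "fish"],
    ["dog", "barked"],
    ["cat", "fish", "mat"],
]
scores = textrank_scores(sentences)
```

In this toy input the third sentence shares no words with the others, so it receives no incoming weight and ends up with the lowest score; in a pipeline like the one the abstract describes, such scores would then feed the choice of initial cluster centers.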
Reading large and lengthy documents is a tedious and time-consuming task. A summary of the same document gives us an overall idea of what the document is all about. Automated summaries can be generated using various a...
Selecting a scientific, practical and efficient energy transition path is the key to solving the main contradiction in the energy industry. A large-scale group decision-making method is developed that draws on text mining and the completion of probabilistic linguistic information. First, text mining is used to extract big data on public behavioral preferences in order to establish an evaluation criteria system for energy transition paths, and a criterion weighting model is proposed based on the affinity coefficient and the TextRank algorithm. Then, experts are clustered using social trust network analysis, so that the experts' missing probabilistic linguistic information can be completed. Next, clusters with overlapping features or isolated nodes are optimized via the principle of minimum deviation, and the expert weights are modified by incorporating information similarity. Finally, the alternatives are ranked with the S-hyperbolic absolute risk aversion utility function. The proposed method is applied to the practical problem of evaluating China's energy transition paths under the dual-carbon goal, and identifies "enhancing the use of clean energy" as the optimal path. The validity and practicability of the model are demonstrated through a multidimensional sensitivity analysis, and insights and suggestions for the field are given.
Telugu is one of the most popular south Indian languages and is currently spoken by 84 million native speakers in Andhra Pradesh and Telangana. With the rapid growth of Telugu digital content, there is a growing need for automatic text summarizers that produce short texts from huge documents. Extractive text summarization models only select significant sentences, while abstractive methods require more training time. In this paper, a novel hybrid model is proposed that generates text summaries by combining the extractive and abstractive approaches in order to reduce training time. The TextRank algorithm is used for the extractive stage, and an attention-based sequence-to-sequence model with bidirectional long short-term memory (Bi-LSTM) is used for the abstractive stage. A coverage mechanism is also included in the hybrid approach to reduce repetition in the summaries and improve their quality. The performance of the proposed hybrid model is evaluated with the ROUGE toolkit in terms of F-measure, recall and precision. Compared with existing models, the proposed hybrid model outperforms other text summarization models for the Telugu language.
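To make the three reported metrics concrete, the following is a simplified sketch of how ROUGE-N precision, recall and F-measure are computed from n-gram overlap. It is not the ROUGE toolkit itself, which offers further options such as stemming and stopword removal; the function name `rouge_n` and the whitespace tokenization in the usage lines are assumptions made for the example.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F-measure) of n-gram overlap."""
    def ngrams(tokens):
        return Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )

    cand, ref = ngrams(candidate), ngrams(reference)
    # Clipped overlap: each n-gram counts at most as often as in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if overlap == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


candidate = "the cat sat".split()
reference = "the cat sat on the mat".split()
precision, recall, f_measure = rouge_n(candidate, reference, n=1)
```

Here every candidate unigram appears in the reference, so precision is 1.0, while only three of the six reference unigrams are covered, so recall is 0.5; the F-measure is their harmonic mean.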
Ontologies play a vital role in organizing and constructing knowledge across various domains, enabling effective knowledge management and sharing. The development of domain-specific ontologies, such as the ONTO-TDM ontology for teaching domain modeling, is essential for providing a comprehensive and standardized representation of knowledge within a given discipline. However, to maximize the usefulness and relevance of such ontologies, it is crucial to automate their population with domain-specific information, reducing manual work and ensuring scalability. This paper presents a novel method for ontology population by extracting and integrating relevant information from diverse sources. The method combines the TextRank algorithm with Word2Vec to enhance keyword extraction, capturing both semantic meaning and textual importance. Keywords are then annotated and used to train a machine learning classifier, which aids in integrating new instances into the ontology. Experiments show that the proposed method achieves a precision of 63.33%, a recall of 61.29% and an F1-score of 62.28%, significantly improving keyword extraction and ontology population accuracy compared to existing methods. This validates the method's effectiveness in semi-automatically extracting relevant instances from diverse data sources, enhancing the efficiency and accuracy of ontology population, and advancing automated knowledge management in domain-specific contexts.
This research explores the theoretical and practical aspects of two fundamental tasks in Natural Language Processing: keyword extraction and extractive summarization, with a focus on the Romanian language. The study investigates the application of the TextRank algorithm to identifying key terms and generating extractive summaries from Romanian texts. The investigation shows that the algorithm is language independent and requires minimal preprocessing. The findings underscore the significance of automated text processing tools in enhancing information retrieval and document organization in Romanian. This study contributes to advancing Natural Language Processing methodologies and tools for Romanian language applications.
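As a rough illustration of why TextRank needs so little language-specific machinery, keyword extraction can be sketched with nothing more than a token list and a co-occurrence window. This is a minimal sketch under stated assumptions, not the study's implementation: the function name `textrank_keywords` and the defaults `window`, `damping`, `iterations` and `top_k` are chosen for the example, and the part-of-speech filtering used in the original TextRank formulation is skipped.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=30, top_k=3):
    """Rank words by unweighted TextRank over a co-occurrence graph."""
    # Link every pair of distinct words that occur within `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word shares its score equally among its neighbors.
        scores = {
            word: (1 - damping) + damping * sum(
                scores[v] / len(neighbors[v]) for v in neighbors[word]
            )
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


tokens = "graph ranking graph model graph score graph text".split()
keywords = textrank_keywords(tokens, top_k=1)
```

In this toy input the word "graph" is adjacent to every other word, so it becomes the hub of the co-occurrence graph and is ranked first. Only tokenization depends on the language, which is consistent with the language independence the abstract reports.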
ISBN: (digital) 9781510651890
ISBN: (print) 9781510651890; 9781510651883
News text has distinctive characteristics for keyword extraction: the task must not only account for differences in the quantitative indexes of words but also consider the influence of phrases. To improve keyword extraction for news texts, this paper constructs a keyword graph based on TextRank and improves the probability transition matrix by combining four quantitative indicators of each node (frequency, location, span and part of speech), so that words receive different weights. Because word segmentation affects phrase extraction, phrases are reconstructed according to a recombination rule, and the concept of combinatorial entropy is defined to filter the reconstructed phrases. A linearly weighted value is then assigned to each reconstructed phrase according to its statistical quantitative indexes, and finally the top-N words or phrases by weight are selected as keywords. Experimental results show that the proposed algorithm is not only superior to the traditional TextRank and TF-IDF algorithms but also compares favorably with the improved PositionRank and MyWPMWRank algorithms: the F value increases by up to 9.75%, which effectively improves keyword extraction for news text.
A news aggregator is online software that collects news stories and events from around the world, drawn from various sources, in one place. News aggregators play an important role in saving time, because news that would otherwise have to be sought on several websites is gathered in a single location. Summarizing the aggregated content saves the reader further time. The technique used here is the TextRank algorithm, which showed promising results for summarization. The main goal of the project presented in this paper is to develop a news aggregator that can aggregate articles relevant to an input keyword or key phrase and then summarize those articles, after enhancing the text, to give the reader an understandable and efficient summary.
ISBN: (print) 9781538661369
Keyword extraction is important in many fields and is a key step in document retrieval, information retrieval, scientific and technological literature indexing, news reading, text clustering and classification, machine translation and so on. In order to improve the accuracy of keyword extraction from text, we put forward a keyword extraction framework based on meta-learning. The framework integrates not only keyword extraction algorithm selection and a parameter adjustment algorithm, but also a variety of algorithms. Experimental results show that keyword extraction based on meta-learning is not only simple but also significantly improves the accuracy of keyword extraction.
In this study, data-mining crawler technology was used to obtain information on the top-100 liquor list and more than 13 thousand related post-purchase reviews, and the liquor products were divided into six grades according to the *** TextRank algorithm to work out the keywords of the reviews of each grade's products and their weights, then combined with the characteristics of online shopping consumption to establish evaluation indicators for the online purchase of liquor *** paper classifies the keywords of each grade of liquor products by the indicators established before, thus acquiring the evaluation system of the six grades of liquor by the *** on the results, the paper explores the differences in the concerns of consumers buying liquor products at different prices and puts forward corresponding marketing management suggestions.