ISBN: 9781450380164 (print)
First Story detection describes the task of identifying new events in a stream of documents. The UMass-FSD system is known for its strong performance in First Story detection competitions, and it has recently been used frequently as a high-accuracy baseline in research publications. We are the first to discover that UMass-FSD inadvertently leverages temporal bias. Interestingly, the discovered bias contrasts with previously known biases and performs significantly better. Our analysis reveals an increased contribution of temporally distant documents, resulting from an unusual way of handling incremental term statistics. We show that this form of temporal bias is also applicable to other well-known First Story detection systems, where it improves detection accuracy. To provide a more generalizable conclusion and to demonstrate that the observed bias is not merely an artefact of a particular implementation, we present a model that intentionally leverages a bias on temporal distance. Our model significantly improves the detection effectiveness of state-of-the-art First Story detection systems.
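For readers unfamiliar with the task, the following is a minimal sketch of a generic nearest-neighbour First Story detector. It is not UMass-FSD and does not model the temporal bias described in the abstract. The term-frequency vectors, the cosine measure, the `threshold` value, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokenisation, an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_story_scores(stream):
    """Novelty of each document: 1 minus its similarity to the closest earlier document."""
    seen = []
    scores = []
    for text in stream:
        vec = tf_vector(text)
        nearest = max((cosine(vec, old) for old in seen), default=0.0)
        scores.append(1.0 - nearest)
        seen.append(vec)
    return scores


def detect_first_stories(stream, threshold=0.5):
    """Indices of documents whose novelty reaches the (hypothetical) threshold."""
    return [i for i, s in enumerate(first_story_scores(stream)) if s >= threshold]
```

A document is flagged as a first story when nothing sufficiently similar has appeared earlier in the stream. A temporally biased variant would additionally take into account how far back in time the nearest neighbour lies; the abstract does not give that formula, so it is left out here.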
ISBN: 9781467393935 (print)
As online networks attract more and more attention, grasping hot topics is important for news editing and public opinion surveying. On regional BBS forums, hot topics are raised in forum posts by individuals rather than by official media, so the information is varied and hot topics are hard to find. This paper introduces a hot topic tracking system for regional BBS forums based on page views and replies. By processing the page views and replies extracted from collected web pages and combining them with cluster analysis, the system effectively removes interference and detects hot topics correctly.
ISBN: 9781424429356 (print)
Analyzing unstructured text streams can be challenging. One popular approach is to isolate specific themes in the text and to visualize the connections between them. Some existing systems, like ThemeRiver, provide a temporal view of changes in themes; other systems, like In-Spire, use clustering techniques to help an analyst identify the themes at a single point in time. Narratives combines both of these techniques; it uses a temporal axis to visualize ways that concepts have changed over time, and introduces several methods to explore how those concepts relate to each other. Narratives is designed to help the user place news stories in their historical and social context by understanding how the major topics associated with them have changed over time. Users can relate articles through time by examining the topical keywords that summarize a specific news event. By tracking the attention paid to a news article in the form of references in social media (such as weblogs), a user can both discover important events and measure the social relevance of these stories.
ISBN: 9783642052491 (print)
The traditional approach to Event detection and Characterization (EDC) treats event detection as a classification problem. It uses words as samples to train the classifier, which can lead to an imbalance between positive and negative samples. The method also suffers from data sparseness when the corpus is small. Rather than classifying events with words as samples, this paper clusters events to determine event types. Guided by event triggers, it uses self-similarity to converge on the value of K in the K-means algorithm and optimizes the clustering algorithm. Then, using named entities and their relative position information, the new method further pinpoints the exact event type. The new method avoids the dependence on event templates found in traditional methods, and its event detection results can be used in automatic text summarization, text retrieval, and topic detection and tracking.
The Computers in Biology and Medicine (CBM) journal promotes the use of computing machinery in the fields of bioscience and medicine. Since the first volume in 1970, the importance of computers in these fields has gro...
ISBN: 9781538632215 (print)
With the rapid development of the Internet, social networking platforms and mobile technologies, people are increasingly in contact with the Internet and share their views with others online. The content that people care about and talk about every day constitutes hot topics. Hot topics can play an important role in politics, the economy, culture and other fields, so research on them is of great significance. This paper analyzes hot topic discovery algorithms based on Web mining technology, together with their basic principles, related technologies and application areas.
ISBN: 9781450368858 (print)
People all over the world talk as events unfold, but what does it take for a machine to truly track an event? Event TimeLine detection (ELD) is a real-time topic detection and tracking (TDT) solution to track events using Twitter with the hypothesis that it takes a deeper understanding of the event's domain for a machine to describe its evolution. In ELD, understanding takes the form of identifying the participants that would eventually drive the event's evolution. We propose Automatic Participant detection (APD) as a way of identifying event participants, which ELD then tracks during the proceedings. TDT then mines the resulting Twitter stream, extracting developments and describing them as a timeline using a summarization algorithm.
ISBN: 9781450320382 (print)
We present a data-driven study on which sources were the first to report on news events. For this, we implemented a news aggregator that included a large number of established news sources and covered one year of data. We present a novel framework that is able to retrieve a large number of events and not only the most salient ones, while at the same time making sure that they are not exclusively of local impact. Our analysis then focuses on different aspects of the news cycle. In particular, we analyze which sources break most of the news. By looking at when certain events become bursty, we are able to perform a finer analysis of those events and the associated sources that dominate global news attention. Finally, we study the time it takes news outlets to report on these events and how this reflects different strategies for choosing which news to report. A general finding of our study is that big news agencies remain an important threshold to cross to bring global attention to particular news, but it also shows the importance of focused (by region or topic) outlets.
ISBN: 9783540686330 (print)
Topic tracking is an important task of topic detection and tracking (TDT). Its purpose is to detect stories, from a stream of news, related to known topics. Each topic is "known" by its association with several sample stories that discuss it. In this paper, we propose a new method that builds the keywords dependency profile (KDP) of each story and tracks topics based on the similarity between the profiles of topic and story. In this method, the keywords of a story are selected by document summarization technology. The KDP is built from the co-occurrence frequency of keywords in the same sentences of the story. We demonstrate that this profile can describe the core events in a story accurately. Experiments on the Mandarin resources of TDT4 and TDT5 show that a topic tracking system based on KDP improves performance by 13.25% on the training dataset and 7.49% on the testing dataset compared with the baseline.
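The profile construction described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: the keyword set is passed in by hand rather than selected by summarization, sentences are split on punctuation, words are split on whitespace (Mandarin text would need a word segmenter instead), and cosine similarity is an assumed choice of profile similarity. The names `build_kdp` and `profile_similarity` are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def build_kdp(story, keywords):
    """Count how often each pair of keywords co-occurs within one sentence."""
    keywords = {k.lower() for k in keywords}
    profile = Counter()
    # Naive sentence split on ., ! and ? (an assumption for this sketch).
    for sentence in re.split(r"[.!?]+", story):
        words = set(re.findall(r"\w+", sentence.lower()))
        present = sorted(keywords & words)
        # Sorting makes each pair a canonical (alphabetical) key.
        for pair in combinations(present, 2):
            profile[pair] += 1
    return profile


def profile_similarity(p, q):
    """Cosine similarity between two co-occurrence profiles (assumed measure)."""
    dot = sum(count * q[pair] for pair, count in p.items() if pair in q)
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)
```

Under these assumptions, a tracker would build one profile from a topic's sample stories and compare it with the profile of each incoming story, treating a high similarity as evidence that the story is on topic.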
ISBN: 9781605584867 (print)
We describe HMNews (Hyper-Media News), a system designed and implemented for the collection, indexing and retrieval of hypermedia news content coming from Digital Television and the Web. The novelty of the approach lies in its ability to provide hierarchical and multi-resolution multimodal indexes based on the application of a novel generalised hybrid clustering technique. The system supports many functionalities: a) bi-directional news conceptual linking; b) relevant topics detection and tracking; c) integrated hypermedia browsing; d) integrated search and retrieval.