A key requirement for high-performing question-answering (QA) systems is access to high-quality reference corpora from which answers to questions can be hypothesized and evaluated. However, the topic of source acquisition and engineering has received very little attention so far. This is because most existing systems were developed under organized evaluation efforts that included reference corpora as part of the task specification. The task of answering Jeopardy!(TM) questions, on the other hand, does not come with such a well-circumscribed set of relevant resources. Therefore, it became part of the IBM Watson (TM) effort to develop a set of well-defined procedures to acquire high-quality resources that can effectively support a high-performing QA system. To this end, we developed three procedures, i.e., source acquisition, source transformation, and source expansion. Source acquisition is an iterative development process of acquiring new collections to cover salient topics deemed to be gaps in existing resources based on principled error analysis. Source transformation refers to the process in which information is extracted from existing sources, either as a whole or in part, and is represented in a form that the system can most easily use. Finally, source expansion attempts to increase the coverage in the content of each known topic by adding new information as well as lexical and syntactic variations of existing information extracted from external large collections. In this paper, we discuss the methodology that we developed for IBM Watson for performing acquisition, transformation, and expansion of textual resources. We demonstrate the effectiveness of each technique through its impact on candidate recall and on end-to-end QA performance.
Purpose - Designing efficient XML schemas is essential for XML applications which manage semi-structured data. When generating XML schemas, there are two opposing goals: to avoid redundancy and to provide connected structures in order to achieve good query performance. In general, highly connected XML structures allow data redundancy, and redundancy-free schemas generate disconnected XML structures. The purpose of this paper is to describe and experimentally evaluate an approach which balances this trade-off through workload analysis. Additionally, it aims to identify the most accessed data based on the workload and to suggest indexes that improve access performance. Design/methodology/approach - The paper applies and evaluates a workload-aware methodology to provide indexing and highly connected structures for data which are intensively accessed through paths traversed by the workload. Findings - The paper presents benchmarking results on a set of design approaches for XML schemas and demonstrates that the XML schemas generated by the approach provide high query performance at a low cost in data redundancy, balancing the trade-off in XML schema design. Research limitations/implications - Although an XML benchmark is applied in these experiments, further experiments are expected in a real-world application. Practical implications - The proposed approach may be applied in a real-world process for designing new XML databases as well as in a reverse engineering process to improve XML schemas from legacy databases. Originality/value - Unlike related work, the reported approach integrates the two opposing goals of XML schema design and generates suitable schemas according to a workload. An experimental evaluation shows that the proposed methodology is promising.
ISBN: 9781450306270 (print)
The deep web, the part of the web consisting of web pages filled with information from myriads of online databases, is to date relatively unexplored. Even its basic characteristics, such as the number of searchable databases on the web, are disputed. In this paper, we address the problem of accurately estimating the size of the deep web by sampling one national web domain. We report some of the results obtained when surveying the Russian web. The survey findings, namely the size estimates of the deep web, could be useful for further studies on handling data in the deep web.
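As a rough illustration of size estimation by sampling, the sketch below scales the share of sampled hosts that expose a searchable database up to the whole domain. The abstract does not describe the survey's actual sampling design; simple random sampling of hosts, the normal-approximation interval, and the function name `estimate_total` are all assumptions made for this example.

```python
import math


def estimate_total(population_size, sample_size, hits, z=1.96):
    """Estimate how many hosts in a domain have a searchable database.

    Assumes a simple random sample of `sample_size` hosts drawn from
    `population_size` hosts, of which `hits` had a searchable database.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be in 1..population_size")
    p = hits / sample_size  # observed proportion in the sample
    # Finite population correction: shrinks the error when the sample
    # covers a large share of the population.
    if population_size > 1:
        fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    else:
        fpc = 0.0
    se = math.sqrt(p * (1 - p) / sample_size) * fpc
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return population_size * p, population_size * low, population_size * high


# Hypothetical numbers, not taken from the survey.
point, low, high = estimate_total(1_000_000, 1_000, 20)
```

The interval widens quickly when the hit rate is small, which is one reason such estimates are hard to pin down.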
ISBN: 9781849511414 (digital)
ISBN: 9781849511407 (print)
Time flies when you're having fun. This is the right way to describe this WordPress Top Plugins book by Brandon Corbin. With real-world examples, and by showing you the perks of having these plugins installed on your websites, the author is all set to captivate your interest from start to end. Regardless of whether this is your first time working with WordPress or you're a seasoned WordPress coding ninja, WordPress Top Plugins will walk you through finding and installing the best plugins for generating and sharing content, building communities and a reader base, and generating real advertising revenue.
ISBN: 9781424450213 (print)
Information about individuals on publicly available web sites stands as a valuable, yet unorganized, data source. Turning such an enormous data source into a "database" is highly desirable, as it has the potential to lead to novel ways of using the available information to the largest extent. In this paper, we present PopulusLog, a novel web data mining system. PopulusLog is a pioneering example of next-generation search engines that produce and provide access to non-intuitive knowledge on the web. It involves a framework of tools that collect, extract, mine, query, browse, and visualize information about anonymous people.
A vast amount of valuable information, produced and consumed by people and institutions, is currently stored in relational databases. For many purposes, there is an ever increasing demand for having these databases published on the web, so that users can query the data available in them. An important requirement for this to happen is that query interfaces must be as simple and intuitive as possible. In this paper we present LABRADOR, a system for efficiently publishing relational databases on the web by using a simple text box query interface. The system operates by taking an unstructured keyword-based query posed by a user and automatically deriving an equivalent SQL query that fits the user's information needs, as expressed by the original query. The SQL query is then sent to a DBMS and its results are processed by LABRADOR to create a relevance-based ranking of the answers. Experiments we present show that LABRADOR can automatically find the most suitable SQL query in more than 75% of the cases, and that the overhead introduced by the system in the overall query processing time is almost insignificant. Furthermore, the system operates in a non-intrusive way, since it requires no modifications to the target database schema. (c) 2006 Elsevier Ltd. All rights reserved.
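The keyword-to-SQL idea can be illustrated with a deliberately naive sketch. This is not LABRADOR's algorithm, which the abstract does not detail: here the table is given in advance, every column is tested against every keyword with `LIKE`, and rows are ranked by how many tests match. The function name `keyword_to_sql` and the sample `movies` table are hypothetical.

```python
import sqlite3


def keyword_to_sql(conn, table, keywords):
    """Derive a parameterized SQL query from unstructured keywords.

    Assumption: `table` is a trusted identifier chosen by the caller.
    Each (column, keyword) pair becomes one LIKE test that evaluates to
    1 or 0 in SQLite; the sum of the tests is used as a relevance score.
    """
    # PRAGMA table_info returns one row per column; index 1 is the name.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    tests = [f"({c} LIKE ?)" for c in cols for _ in keywords]
    params = [f"%{k}%" for _ in cols for k in keywords]
    score = " + ".join(tests)
    sql = (
        f"SELECT * FROM (SELECT *, ({score}) AS score FROM {table}) "
        "WHERE score > 0 ORDER BY score DESC"
    )
    return sql, params


# Small in-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, director TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Alien", "Ridley Scott", 1979),
        ("Blade Runner", "Ridley Scott", 1982),
        ("Aliens", "James Cameron", 1986),
    ],
)

sql, params = keyword_to_sql(conn, "movies", ["scott", "alien"])
rows = conn.execute(sql, params).fetchall()  # best-matching rows first
```

Because the schema is only read through `PRAGMA table_info`, the sketch leaves the target database unmodified, in the spirit of the non-intrusive operation the abstract mentions.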
The article discusses a knowledge management system which has been developed for Chinese language arts teachers. Details about the system and the subject of Chinese language arts are provided. The complexity of Chinese language arts textbooks is examined, and the management program is presented as a way to effectively organize class material. A diagram illustrating the schema of the database program is presented, and several images illustrating the usefulness and functionality of the program are also provided.
ISBN: 9780769534800 (print)
In this paper, Deep web technologies are analyzed and discussed, and a middleware for automatically finding and integrating Deep web query interfaces is proposed. The middleware extracts the attributes of query interfaces and judges whether they are interfaces of web databases by computing the similarity between them; it can also cluster query interfaces and construct an integrated query interface. This middleware provides a practical tool for finding query interfaces automatically and constructing integrated query interfaces.
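A minimal sketch of similarity-based grouping of query interfaces follows. The abstract does not name the similarity measure or clustering method, so Jaccard similarity over attribute labels, the greedy single-pass clustering, the 0.5 threshold, and all function and interface names are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of attribute labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_interfaces(interfaces, threshold=0.5):
    """Greedily group interfaces by similarity to each cluster's first member.

    `interfaces` maps an interface name to its list of attribute labels.
    """
    clusters = []
    for name, attrs in interfaces.items():
        for cluster in clusters:
            representative = interfaces[cluster[0]]
            if jaccard(attrs, representative) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([name])
    return clusters


def integrate(interfaces, cluster):
    """Build an integrated interface as the union of a cluster's attributes."""
    merged = set()
    for name in cluster:
        merged |= set(interfaces[name])
    return sorted(merged)


# Hypothetical query interfaces with already-extracted attribute labels.
interfaces = {
    "books_a": ["title", "author", "isbn"],
    "books_b": ["title", "author", "publisher"],
    "flights": ["from", "to", "date"],
}
clusters = cluster_interfaces(interfaces)
integrated = integrate(interfaces, clusters[0])
```

Comparing each interface only with the first member of a cluster keeps the sketch short; a real system would likely also normalize labels (synonyms, casing) before computing similarity.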