The combination of data, semantics, and the Web has led to an ever-growing and increasingly complex body of semantic data. Accessing such structured data requires learning formal query languages, such as SPARQL, which poses significant difficulties for non-expert users. To date, many interfaces for querying ontologies have been developed. However, such interfaces rely on predefined templates or require expensive pre-processing and customization. Natural Language (NL) interfaces are particularly preferable to other interfaces for providing users with access to data. However, the inherent difficulty in mapping NL queries to semantic data can create ambiguities during the query formulation phase. To avoid the pitfalls of existing approaches, while at the same time retaining the ability to capture users' complex information needs, we propose a simple keyword-based search interface to the Semantic Web. Specifically, we propose automatic SPARQL query formulation (ASQFor), a systematic framework to issue semantic queries over RDF repositories using simple concept-based search primitives. ASQFor has a very simple interface, requires no user training, and can be easily embedded in any system or used with any semantic repository without prior customization. We demonstrate via extensive experimentation that ASQFor significantly speeds up query formulation while at the same time matching the syntax of hand-crafted queries.
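The idea of turning concept keywords into a SPARQL query can be illustrated with a minimal sketch. This is not the ASQFor algorithm, which the abstract does not specify. The function `build_sparql`, the `concept_map` lookup table, and the single shared subject variable are all assumptions made for illustration; the two FOAF property IRIs are used only as sample data.

```python
def build_sparql(keywords, concept_map, root_var="?s"):
    """Build a SELECT query with one triple pattern per keyword.

    Hypothetical helper: concept_map maps each keyword to a property IRI,
    and every pattern is assumed to share the same subject variable.
    """
    select_vars = []
    patterns = []
    for kw in keywords:
        prop = concept_map[kw]              # KeyError if the keyword is unknown
        var = "?" + kw.replace(" ", "_")    # one output variable per keyword
        select_vars.append(var)
        patterns.append(f"  {root_var} <{prop}> {var} .")
    return (
        "SELECT " + " ".join(select_vars) + " WHERE {\n"
        + "\n".join(patterns)
        + "\n}"
    )


concept_map = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "mbox": "http://xmlns.com/foaf/0.1/mbox",
}
query = build_sparql(["name", "mbox"], concept_map)
```

A real system would also have to resolve keywords to ontology concepts and connect them through paths in the RDF graph, which this sketch leaves out.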
Information technology boosts the development of database retrieval in the Chinese digital humanities domain. However, most database providers adopt a system-oriented design pattern, which fails to handle the problem of query gaps in users' retrieval process. This issue seriously hinders the effective use of database retrieval functionalities, particularly among history and humanities researchers. To address it, we propose UFTDRDH, a novel user-oriented solution based on automatic query formulation (AQF) technologies, which integrates a human-machine interactive module for the selection of new query-related expansion terms and a powerful query expansion algorithmic component (UFTDRDH-QEV) optimised by a topic-enhancing relevance feedback model (ToQE). To verify the effectiveness of UFTDRDH, several comparative experiments are conducted, including quantitative evaluation of retrieval efficiency and user satisfaction, as well as qualitative studies of interpretative traceability. The empirical results are multidimensional and robust: they not only show the positive effects of different AQFs on gap reduction, especially the importance of query expansion as the most effective technology, but also underline the remarkably advantageous performance of UFTDRDH compared with traditional system-oriented automatic query expansion in different task contexts. We believe the application of UFTDRDH can further strengthen the research focus on user-centred design and improve the level of current full-text database retrieval in the field of Chinese digital humanities. Broadly speaking, this solution can also be extended to full-text database retrieval in other languages and digital humanities domains.
Topic-based search systems retrieve items by contextualizing the information seeking process on a topic of interest to the user. A key issue in topic-based search of text resources is how to automatically generate multiple queries that reflect the topic of interest in such a way that precision, recall, and diversity are achieved. The problem of generating topic-based queries can be effectively addressed by Multi-Objective Evolutionary Algorithms, which have shown promising results. However, two common problems with such an approach are loss of diversity and low global recall when combining results from multiple queries. This work proposes a family of Multi-Objective Genetic Programming strategies based on objective functions that attempt to maximize precision and recall while minimizing the similarity among the retrieved results. To this end, we define three novel objective functions based on result set similarity and on the information-theoretic notion of entropy. Extensive experiments allow us to conclude that while the proposed strategies significantly improve precision after a few generations, only some of them are able to maintain or improve global recall. A comparative analysis against previous strategies based on Multi-Objective Evolutionary Algorithms indicates that the proposed approach is superior in terms of precision and global recall. Furthermore, when compared to query-term selection methods based on existing state-of-the-art term-weighting schemes, the presented Multi-Objective Genetic Programming strategies demonstrate significantly higher levels of precision, recall, and F1-score, while maintaining competitive global recall. Finally, we identify the strengths and limitations of the strategies and conclude that the choice of objectives to be maximized or minimized should be guided by the application at hand.
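The quantities named in this abstract (precision, global recall over the combined results, and similarity among result sets) can be made concrete with a short sketch. The abstract does not give the paper's three objective functions, so the definitions below are assumptions: global recall is taken as recall of the union of all result sets, and similarity is approximated by mean pairwise Jaccard overlap rather than by an entropy-based measure.

```python
from itertools import combinations


def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def global_recall(result_sets, relevant):
    """Recall of the union of the results of all queries (assumed definition)."""
    union = set().union(*result_sets) if result_sets else set()
    return len(union & relevant) / len(relevant) if relevant else 0.0


def mean_jaccard(result_sets):
    """Average pairwise Jaccard overlap; lower values mean more diverse results."""
    pairs = list(combinations(result_sets, 2))
    if not pairs:
        return 0.0
    total = sum(len(a & b) / len(a | b) if a | b else 0.0 for a, b in pairs)
    return total / len(pairs)


# Toy data: two queries' result sets and the set of relevant items.
r1, r2 = {1, 2, 3}, {3, 4}
relevant = {1, 3, 4, 5}
scores = (precision(r1, relevant), global_recall([r1, r2], relevant), mean_jaccard([r1, r2]))
```

In a multi-objective setting, a population of queries would be scored on such functions together, maximizing the first two and minimizing the third.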
ISBN: (print) 9781450320344
We introduce the concept of keyqueries as dynamic content descriptors for documents. Keyqueries are defined implicitly by the index and the retrieval model of a reference search engine: keyqueries for a document are the minimal queries that return the document in the top result ranks. Besides applications in the fields of information retrieval and data mining, keyqueries have the potential to form the basis of a dynamic classification system for future digital libraries, serving as the modern version of keywords for content description. To determine the keyqueries for a document, we present an exhaustive search algorithm along with effective pruning strategies. For applications where a small number of diverse keyqueries is sufficient, two tailored search strategies are proposed. Our experiments emphasize the role of the reference search engine and show the potential of keyqueries as innovative document descriptors for large, fast-evolving bodies of digital content such as the web.
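The definition of a keyquery (a minimal query that returns the document within the top result ranks) lends itself to a small sketch. The toy `search` engine, the count-based scoring, and the superset-skipping rule are assumptions made for illustration; the abstract does not describe the paper's retrieval model or its pruning strategies in detail.

```python
from itertools import combinations


def search(index, query, k):
    """Toy reference engine (assumed): conjunctive match, ranked by summed term counts."""
    scored = []
    for doc_id, counts in index.items():
        if all(term in counts for term in query):
            scored.append((-sum(counts[term] for term in query), doc_id))
    scored.sort()
    return [doc_id for _, doc_id in scored[:k]]


def keyqueries(index, doc_id, vocabulary, k=1, max_len=3):
    """Enumerate queries by increasing size and keep the minimal ones that hit doc_id."""
    found = []
    for size in range(1, max_len + 1):
        for combo in combinations(sorted(vocabulary), size):
            query = frozenset(combo)
            # Skip supersets of a known keyquery: they cannot be minimal.
            if any(known <= query for known in found):
                continue
            if doc_id in search(index, query, k):
                found.append(query)
    return found


# Toy index: document id -> term counts.
index = {
    "d1": {"query": 3, "search": 1, "engine": 1},
    "d2": {"query": 1, "search": 4},
    "d3": {"engine": 2, "search": 2},
}
top1 = keyqueries(index, "d1", index["d1"].keys(), k=1)
top2 = keyqueries(index, "d1", index["d1"].keys(), k=2)
```

Enumerating by increasing query size is what makes the superset check sufficient for minimality here: any smaller keyquery has already been found by the time a larger candidate is examined.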
ISBN: (print) 9783642157530
This paper outlines our participation in CLEF-IP's 2009 prior art search task. In the task's initial year, our focus lay on the automatic generation of effective queries. To this end, we conducted a preliminary analysis of the distribution of terms common to topics and their relevant documents, with respect to term frequency and document frequency. Based on the results of this analysis, we applied two methods to extract queries. Finally, we tested the effectiveness of the generated queries on two state-of-the-art retrieval models.
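How term frequency and document frequency might drive query extraction can be sketched as follows. The abstract does not say which two extraction methods were used, so this example substitutes a generic TF-IDF ranking with a document-frequency cutoff. The function `extract_query`, its parameters, and the sample statistics are all hypothetical.

```python
import math
from collections import Counter


def extract_query(topic_tokens, doc_freq, n_docs, max_df_ratio=0.5, size=3):
    """Pick the highest-scoring topic terms by tf * log(N / df) (assumed scheme)."""
    term_freq = Counter(topic_tokens)
    scored = []
    for term, freq in term_freq.items():
        df = doc_freq.get(term, 0)
        # Drop terms unseen in the collection and terms that are too common.
        if df == 0 or df / n_docs > max_df_ratio:
            continue
        scored.append((freq * math.log(n_docs / df), term))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:size]]


# Made-up statistics for a collection of 1000 documents.
topic = ["valve", "valve", "pressure", "the", "the", "the", "seal"]
doc_freq = {"valve": 10, "pressure": 50, "the": 1000, "seal": 5}
query_terms = extract_query(topic, doc_freq, n_docs=1000, size=2)
```

The cutoff `max_df_ratio` plays the role of a stop-word filter: terms appearing in more than half of the collection are assumed to carry little discriminative value.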