版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:Florida Int Univ Sch Comp & Informat Sci Miami FL 33199 USA Arizona State Univ Sch Comp Informat & Decis Syst Engn Tempe AZ 85281 USA NYU Dept Informat Operat & Management Sci New York NY 10012 USA
出 版 物:《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》 (IEEE Trans Knowl Data Eng)
年 卷 期:2011年第23卷第10期
页 面:1555-1568页
核心收录:
学科分类:0808[工学-电气工程] 08[工学] 0812[工学-计算机科学与技术(可授工学、理学学位)]
基 金:US National Science Foundation (NSF) [IIS-0811922, IIS-0952347, IIS-0643846] DHS [2009-ST-062-000016]
主 题:Hidden-web databases keyword search top-k ranking
摘 要:Many online or local data sources provide powerful querying mechanisms but limited ranking capabilities. For instance, PubMed allows users to submit highly expressive Boolean keyword queries, but ranks the query results by date only. However, a user would typically prefer a ranking by relevance, measured by an information retrieval (IR) ranking function. A naive approach would be to submit a disjunctive query with all query keywords, retrieve all the returned matching documents, and then rerank them. Unfortunately, such an operation would be very expensive due to the large number of results returned by disjunctive queries. In this paper, we present algorithms that return the top results for a query, ranked according to an IR-style ranking function, while operating on top of a source with a Boolean query interface with no ranking capabilities (or a ranking capability of no interest to the end user). The algorithms generate a series of conjunctive queries that return only documents that are candidates for being highly ranked according to a relevance metric. Our approach can also be applied to other settings where the ranking is monotonic on a set of factors (query keywords in IR) and the source query interface is a Boolean expression of these factors. Our comprehensive experimental evaluation on the PubMed database and a TREC data set show that we achieve order of magnitude improvement compared to the current baseline approaches.