ISBN: 9781450337168 (print)
Search engines provide result summaries to help users quickly identify whether or not it is worthwhile to click on a result and read it in detail. However, users may visit non-relevant results and/or skip relevant ones. These actions are usually harmful to the user experience, but few studies have considered this problem in search result ranking. This paper optimizes relevance of results and user click and skip activities at the same time. Given two equally relevant results, our approach learns to rank the one that users are more likely to click on at a higher position. Similarly, it demotes non-relevant web pages with high click probabilities. Experimental results show this approach reduces about 10%-20% of the click and skip errors with a trade-off of a 2.1% decline in nDCG@10.
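For readers unfamiliar with the nDCG@10 figure quoted above, here is a minimal sketch of how that metric is commonly computed. It assumes graded relevance labels and a linear gain; some formulations use an exponential gain of 2^g - 1 instead, and the abstract does not say which variant the paper uses. The function names and the example labels are illustrative only.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k=10):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical labels: swapping two equally relevant results leaves nDCG unchanged,
# so a ranker can reorder ties by click likelihood at no cost in nDCG.
ranking_a = [2, 2, 1, 0]
ranking_b = [2, 2, 1, 0]  # same labels after swapping the two results labeled 2
same_score = ndcg_at_k(ranking_a) == ndcg_at_k(ranking_b)
```

Under this definition, only demoting a more relevant result below a less relevant one lowers the score, which is consistent with the small nDCG@10 trade-off the abstract reports.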
Author:
Xie, Xiaohui, Tsinghua Univ
Beijing Natl Res Ctr Informat Sci & Technol Dept Comp Sci & Technol Inst Artificial Intelligence Beijing 100084 Peoples R China
ISBN: 9781450359405 (print)
Web-based image search engines differ greatly from web search engines. The intents or goals behind human interactions with image search engines are different. In image search, users mainly search for images instead of web pages or online services. It is essential to know why people search for images because user satisfaction may vary as intent varies. Furthermore, image search engines show results differently. For example, grid-based placement is used in image search instead of a linear result list, so that users can browse the result list both vertically and horizontally. Different user intents and system UIs lead to different user behavior. Thus, it is hard to apply standard user behavior models developed for general web search to image search. To better understand user intent and behavior in image search scenarios, we plan to conduct a lab-based user study, a field study, and commercial search log analysis. We then propose user behavior models based on the observations from data analysis to improve the performance of web image search engines.
ISBN: 9781450368223 (print)
User satisfaction is an important factor when evaluating search systems, and hence a good metric should give rise to scores that have a strong positive correlation with user satisfaction ratings. A metric should also correspond to a plausible user model, and hence provide a tangible manifestation of how users interact with search rankings. Recent work has focused on metrics whose user models accurately portray the behavior of search engine users. Here we investigate whether those same metrics then also correlate with user satisfaction. We carry out experiments using various classes of metrics, and confirm through the lens of the C/W/L framework that the metrics with user models that reflect typical behavior also tend to be the metrics that correlate well with user satisfaction ratings.
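As a rough illustration of what a user model means in the C/W/L framework, the sketch below scores a ranking with rank-biased precision (RBP), which is commonly described as a C/W/L metric whose continuation probability C is a constant p at every rank. The function name, the persistence value, and the relevance labels are illustrative assumptions and are not taken from the paper.

```python
def rbp(relevance, p=0.8):
    """Rank-biased precision: sum over ranks of W(i) * r_i.

    With a constant continuation probability C(i) = p, the weight the
    user model places on rank i (1-based) is W(i) = (1 - p) * p**(i - 1).
    """
    # enumerate() is 0-based, so p**i here equals p**(rank - 1).
    return sum((1 - p) * p**i * r for i, r in enumerate(relevance))

# Hypothetical binary relevance labels for a three-result ranking.
score = rbp([1, 0, 1], p=0.5)
```

A smaller p models an impatient user who rarely looks past the top result, while a p close to 1 spreads the weight deep into the ranking.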
ISBN: 9781450382977 (print)
We present Community Connect, a custom social media platform for conducting controlled experiments of human behavior. The key distinguishing factor of Community Connect is the ability to control the visibility of user posts based on the groups they belong to, allowing careful and controlled investigation into how information propagates through a social network. We release this platform as a resource to the broader community, to facilitate research on data collected through controlled experiments on social networks.
ISBN: 9781450329569 (print)
We show how to programmatically model processes that humans use when extracting answers to queries (e.g., "Who invented typewriter?", "List of Washington national parks") from semi-structured web pages returned by a search engine. This modeling enables various applications including automating repetitive search tasks, and helping search engine developers design micro-segments of factoid questions. We describe the design and implementation of a domain-specific language that enables extracting data from a webpage based on its structure, visual layout, and linguistic patterns. We also describe an algorithm to rank multiple answers extracted from multiple webpages. On 100,000+ queries (across 7 micro-segments) obtained from Bing logs, our system LASEweb answered queries with an average recall of 71%. Also, the desired answer(s) were present in the top-3 suggestions in 95%+ of cases.
ISBN: 9781450394079 (print)
For users working on a complex search task, it is common to address different goals at various stages of the task through query iterations. While addressing these goals, users go through different task states as well. Understanding the task states latent in users' interactions is crucial for identifying users' changing intents and search behaviors, and in turn for simulating and achieving real-time adaptive search recommendations and retrieval. However, sizeable real-world web search logs are scarce due to various ethical and privacy concerns, which often makes it challenging to develop generalizable task-aware computational models. Session logs with task state labels are rarer still. For many researchers who lack the resources to collect data from users directly and at scale and to conduct a time-consuming data annotation process, this becomes a considerable bottleneck to furthering their research. Synthetic search sessions have the potential to address this gap. This paper shares a parsimonious model to simulate synthetic web search sessions with task state information, which interactive information retrieval (IIR) and search personalization studies could utilize to develop and evaluate task-based search and retrieval systems.
ISBN: 9780769548807 (print)
One of the main problems that emerges in the classic approach to semantics is the difficulty in acquisition and maintenance of ontologies and semantic annotations. On the other hand, the flow of data and documents accessible from the web is continuously fueled by the contributions of millions of users who interact digitally in a collaborative way. Search engines, continually exploring the web, are therefore the natural source of information on which to base a modern approach to semantic annotation. A promising idea is that it is possible to generalize semantic similarity, under the assumption that semantically similar terms behave similarly, and define collaborative proximity measures based on the indexing information returned by search engines. This work introduces PMING, a new collaborative proximity measure that uses the information provided by search engines as a basis to extract semantic content. PMING is defined on the basis of the best features of the other state-of-the-art proximity distances that were considered. It defines the degree of relatedness between terms using only the number of documents returned as results for a query, so the measure dynamically reflects the collaborative changes made to web resources. Experiments held on popular collaborative and generalist engines (e.g. Flickr, YouTube, Google, Bing, Yahoo Search) show that PMING outperforms state-of-the-art proximity measures (e.g. Normalized Google Distance, Flickr Distance etc.) in modeling contexts, modeling human perception, and clustering of semantic associations.
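To make the idea of a proximity measure built only from result counts concrete, the sketch below computes the Normalized Google Distance, one of the baseline measures the abstract names. It is not PMING, whose definition the abstract does not give. The function name and the hit counts are hypothetical; f(x), f(y), and f(x, y) stand for the number of documents an engine returns for each term and for both terms together, and n for an assumed total index size.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts: values near 0 mean the terms co-occur strongly.

    fx, fy : documents returned for each term alone
    fxy    : documents returned for both terms together
    n      : assumed total number of indexed documents (must exceed min(fx, fy))
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms that never (co-)occur are treated as unrelated
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical counts, not taken from any real search engine.
distance = normalized_google_distance(fx=1000, fy=100, fxy=10, n=1_000_000)
```

Because the inputs are live result counts, rerunning the same computation later can give a different distance, which is the dynamic behavior the abstract attributes to measures of this family.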
ISBN: 9798400716348 (print)
The ubiquitous presence of search engines has revolutionized the way people access information. Google, as the dominant search engine worldwide, plays a pivotal role in shaping information retrieval experiences. It employs personalized search algorithms to deliver tailored search results based on each user's preferences. Despite numerous studies on general personalization in search engines, there is limited research on geolocation-driven personalization in search engine results, particularly in India. This research paper aims to quantitatively analyze and assess the impact of geolocation on personalized search results within the context of India. To conduct this study, we selected an extensive set of search queries across various domains. Multiple geolocations within India were chosen to represent different regions, cities, and rural areas. Using a systematic methodology, we collected and analyzed search results for each query, keeping the user's geolocation as a variable. The study focuses on the extent of personalization introduced by Google's search algorithms in search result rankings based on geolocation. The findings indicate that personalization influences search results, though the degree of variation depends on the specific search query category and result ranking. Queries regarding popular or local items show higher personalization, while within-state personalization is more elevated in larger states or cities with cosmopolitan populations. This research paves the way for a deeper understanding of the implications of geolocation-driven search result personalization.
ISBN: 9781450391320 (print)
Email has been an essential communication medium for many years. As a result, the information accumulated in our mailboxes has become valuable for all of our personal and professional activities. For years, researchers have developed interfaces, models, and algorithms to facilitate email search, discovery, and organization. This tutorial brings together these diverse research directions and provides both a historical background and a high-level overview of the recent advances in the field. In particular, we lay out all of the components needed in the design of email search engines, including user interfaces, indexing, document and query understanding, retrieval, ranking, evaluation, and data privacy. The tutorial also goes beyond search, presenting recent work on intelligent task assistance in email and a number of interesting future directions.
ISBN: 9781450307475 (print)
WSCD2012 is the second workshop on Web Search Click Data, following WSCD2009. It is a forum for new research relating to web search usage logs and for discussing desirable properties of publicly released search log datasets. Research relating to search logs has been hampered by the limited availability of click datasets. This workshop comes with a new click dataset based on click logs and an accompanying challenge to predict the relevance of documents based on clicks.