ISBN: 9781450307475 (print)
The proceedings contain 81 papers. The topics discussed include: spatially-aware indexing for image object retrieval; Auralist: introducing serendipity into music recommendation; beyond co-occurrence: discovering and visualizing tag relationships from geo-spatial and temporal similarities; overcoming browser cookie churn with clustering; of hammers and nails: an empirical comparison of three paradigms for processing large graphs; mining slang and urban opinion words and phrases from cQA services: an optimization approach; characterizing web content, user interests, and search behavior by reading level and topic; selecting actions for resource-bounded information extraction using reinforcement learning; overlapping clusters for distributed computation; personalized click model through collaborative filtering; extracting search-focused key N-grams for relevance ranking in web search; and tapping into knowledge base for concept feedback: leveraging ConceptNet to improve search results for difficult queries.
Web-based learning is evolving rapidly as traditional search engines are complemented by Large Language Models (LLMs) and other AI technologies. This evolution offers new opportunities, such as automated information s...
ISBN: 9781450307475 (print)
WSCD2012 is the second workshop on Web Search Click Data, following WSCD2009. It is a forum for new research relating to web search usage logs and for discussing desirable properties of publicly released search log datasets. Research relating to search logs has been hampered by the limited availability of click datasets. This workshop comes with a new click dataset based on click logs and an accompanying challenge to predict the relevance of documents based on clicks.
ISBN: 9798400703713 (print)
Graphs, which encode pairwise relations between entities, are a universal data structure for much real-world data, including social networks, transportation networks, and chemical molecules. Many important applications on these data can be treated as computational tasks on graphs. Recently, machine learning techniques have been widely developed and used to tame graphs effectively, discovering actionable patterns and harnessing them to advance various graph-related computational tasks. Huge success has been achieved, and numerous real-world applications have benefited from it. However, because data is now generated and gathered much faster and in more diverse ways, real-world graphs are becoming increasingly large-scale and complex. More dedicated efforts are needed to propose more advanced machine learning techniques and to deploy them properly for real-world applications in a scalable way. Thus, we organize the 5th International Workshop on Machine Learning on Graphs (MLoG), held in conjunction with the 17th ACM Conference on Web Search and Data Mining (WSDM), which provides a venue for academic researchers and industry researchers and practitioners to present recent progress on machine learning on graphs.
ISBN: 9781450307475 (print)
Computational advertising refers to finding the most relevant ads matching a particular context on the web. The core problem attacked in computational advertising (CA) is matchmaking between the ads and the context. My research work aims at leveraging user interaction information, ad- and advertiser-related information, and contextual information to improve the relevance, ranking, and targeting of ads. The research work focuses on identifying the factors that contribute to retrieving and ranking the set of ads that best matches the context. Specifically, information associated with the user, publisher, and advertiser is leveraged for this purpose.
ISBN: 9781450307475 (print)
In sponsored search auctions (SSA), advertisers compete for ad slots on the search engine results page by bidding on keywords of interest. To improve advertiser expressiveness, we augment the bidding process with conflict constraints. With such constraints, advertisers can condition their bids on the non-appearance of certain undesired ads on the results page. We study the complexity of the allocation problem in these augmented SSA and introduce an algorithm that can efficiently allocate the ad slots to advertisers. We evaluate the algorithm's run time in simulated conflict scenarios and study the implications of the conflict constraints for search engine revenue. Our results show that the allocation problem can be solved within a few tens of milliseconds and that the adoption of conflict constraints can potentially increase search engine revenue. Copyright 2012 ACM.
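The abstract does not spell out the allocation algorithm. As a rough illustration of what a conflict constraint does to slot allocation, the sketch below swaps in brute-force enumeration for the paper's efficient algorithm. The function name `allocate`, the bid-times-click-through objective, and every input value are assumptions made for illustration and are not taken from the paper.

```python
from itertools import permutations

def allocate(bids, slot_ctr, conflicts):
    """Exhaustively find the conflict-free slate maximizing sum(bid * slot CTR).

    bids:      {advertiser: bid} (hypothetical values)
    slot_ctr:  click-through rate per slot, best slot first (assumed objective)
    conflicts: {advertiser: set of advertisers it refuses to appear alongside}
    """
    best_value, best_slate = 0.0, ()
    # Try every ordered selection of 1..len(slot_ctr) advertisers.
    for k in range(1, len(slot_ctr) + 1):
        for slate in permutations(bids, k):
            shown = set(slate)
            # Skip slates where a shown advertiser's undesired ad is also shown.
            if any(conflicts.get(a, set()) & shown for a in slate):
                continue
            value = sum(bids[a] * slot_ctr[i] for i, a in enumerate(slate))
            if value > best_value:
                best_value, best_slate = value, slate
    return best_slate, best_value

slate, value = allocate(
    bids={"A": 3.0, "B": 2.0, "C": 1.0},
    slot_ctr=[0.5, 0.25],
    conflicts={"A": {"B"}},  # A bids only if B is not shown
)
print(slate, value)
```

Enumeration grows factorially with the number of advertisers, so it only serves to make the constraint concrete; it says nothing about how the paper reaches its reported run times.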
ISBN: 9781450307475 (print)
When an ambiguous query is received, a sensible approach is for the information retrieval (IR) system to diversify the results retrieved for this query, in the hope that at least one of the interpretations of the query intent will satisfy the user. Diversity is an increasingly important topic, of interest both to academic researchers (such as participants in the TREC Web and Blog track diversity tasks, or the NTCIR INTENT task) and to search engine professionals. In the 2nd edition of the Diversity in Document Retrieval workshop (DDR 2012), we solicited submissions on approaches and models for diversity, on the evaluation of diverse search results, and on applications of diverse search results. This workshop builds upon a successful 1st edition of DDR, which was held at ECIR 2011 in Dublin, Ireland [6].
ISBN: 9781450307475 (print)
We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining clusters of distributionally similar terms and concept-instance pairs obtained with Hearst patterns. In contrast, our method relies on a novel approach for clustering terms found in HTML tables, and then assigning concept names to these clusters using Hearst patterns. The method can be efficiently applied to a large corpus, and experimental results on several datasets show that our method can accurately extract large numbers of concept-instance pairs. Copyright 2012 ACM.
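For readers unfamiliar with Hearst patterns, the sketch below shows one classic pattern, "concept such as instance, instance and instance", applied with a regular expression. The regex, the function name `extract_pairs`, and the restriction to single-word concepts are simplifying assumptions for illustration. It is not the paper's implementation, which additionally clusters terms found in HTML tables.

```python
import re

# One Hearst pattern: "<concept> such as <instance list>".
# Simplification (assumption): the concept is the single word before "such as".
SUCH_AS = re.compile(r"(\w+) such as ([^.;]+)")

# Separators inside the instance list: commas, optionally followed by and/or,
# or a bare "and"/"or" between two instances.
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")

def extract_pairs(text):
    """Return (concept, instance) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        concept = match.group(1).lower()
        for instance in SEPARATOR.split(match.group(2)):
            instance = instance.strip()
            if instance:
                pairs.append((concept, instance))
    return pairs

print(extract_pairs("He visited cities such as Paris, Rome and Dublin."))
```

A real extractor would use noun-phrase chunking instead of a single-word concept and would cover further patterns such as "and other" or "including".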
ISBN: 9781450307475 (print)
The accumulation of large collections of social media data poses new challenges for the design of exploratory experiences, such as when a user browses through a collection to discover content (e.g., exploring photo collections, networks of friends, etc.). The cardinality and characteristics of the set, together with the volatility of the information that results from fast and continuous creation, deletion, and updating of entries, trigger novel research questions. In this context, we plan to investigate and contribute to the data analysis and user interface design of exploratory experiences. The proposed approach is an iterative process in which analysis and design phases are performed in cycles. The long-term vision is to understand the underlying reasoning in order to be able to replicate it automatically.
ISBN: 9781450307475 (print)
The prevalence of social media applications is generating potentially large personal archives of posts, tweets, and other communications. The existence of these archives creates a need for search tools, which can be seen as an extension of current desktop search services. Little is currently known about the best search techniques for personal archives of social data, because of the difficulty of creating test collections. In this paper, we describe how test collections for personal social data can be created by using games to collect queries. We then compare a range of retrieval models that exploit the semi-structured nature of social data. Our results show that a mixture of language models with field distribution estimation can be effective for this type of data, with certain fields, such as the name of the poster, being particularly important. We also analyze the properties of the queries that were generated by users with two versions of the games. Copyright 2012 ACM.
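The idea of scoring a semi-structured document with a mixture of per-field language models can be sketched as follows. This is a minimal illustration under stated assumptions: the field weights are fixed by hand (the paper estimates a field distribution, which is not reproduced here), the additive smoothing is chosen for simplicity, and the names `field_lm` and `mixture_log_likelihood` as well as the sample posts are hypothetical.

```python
import math
from collections import Counter

def field_lm(tokens, vocab_size, mu=1.0):
    """Additively smoothed unigram model for one field (smoothing is an assumption)."""
    counts = Counter(tokens)
    total = len(tokens)
    return lambda w: (counts[w] + mu) / (total + mu * vocab_size)

def mixture_log_likelihood(query, doc_fields, weights, vocab_size):
    """log P(q|d), with P(w|d) = sum over fields f of weights[f] * P(w | field f)."""
    models = {f: field_lm(toks, vocab_size) for f, toks in doc_fields.items()}
    score = 0.0
    for w in query:
        p = sum(weights[f] * models[f](w) for f in doc_fields)
        score += math.log(p)
    return score

# Hand-set weights that favor the poster field (illustrative values only).
weights = {"poster": 0.7, "text": 0.3}
post_by_alice = {"poster": ["alice"], "text": ["hello"]}
post_about_alice = {"poster": ["bob"], "text": ["alice"]}

for doc in (post_by_alice, post_about_alice):
    print(mixture_log_likelihood(["alice"], doc, weights, vocab_size=10))
```

With these weights, a query term that matches the poster field scores higher than the same term matching only the body text, which mirrors the abstract's observation that some fields matter more than others.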