ISBN: 9781450394079 (print)
Graphs, which encode pairwise relations between entities, are a universal data structure for much real-world data, including social networks, transportation networks, and chemical molecules. Many important applications on these data can be treated as computational tasks on graphs. Recently, machine learning techniques have been widely developed and used to discover actionable patterns in graphs and to apply them to various graph-related computational tasks. These techniques have achieved great success, and numerous real-world applications have benefited from them. However, because data are now generated and gathered faster and in more diverse ways, real-world graphs are becoming increasingly large-scale and complex. More dedicated effort is needed to develop more advanced machine learning techniques and to deploy them for real-world applications in a scalable way. We therefore organize the 3rd International Workshop on Machine Learning on Graphs (MLoG), held in conjunction with the 16th ACM Conference on Web Search and Data Mining (WSDM), which provides a venue for academic researchers and industry researchers and practitioners to present recent progress on machine learning on graphs.
ISBN: 9781450394079 (print)
How can the search for relevant topical keywords within a tweet corpus be accelerated? Computational social scientists conducting topical studies employ large, self-collected or crowdsourced social media datasets such as tweet corpora. Comprehensive sets of relevant keywords are often necessary to sample or analyze these data sources. However, naively skimming through thousands of keywords can quickly become a daunting task. In this study, we present a web-based application that simplifies the search for relevant topical hashtags in a tweet corpus. DISKEYWORD allows users to grasp high-level trends in their dataset while iteratively labeling keywords that are recommended based on their links to previously labeled hashtags. We open-source our code under the MIT license.
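The recommendation step described above, in which unlabeled hashtags are ranked by their links to already labeled ones, might be sketched as follows. This is only an illustration and not the DISKEYWORD implementation: it assumes that a "link" means co-occurrence in the same tweet, and the function name `recommend_hashtags` and the toy tweets are hypothetical.

```python
from collections import Counter


def recommend_hashtags(tweets, labeled, top_k=3):
    """Rank unlabeled hashtags by how often they co-occur with labeled ones.

    Assumption (not from the paper): two hashtags are "linked" when they
    appear in the same tweet. Each tweet is given as a list of hashtags.
    """
    labeled = set(labeled)
    scores = Counter()
    for tags in tweets:
        tags = set(tags)
        hits = len(tags & labeled)  # labeled hashtags present in this tweet
        if hits == 0:
            continue
        for tag in tags - labeled:  # credit every unlabeled co-occurring tag
            scores[tag] += hits
    # Sort by score (descending), then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:top_k]]


# Hypothetical toy corpus: one list of hashtags per tweet.
tweets = [
    ["#climate", "#cop27", "#energy"],
    ["#climate", "#energy"],
    ["#football", "#worldcup"],
    ["#cop27", "#netzero"],
]

first_round = recommend_hashtags(tweets, labeled=["#climate"])
# After the user accepts "#cop27", new neighbours become reachable.
second_round = recommend_hashtags(tweets, labeled=["#climate", "#cop27"])
```

In this sketch, labeling one more hashtag changes the next round of suggestions, which is the iterative behaviour the abstract describes; a real system would likely use a richer link weighting than raw co-occurrence counts.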
ISBN: 9781450394079 (print)
Academic conferences have proven to be significant in facilitating academic activities. To promote information retrieval specific to academic conferences, building complete, systematic, and professional conference knowledge graphs is a crucial task. However, many related systems focus mainly on general academic information or concentrate their services on specific domains. To fill this gap, this work demonstrates a novel conference knowledge graph, namely Web of Conferences. The system accommodates detailed conference profiles, conference ranking lists, intelligent conference queries, and personalized conference recommendations. Web of Conferences supports detailed conference information retrieval while providing conference rankings based on the most recent data. Conference queries in the system can be carried out via precise search or fuzzy search. Personalized conference recommendations are then made available according to users' query conditions. Web of Conferences is demonstrated with a user-friendly visualization interface and can serve as a useful information retrieval system for researchers.
ISBN: 9781450394079 (print)
Recommender systems have achieved great success in our daily life. In recent years, the ethical concerns raised by AI systems have gained much attention. At the same time, graph learning techniques are powerful in modelling the complex relations among users and items in recommender system applications. These graph learning-based methods are data-hungry, which poses a significant data-efficiency challenge. In this proposal, I introduce my PhD research from three aspects: 1) efficient privacy-preserving recommendation for imbalanced data; 2) efficient recommendation model training for insufficient samples; 3) explainability in social recommendation. The proposal presents the challenges of these research problems and solutions to them.
ISBN: 9789897584787 (print)
The clipboard is a central tool in human-computer interaction. It is difficult to imagine productive day-to-day interaction with computers, tablets, and smartphones without copy-and-paste functionality. This study analyzes real usage data from a commercial website to understand what types of textual content users copy from the website, for what purposes, and what such user activity data can be used for. This paper advocates treating clipboard copy operations as a bidirectional human-computer dialogue, in which the computer can gain knowledge about users, their preferences, and their needs. Copy-operation data may be useful in various applications. For example, users may copy words that make a text difficult to understand in order to search for more information on the internet. Accordingly, word copying on a website may be used as an indicator in Complex Word Identification (CWI) and help in text simplification. Users may also copy key sentences to use them in summaries or as citations; accordingly, the frequency with which web users copy full sentences could be used as an indicator in text summarization. Ten potential uses of copy-operation data are described and discussed in this paper. The proposed uses and applications span a wide range of areas, including web analytics, web personalization, adaptive websites, text simplification, text summarization, plagiarism detection, and search engine optimization.
ISBN: 9781509059102 (print)
In this paper, we present LSHDB, the first parallel and distributed engine for record linkage and similarity search. LSHDB provides an abstraction layer that hides the mechanics of Locality-Sensitive Hashing (a popular method for detecting similar items in high dimensions), which is used as the underlying similarity search engine. LSHDB creates the appropriate data structures from the input data and persists them on disk using a NoSQL engine. It inherently supports the parallel processing of distributed queries, is highly extensible, and is easy to use. We will demonstrate LSHDB both as the underlying system for detecting similar records in Record Linkage (and Privacy-Preserving Record Linkage) tasks and as a search engine for identifying string values that are similar to submitted queries.
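To make the Locality-Sensitive Hashing idea mentioned above more concrete, here is a minimal in-memory sketch of one common LSH family: MinHash signatures with banding, applied to character bigrams of strings. It is not LSHDB's API or its storage layout; the class `MinHashLSHIndex`, its parameters, and the choice of MinHash are assumptions made purely for illustration.

```python
import hashlib
from collections import defaultdict


def shingles(text, n=2):
    """Return the set of lowercase character n-grams of a string."""
    text = text.lower()
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=32):
    """One minimum per seeded hash function; similar sets share many minima."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


class MinHashLSHIndex:
    """Hypothetical toy index: records sharing any band land in one bucket."""

    def __init__(self, num_hashes=32, bands=16):
        if num_hashes % bands != 0:
            raise ValueError("num_hashes must be divisible by bands")
        self.num_hashes = num_hashes
        self.bands = bands
        self.rows = num_hashes // bands
        self.buckets = defaultdict(set)

    def _band_keys(self, signature):
        # Split the signature into `bands` chunks of `rows` values each.
        for b in range(self.bands):
            chunk = tuple(signature[b * self.rows:(b + 1) * self.rows])
            yield (b, chunk)

    def add(self, key, text):
        signature = minhash_signature(shingles(text), self.num_hashes)
        for band_key in self._band_keys(signature):
            self.buckets[band_key].add(key)

    def query(self, text):
        """Return keys of stored records that collide with the query."""
        signature = minhash_signature(shingles(text), self.num_hashes)
        candidates = set()
        for band_key in self._band_keys(signature):
            candidates |= self.buckets.get(band_key, set())
        return candidates


index = MinHashLSHIndex()
index.add("r1", "Jonathan Smith")
index.add("r2", "Maria Garcia")
exact_matches = index.query("Jonathan Smith")
```

A query only compares against records in matching buckets rather than against every stored record, which is what makes the approach attractive for record linkage. A near-duplicate such as a misspelled name is likely, though not guaranteed, to be returned; the number of bands and rows trades recall against the number of false candidates.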
ISBN: 9781450394079 (print)
A core step in production model research and development involves the offline evaluation of a system before production deployment. Traditional offline evaluation of search, recommender, and other systems involves gathering item relevance labels from human editors. These labels can then be used to assess system performance using offline evaluation metrics. Unfortunately, this approach does not work when evaluating highly effective ranking systems, such as those emerging from advances in machine learning. Recent work demonstrates that moving away from pointwise item and metric evaluation can be a more effective approach to the offline evaluation of systems. This tutorial, intended for both researchers and practitioners, reviews early work in preference-based evaluation and covers recent developments in detail.
ISBN: 9781509059102 (print)
In recent years, governmental bodies have been futilely trying to fight dark web marketplaces. Shortly after the FBI and Europol closed "the Silk Road" in 2013, new successors were established. Through the combination of cryptocurrencies and nonstandard communication protocols and tools, agents can anonymously trade illegal items in a marketplace without leaving any record. This paper presents research carried out to gain insight into the products and services sold within Agora, one of the larger marketplaces for drugs, fake IDs, and weapons on the Internet. Our work sheds light on the nature of the market: there is a clear preponderance of drugs, which account for nearly 80% of the total items on sale. The ready availability of counterfeit documents, although they make up a much smaller percentage of the market, raises concerns. Finally, the role of organized crime within Agora is presented and discussed.
ISBN: 9781450394079 (print)
For users working on a complex search task, it is common to address different goals at various stages of the task through query iterations. While addressing these goals, users also go through different task states. Understanding the task states latent in users' interactions is crucial for identifying users' changing intents and search behaviors, and for simulating and achieving real-time adaptive search recommendations and retrieval. However, sizeable real-world web search logs are scarce because of various ethical and privacy concerns, which often makes it challenging to develop generalizable task-aware computational models. Session logs with task state labels are even rarer. For the many researchers who lack the resources to collect data from users directly and at scale and to conduct a time-consuming data annotation process, this becomes a considerable bottleneck to furthering their research. Synthetic search sessions have the potential to address this gap. This paper shares a parsimonious model for simulating synthetic web search sessions with task state information, which interactive information retrieval (IIR) and search personalization studies could use to develop and evaluate task-based search and retrieval systems.
ISBN: 9781450394079 (print)
We present Travel Bird, a novel personalized destination recommendation and exploration interface which allows its users to find their next tourist destination by describing their specific preferences in a narrative form. Unlike other solutions, Travel Bird is based on TourBERT, a novel NLP model we developed, specifically tailored to the tourism domain. Travel Bird creates a two-dimensional personalized destination exploration space from TourBERT embeddings of social media content and the users' textual description of the experience they are looking for. In this demo, we will showcase several use cases for Travel Bird, which are beneficial for consumers and destination management organizations.