ISBN: (print) 9781450323512
Multilingual topic models are a fairly novel group of unsupervised, language-independent, generative machine learning models. This tutorial covers all key aspects of their probabilistic framework and demonstrates how to integrate these models into frameworks for cross-lingual and multilingual web mining and search.
Dynamic Miss-Counting (DMC) algorithms are proposed, which find all implication and similarity rules with confidence pruning but without support pruning. To handle data sets with a large number of columns, we propose dynamic pruning techniques that can be applied during data scanning. DMC counts the number of rows in which each pair of columns disagrees instead of counting the number of hits. DMC deletes a candidate as soon as the number of misses exceeds the maximum number of misses allowed for that pair. We also propose several optimization techniques that significantly reduce the required memory size. We evaluated our algorithms on four data sets: web access logs, a web page-link graph, news documents, and a dictionary. These data sets have between 74,000 and 700,000 items. Experiments show that DMC can find high-confidence rules efficiently for such large data sets.
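The miss-counting idea can be illustrated with a minimal sketch for implication rules only. It assumes that the confidence of a rule a -> b is the fraction of rows containing a that also contain b, that a first pass over the data collects per-column counts, and that all ordered column pairs fit in memory (so the memory optimizations mentioned in the abstract are not reproduced). The function name dmc_implications and its parameters are hypothetical.

```python
from collections import Counter
from math import floor


def dmc_implications(rows, min_conf):
    """Return {(a, b): confidence} for rules a -> b with confidence >= min_conf.

    Simplified illustration of miss counting, not the authors' implementation.
    """
    rows = [frozenset(r) for r in rows]

    # Pass 1: count how many rows contain each column (item).
    ones = Counter(item for row in rows for item in row)

    # A rule a -> b may miss (a present, b absent) at most this many times.
    # The small epsilon guards against floating-point error in (1 - min_conf).
    max_miss = {a: floor((1 - min_conf) * n + 1e-9) for a, n in ones.items()}

    # Candidate table: misses[a][b] = misses seen so far for a -> b.
    misses = {a: {b: 0 for b in ones if b != a} for a in ones}

    # Pass 2: count misses and prune candidates during the scan.
    for row in rows:
        for a in row:
            candidates = misses[a]
            for b in list(candidates):
                if b not in row:
                    candidates[b] += 1
                    if candidates[b] > max_miss[a]:
                        del candidates[b]  # can never reach min_conf again

    return {
        (a, b): 1 - m / ones[a]
        for a, candidates in misses.items()
        for b, m in candidates.items()
    }
```

Because a candidate is dropped at the first miss beyond its budget, no minimum support is needed: rare columns simply have a small miss budget.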
ISBN: (print) 9781450394079
Conversational agents, commonly known as dialogue systems, have gained escalating popularity in recent years. Their widespread applications support conversational interactions with users and accomplish various tasks as personal assistants. However, one key weakness of existing conversational agents is that they only learn to passively answer user queries via training on pre-collected and manually labeled data. Such passiveness makes interaction modeling and system building relatively easier, but it largely hinders the possibility of being human-like, hence lowering the user engagement level. In this tutorial, we introduce and discuss methods to equip conversational agents with the ability to interact with end users in a more proactive way. This three-hour tutorial is divided into three parts and includes two interactive exercises. It reviews and presents recent advancements on the topic, focusing on automatically expanding the ontology space, actively driving the conversation by asking questions or strategically shifting topics, and retrospectively conducting response quality control.
ISBN: (print) 9781450355810
The 1st International Workshop on Two-sided Marketplace Optimization: Search, Pricing, Matching & Growth (TSMO) will be held in Los Angeles, California, USA on February 9th, 2018, co-located with the 11th ACM International Conference on Web Search and Data Mining (WSDM). The main objective of the workshop is to address the challenges of two-sided marketplace optimization in web-scale settings. The workshop brings together interdisciplinary researchers in information retrieval, recommender systems, personalization, and related areas to share, exchange, learn, and develop preliminary results, new concepts, ideas, principles, and methodologies on applying data mining technologies to marketplace optimization. We have constructed an exciting program of papers and invited talks that will help us better understand the future of two-sided marketplaces.
ISBN: (print) 9781450394079
Speech AI technologies are largely trained on publicly available datasets or by massive web crawling of speech. In both cases, data acquisition focuses on minimizing collection effort, without necessarily taking the data subjects' protection or user needs into consideration. This results in models that are not robust when used by people who deviate from the dominant demographics in the training set, discriminating against individuals with different dialects, accents, speaking styles, and disfluencies. In this talk, we use automatic speech recognition as a case study and examine the properties that ethical speech datasets should possess for responsible AI applications. We showcase diversity issues, inclusion practices, and necessary considerations that can improve trained models while facilitating model explainability and protecting users and data subjects. We argue for the legal and privacy protection of data subjects, targeted data sampling corresponding to user demographics and needs, appropriate metadata that ensure explainability and accountability in cases of model failure, and sociotechnical and situated model design. We hope this talk can inspire researchers and practitioners to design and use more human-centric datasets in speech technologies and other domains, in ways that empower and respect users while improving the robustness and utility of machine learning models.
ISBN: (print) 9781509054732
On social voting web sites, how do user actions (up-votes, down-votes, and comments) evolve over time? Are there relationships between votes and comments? What is normal and what is suspicious? These are the questions we focus on. We analyzed over 20,000 submissions corresponding to more than 100 million user interactions from three social voting web sites: Reddit, Imgur, and Digg. Our first contribution is two discoveries: (i) the number of comments grows as a power law of the number of votes, and (ii) the time between a submission's creation and a user's reaction obeys a log-logistic distribution. Based on these patterns, we propose VNC (VOTE-AND-COMMENT), a parsimonious yet accurate and scalable model of the coevolution of user activities. In our experiments on real data, VNC outperformed state-of-the-art baselines in accuracy. Additionally, we illustrate VNC's usefulness for forecasting and outlier detection.
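The two reported patterns can be written down in a few lines. The sketch below is not the VNC model itself: it only assumes the power law has the form comments = c * votes^alpha, fitted by ordinary least squares in log-log space, and that the log-logistic distribution uses the standard scale/shape parameterization. The function and parameter names are hypothetical.

```python
import math


def fit_power_law(votes, comments):
    """Least-squares fit of log(comments) = log(c) + alpha * log(votes).

    Returns (c, alpha). All inputs must be strictly positive.
    """
    xs = [math.log(v) for v in votes]
    ys = [math.log(k) for k in comments]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return c, alpha


def log_logistic_cdf(t, scale, shape):
    """P(T <= t) for a log-logistic distribution with t, scale, shape > 0."""
    return 1.0 / (1.0 + (t / scale) ** (-shape))
```

In this parameterization the scale is the median reaction time, since log_logistic_cdf(scale, scale, shape) equals 0.5 for any shape.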
The 10th ACM International Workshop on Web Information and Data Management (WIDM 2008), which was held in Napa Valley, California, USA, in conjunction with the 17th International Conference on Information and Knowledge Management (CIKM) on October 30, 2008, focused on how web information can be extracted, stored, analyzed, and processed to provide useful knowledge to end users for various advanced database applications. The papers presented at the workshop were grouped into the following subject areas: data mining and clustering, systems issues, Web 2.0 and social networks, and ranking and similarity search. One paper, "Event Detection with Common User Interests", focused on the problem of identifying events that can be detected through the publication of online documents and the search queries posed over those documents. "Nereau: Query Expansion Using Social Bookmark" presented a new approach to enhancing query expansion with personalization by exploiting tag information from social bookmarking services.
ISBN: (print) 9781509054732
When factorizing binary matrices, we often have to make a choice between using expensive combinatorial methods that retain the discrete nature of the data and using continuous methods that can be more efficient but destroy the discrete structure. Alternatively, we can first compute a continuous factorization and subsequently apply a rounding procedure to obtain a discrete representation. But what will we gain by rounding? Will this yield lower reconstruction errors? Is it easy to find a low-rank matrix that rounds to a given binary matrix? Does it matter which threshold we use for rounding? Does it matter if we allow for only non-negative factorizations? In this paper, we approach these and further questions by presenting and studying the concept of rounding rank. We show that rounding rank is related to linear classification, dimensionality reduction, and nested matrices. We also report on an extensive experimental study that compares different algorithms for finding good factorizations under the rounding rank model.
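A small example shows why rounding can pay off. The sketch assumes that rounding maps an entry to 1 when it is at least a threshold and to 0 otherwise, which is one natural reading of the abstract; the helper names are hypothetical. It builds the 5 x 5 upper-triangular all-ones matrix (a simple nested matrix), which has full standard rank, and recovers it exactly by rounding a rank-1 non-negative matrix.

```python
def round_matrix(real, threshold=0.5):
    """Round a real matrix to binary: 1 where the entry is >= threshold."""
    return [[1 if x >= threshold else 0 for x in row] for row in real]


def outer(u, v):
    """Outer product of two vectors, i.e. a matrix of rank at most 1."""
    return [[a * b for b in v] for a in u]


n = 5
# Nested binary matrix: B[i][j] = 1 exactly when j >= i (full standard rank).
B = [[1 if j >= i else 0 for j in range(n)] for i in range(n)]

# Rank-1 factors with A[i][j] = 2 ** (j - i); powers of two are exact floats.
u = [2.0 ** -i for i in range(n)]
v = [2.0 ** j for j in range(n)]
A = outer(u, v)

# A[i][j] >= 0.75 holds exactly when j >= i, so rounding A reproduces B.
B_rounded = round_matrix(A, threshold=0.75)
```

Here the threshold matters: any value in the interval (0.5, 1] works for these factors, while a threshold of 0.5 would also switch on the entries just below the diagonal.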
ISBN: (print) 9781604231847
A web-based business always wants the ability to track the history of its users' browsing behavior. This ability can be achieved by using web log mining technologies. In this paper, we introduce a Self-Organizing Map (SOM) based approach to mining web log data. The SOM network maps web pages onto a two-dimensional map based on the users' browsing history. Web pages with similar browsing patterns are clustered together. Together with association rules, the clusters generated by the SOM network carry significant meaning about web browsing behavior. The experimental results demonstrate the feasibility and effectiveness of this approach.
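As a rough illustration of the mapping step, the following is a minimal, generic SOM in plain Python. It assumes each web page is represented by a vector with one entry per user (1 if that user visited the page, 0 otherwise); this encoding, the linear decay schedules, and all names and default parameters are assumptions for illustration, not the configuration used in the paper.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays to 0
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):     # visit inputs in random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-dist2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    # Pull the node towards x, more strongly near the BMU.
                    w[k] += lr * h * (x[k] - w[k])
    return weights
```

After training, calling best_matching_unit(weights, page_vector) for each page gives its position on the map, and pages that land on the same or neighbouring nodes can be read as one browsing-pattern cluster.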
ISBN: (print) 9783030496630; 9783030496623
This paper presents a web application for Association Rules Mining (ARM). It uses Apriori, the most widely used algorithm for this type of data mining task. The web application is called WebApriori and offers a modern, responsive web interface and a web service to scientific communities working in the field of ARM. It is also appropriate for educational purposes. WebApriori implements an Apriori engine that can efficiently discover hidden associations in data and can process different types of datasets. Part of the process involves the removal of redundant association rules. The asynchronous communication between the front-end, back-end, web service, and Apriori engine layers efficiently handles multiple concurrent user requests.
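For readers unfamiliar with Apriori, the frequent-itemset phase can be sketched as follows. This is a textbook-style illustration, not the application's engine: it assumes transactions are given as sets of items and that support is an absolute count, and it omits rule generation and redundancy removal. The function name apriori and its parameters are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support of a candidate = number of transactions that contain it.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

Association rules are then derived from each frequent itemset by splitting it into an antecedent and a consequent and keeping the splits whose confidence meets a chosen threshold.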