ISBN (print): 9781450355810
In the age of network sciences and machine learning, efficient algorithms are in higher demand than ever before. Big data fundamentally challenges the classical notion of efficient algorithms: algorithms that used to be considered efficient, according to the polynomial-time characterization, may no longer be adequate for solving today's problems. It is not just desirable, but essential, that efficient algorithms be scalable. In other words, their complexity should be nearly linear or sub-linear with respect to the problem size. Thus, scalability, not just polynomial-time computability, should be elevated as the central complexity notion for characterizing efficient computation. In this talk, using several basic tasks in network analysis, machine learning, and optimization as examples, I will highlight a family of fundamental algorithmic techniques for designing provably good scalable algorithms.
ISBN (print): 9781450382977
In recent years, several works have noted that Twitter data are essential in diverse fields and may have many applications. Nevertheless, the API provided by Twitter severely restricts access to public data generated by users. These restrictions greatly slow down researchers' contributions and limit their scope. In this paper we introduce TwiScraper, a collaborative project to enhance Twitter data collection through scraping methods. We present a module that allows user-centered data collection: Twi-FFN.
ISBN (print): 9780769548807
Optimization-related techniques are playing an ever-increasing role in web intelligence. Yet optimization may mean different things to different people. In this paper, starting with a general discussion of the many facets of optimization in information technology (IT), we focus our examination on recent developments in database keyword search from an optimization-oriented perspective. We first present a review and analysis of the related literature, pointing out the need to incorporate contextual information in database keyword queries to achieve better performance. We then present our recently proposed approach, in which keyword search is extended to simple English queries to better capture user intention. We conclude the paper with a discussion of the importance of database keyword search optimization for data mining and web intelligence.
ISBN (print): 9781450323512
With the proliferation of very large data repositories hidden behind web interfaces, e.g., keyword search, form-like search and hierarchical/graph-based browsing interfaces for ***, ***, etc., efficient ways of searching, exploring and/or mining such web data are of increasing importance. There are two key challenges facing these tasks: how to properly understand web interfaces, and how to bypass the interface restrictions. In this tutorial, we start with a general overview of web search and data mining, including various exciting applications enabled by the effective search, exploration, and mining of web repositories. Then, we focus on the fundamental developments in the field, including web interface understanding, crawling, sampling, and data analytics over web repositories with various types of interfaces. We also discuss the potential changes required for query processing, data mining and machine learning algorithms to be applied to web data. Our goal is two-fold: one is to promote awareness of existing web data search/exploration/mining techniques among all web researchers who are interested in leveraging web data, and the other is to encourage researchers, especially those who have not previously worked in web search and mining, to initiate their own research in these exciting areas.
ISBN (print): 9798400703713
This work presents a playground platform to demonstrate and interactively explore a suite of methods for utilizing user review texts to generate book recommendations. The focus is on search-based settings where the user provides situational context by focusing on a genre, a given item, her full user profile, or a newly formulated query. The platform allows exploration of two large datasets with various methods for creating concise user profiles.
ISBN (print): 9781450355810
We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate incremental and parallel pattern mining; and (3) support analytical queries such as trend analysis. The core idea of Percolator is to dynamically decide and verify a small fraction of patterns and their instances that must be inspected in response to buffered updates in dynamic graphs, with a total mining cost independent of graph size. We demonstrate (a) the feasibility of incremental pattern mining by walking through each component of Percolator, (b) the efficiency and scalability of Percolator on large real-world dynamic graphs, and (c) how the user-friendly GUI of Percolator interacts with users to support keyword-based queries that detect, browse and inspect trending patterns. We demonstrate how our system effectively supports event and trend analysis in social media streams and research publications, respectively.
ISBN (print): 9781450355810
Multidimensional data appear frequently in many web-related applications, e.g., product ratings, the bag-of-words representation of web pages, etc. Principal Component Analysis (PCA) has been widely used for discovering patterns in relationships among entities in multidimensional data. However, existing algorithms for PCA have limited scalability since they explicitly materialize intermediate data, whose size grows rapidly as the dimension increases. To avoid scalability issues, we propose sSketch, a scalable sketching technique for PCA that employs several optimization ideas, such as mean propagation, efficient sparse matrix operations, and effective job consolidation, to minimize intermediate data. Using sSketch, we also provide two other scalable methods for deriving the singular values and the 2-norm of the reconstruction error, both of which are used for data analysis purposes. We provide our implementation on the popular Spark framework for distributed platforms. We compare our method against state-of-the-art library functions available for distributed settings, namely MLlib-PCA and Mahout-PCA, on large real-world datasets. Our experiments show that our method outperforms both of them by a wide margin. To encourage reproducibility, the source code of sSketch is made publicly available at https://***/dataminingResearch/sSketch.
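The abstract above names mean propagation without defining it. One common reading, assumed here and not confirmed by the source, is that PCA's mean-centering is never applied to the sparse data matrix X itself; instead the mean is carried through each product, using (X - 1 mean^T) v = X v - (mean . v) 1. The following is a minimal single-machine sketch of that identity in plain Python. The function names and the dict-based sparse row format are hypothetical and are not taken from sSketch or Spark.

```python
from typing import Dict, List

SparseRow = Dict[int, float]  # hypothetical format: column index -> nonzero value


def column_means(rows: List[SparseRow], n_cols: int) -> List[float]:
    """Column means of a sparse matrix, touching only the stored nonzeros."""
    sums = [0.0] * n_cols
    for row in rows:
        for j, x in row.items():
            sums[j] += x
    return [s / len(rows) for s in sums]


def centered_matvec(rows: List[SparseRow], mean: List[float], v: List[float]) -> List[float]:
    """Compute (X - 1 mean^T) v without building the dense centered matrix."""
    # The mean's contribution is a single scalar, computed once and
    # subtracted from every row's sparse dot product.
    shift = sum(m * vj for m, vj in zip(mean, v))
    return [sum(x * v[j] for j, x in row.items()) - shift for row in rows]


def centered_matvec_dense(rows: List[SparseRow], mean: List[float], v: List[float]) -> List[float]:
    """Reference version that materializes every centered row (the costly path)."""
    n_cols = len(mean)
    out = []
    for row in rows:
        dense = [row.get(j, 0.0) - mean[j] for j in range(n_cols)]
        out.append(sum(d * vj for d, vj in zip(dense, v)))
    return out
```

Both functions return the same vector, but the first one does work proportional to the number of nonzeros plus the number of columns, whereas the reference version does work proportional to rows times columns. Any intermediate data that stays sparse in this way is what keeps the computation from growing with the full dimension.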
ISBN (print): 9781450382977
There is a wide variety of demographic data that can be analyzed with modern data mining methods to achieve better results. On the one hand, our main task is to compare different methods for next-event prediction and gender prediction; on the other hand, we pay special attention to interpretable patterns describing demographic behavior in the studied problems. We considered interpretable methods, such as decision trees and their ensembles, as well as semi- and non-interpretable methods, such as SVMs with customized kernels tailored to demographers' needs and neural networks, respectively. The best accuracy was obtained with two-channel convolutional neural networks.
ISBN (print): 9781450382977
By offering courses and resources, learning platforms on the web have been attracting many participants, and the interactions with these systems have generated a vast amount of learning-related data. The collection, processing and analysis of these data have promoted significant growth in learning analytics and have opened up new opportunities for supporting and assessing educational experiences. To provide all the stakeholders involved in the educational process with timely guidance, the ability to understand students' behavior and to enable models that support data-driven decisions in the learning domain is a key capability of online platforms that aim to maximize learning outcomes. In this workshop, we focus on collecting new contributions in this emerging area and on providing a common ground for researchers and practitioners (web site: https://***/l2d-wsdm2021/).