ISBN: 9780769529004 (print)
Analysing the evolution of a site and the behaviour of its users has become crucial for enabling webmasters and site owners to improve the structure and content of their sites. Data mining techniques provide many metrics and statistics that help in understanding the structure of a website and the use its visitors make of it, but these are still not easy to interpret. Most of these metrics can also be combined to discover new trends and patterns. We present a preliminary prototype of an exploratory search system for web mining whose main goal is to provide a set of tools and visual representations that allow the user to explore the available data and decide how to represent it. This approach defines a system that will help us evaluate the usability of the implemented interactions and visual metaphors.
ISBN: 9781450355810 (print)
User expectations of web search are changing. Users expect search engines to answer questions, to be more conversational, and to offer means to complete tasks on their behalf. At the same time, to increase the breadth of tasks that personal digital assistants (PDAs), such as Microsoft's Cortana or Amazon's Alexa, are capable of, PDAs need to better utilize information about the world, a significant amount of which is available in the knowledge bases and answers built for search engines. It thus seems likely that the underlying systems that power web search and PDAs will converge. This demonstration presents a system that merges elements of traditional multi-turn dialog systems with web-based question answering. The demo focuses on the automatic composition of semantic functional units, Botlets, to generate responses to users' natural language (NL) queries. We show that such a system can be trained to combine information from search engine answers with PDA tasks to enable new user experiences.
ISBN: 9780769548807; 9781467360579 (print)
In this paper, we propose a Markov Clustering (MCL) based text mining approach for namesake disambiguation on the web. The novelty of the proposed technique lies in modeling the collection of web pages as a weighted graph and applying MCL to crystallize it into clusters, each containing the web pages related to one particular namesake individual. The proposed method clusters web pages retrieved through search engines according to three broad and realistic aspects: content overlap, structure overlap, and local context overlap. The efficacy of the proposed method is demonstrated through experimental evaluations on standard datasets.
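The clustering step described in this abstract can be illustrated with a minimal sketch of generic Markov Clustering on a weighted page graph. This is not the authors' implementation: the function names, the self-loop convention, the parameter defaults, and the toy weights are all assumptions made for illustration, and the abstract does not say how the three overlap measures are combined into edge weights. The toy graph has two disconnected groups of pages so that the result is easy to verify by hand.

```python
def normalise_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(weights, inflation=2.0, iterations=30, eps=1e-6):
    """Cluster nodes of a weighted graph with a basic MCL loop.

    weights: square, symmetric list of lists of non-negative similarities
             (hypothetical page-to-page overlap scores).
    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(weights)
    # Adding self-loops is a common MCL convention (an assumption here);
    # it also guarantees that no column sums to zero.
    m = [[float(weights[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix, i.e. take two-step random walks.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power and renormalise each column,
        # which strengthens strong flows and weakens weak ones.
        m = normalise_columns([[v ** inflation for v in row] for row in m])
    # Each row that still carries flow lists the members of one cluster.
    clusters = {frozenset(j for j, v in enumerate(row) if v > eps) for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical similarity matrix: pages 0-2 overlap with each other,
# pages 3-5 overlap with each other, and the two groups share nothing.
toy_weights = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(toy_weights))  # [[0, 1, 2], [3, 4, 5]]
```

On real search results the groups would usually be joined by weak edges, and the inflation parameter would then control how finely the graph is split.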
ISBN: 9781450382977 (print)
Web archiving is the process of gathering data from the web, storing it, and ensuring that the data is preserved in an archive for future exploration. Despite the increasing number of web archives, the absence of meaningful exploration methods remains a major hurdle to turning them into a useful information source. By creating profiles that describe metadata about the archived documents, it is possible to offer a more exploitable environment that goes beyond simple keyword-based search. By exploiting the expressive power of the SPARQL language and providing a user-friendly web-based search interface, the system lets users run sophisticated queries and search for documents that meet their information needs.
ISBN: 9781450323512 (print)
Multilingual topic models are a fairly novel group of unsupervised, language-independent, generative machine learning models. This tutorial covers all key aspects of their probabilistic framework and demonstrates how to easily integrate these models into frameworks for cross-lingual and multilingual web mining and search.
ISBN: 9781450361743 (print)
The proceedings contain 6 papers. The topics discussed include: institutional repositories as a data trust infrastructure; understanding demographic bias and representation in social media health data; on bias in social reviews of university courses; LILE2019: 8th International Workshop on Learning and Education with Web Data; search and justification behavior during multimedia web search for procedural knowledge; and web research ethics: confidentiality, consent, data integrity & more.
ISBN: 9781450368223 (print)
We present HSDM, a full-day workshop on Health Search and Data Mining co-located with WSDM 2020's Health Day. This event builds on recent biomedical workshops in the NLP and ML communities but puts a clear emphasis on search and data mining (and their intersection) that is lacking in other venues. The program will include two keynote addresses by key opinion leaders in the clinical, search, and data mining domains. The technical program consists of 6 original research presentations. Finally, we will close with a panel discussion with keynote speakers, PC members, and the audience. This workshop aims to help consolidate the growing interest in biomedical applications of data-driven methods that is becoming apparent across the search and data mining spectrum, in WSDM's spirit of collaboration between industry and academia.
ISBN: 9798400716348 (print)
Frequent episode mining is a well-studied problem in the area of temporal data mining. There are many methods for mining serial, parallel, and general partial-order episodes. However, many of these existing methods are not very effective in capturing patterns where some events are constrained to occur simultaneously. There are a few methods for discovering such serial episodes; these methods use depth-first search based approaches and are not very efficient. In this paper, we propose a novel, efficient algorithm for mining frequent serial episodes with simultaneous events. Our algorithm follows the breadth-first search approach, and, for this, we present a novel candidate generation method and formally prove its correctness. We also propose a small but significant modification to the traditional finite state automata based frequency counting, which results in a considerable speed-up of the frequency counting step. Through several simulation experiments involving both synthetic and real data, we demonstrate the efficiency of the proposed algorithm.
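To make the notion of a serial episode with simultaneous events concrete, the following minimal sketch counts non-overlapped occurrences of one such episode using a single finite-state automaton. It illustrates the traditional style of counting that the abstract refers to, not the paper's modified counting or its candidate generation, neither of which is described here. The data layout (one set of event types per time stamp) and the names `count_non_overlapped`, `toy_sequence`, and `toy_episode` are assumptions made for illustration.

```python
def count_non_overlapped(sequence, episode):
    """Count non-overlapped occurrences of a serial episode.

    sequence: list of (time, set_of_event_types), sorted by strictly
              increasing time (assumed layout).
    episode:  non-empty list of sets; every event type in episode[k] must
              occur at one common time stamp, and consecutive sets must
              occur at strictly later time stamps.
    """
    count = 0
    state = 0  # number of episode nodes matched so far
    for _, events in sequence:
        # Advance when all simultaneous events of the next node are present.
        if episode[state] <= events:
            state += 1
            if state == len(episode):
                # Full occurrence found: count it and reset the automaton
                # so that the next occurrence cannot overlap this one.
                count += 1
                state = 0
    return count


# Hypothetical data: A and B must occur together, followed later by C.
toy_sequence = [
    (1, {"A", "B"}),
    (2, {"C"}),
    (3, {"A"}),
    (4, {"A", "B", "D"}),
    (5, {"C"}),
    (6, {"C"}),
]
toy_episode = [{"A", "B"}, {"C"}]
print(count_non_overlapped(toy_sequence, toy_episode))  # 2
```

An episode would then be called frequent when this count reaches a user-chosen threshold; a full miner tracks many candidate episodes at once, which is where the cost of the counting step matters.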
ISBN: 9781450323512 (print)
This paper provides an overview of the workshop Web-Scale Classification: Web Classification in the Big Data Era, which was held in New York City on February 28th as a workshop of the Seventh International Conference on Web Search and Data Mining. The goal of the workshop was to discuss and assess recent research focusing on classification and mining in web-scale category systems. The workshop brought together members of several communities, such as web mining, machine learning, text classification, and social media mining.