We present ProSecCo, an algorithm for the progressive mining of frequent sequences from large transactional datasets: it processes the dataset in blocks and outputs, after analyzing each block, a high-quality approximation of the collection of frequent sequences. ProSecCo can be used for interactive data exploration, as the intermediate results enable the user to make informed decisions as the computation proceeds. These intermediate results have strong probabilistic approximation guarantees, and the final output is the exact collection of frequent sequences. Our correctness analysis uses the Vapnik-Chervonenkis (VC) dimension, a key concept from statistical learning theory. The results of our experimental evaluation of ProSecCo on real and artificial datasets show that it produces fast-converging high-quality results almost immediately. Its practical performance is even better than what is guaranteed by the theoretical analysis, and ProSecCo can even be faster than existing state-of-the-art non-progressive algorithms. Additionally, our experimental results show that ProSecCo uses a constant amount of memory, orders of magnitude less than other standard, non-progressive sequential pattern mining algorithms.
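The block-by-block idea described in the abstract above can be illustrated with a minimal sketch. This is not the ProSecCo algorithm: the function names, the `max_len` cap, and the fixed `slack` value are hypothetical stand-ins (ProSecCo derives its error bound from the VC-dimension and uses constant memory, neither of which this toy version does). It only shows the general pattern of emitting a relaxed approximation after each block and the exact answer after the last one.

```python
from collections import Counter
from itertools import combinations


def subsequences(seq, max_len):
    """Return the distinct (not necessarily contiguous) subsequences
    of seq with length 1..max_len, as tuples."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            found.add(tuple(seq[i] for i in idx))
    return found


def progressive_frequent_sequences(dataset, theta, block_size,
                                   max_len=2, slack=0.1):
    """Yield one {pattern: relative support} dict per processed block.

    Intermediate results use the lowered threshold theta - slack
    (slack is an assumed, user-chosen margin, not a proven bound);
    the result after the final block uses theta itself and is exact
    for patterns up to max_len.
    """
    counts = Counter()
    seen = 0
    for start in range(0, len(dataset), block_size):
        block = dataset[start:start + block_size]
        for seq in block:
            # Each sequence contributes at most 1 to a pattern's support.
            counts.update(subsequences(seq, max_len))
        seen += len(block)
        is_last = seen == len(dataset)
        eps = 0.0 if is_last else slack
        yield {p: c / seen for p, c in counts.items()
               if c / seen >= theta - eps}
```

Calling `list(progressive_frequent_sequences(data, theta=0.5, block_size=2))` on a list of item sequences returns one snapshot per block; a user exploring the data interactively could inspect the early snapshots while later blocks are still being processed.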
Scientific data visualization requires a variety of mathematical techniques to transform multivariate data sets into simple graphical objects, or glyphs, that provide scientists and engineers with a clearer understanding of the underlying system behaviour. The spherical self-organizing feature map (SOFM) described in this paper exploits an unsupervised clustering algorithm to map randomly organized N-dimensional data into a lower three-dimensional (3D) space for visual pattern analysis. Each node on the spherical lattice corresponds to a cluster of input vectors that lie in close spatial proximity within the original feature space, and neighbouring nodes on the lattice represent cluster centres with a high degree of vector similarity. Simple metrics are used to extract associations between the cluster units and the input vectors assigned to them. These are then graphically displayed on the spherical SOFM as either surface elevations or colourized facets. The resulting colourized graphical objects are displayed and manipulated within 3D immersive virtual reality (IVR) environments for interactive data analysis. The ability of the proposed algorithm to transform arbitrarily arranged numeric strings into unique, reproducible shapes is illustrated using chaotic data generated by the Lozi, Henon, Rossler, and Lorenz attractor functions under varying initial conditions. Implementation of the basic data visualization technique is further demonstrated using the more common Wisconsin breast cancer data and multispectral satellite data. (C) 2003 Elsevier Ltd. All rights reserved.
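As a rough illustration of the mapping described above, the following sketch trains a small self-organizing map whose nodes sit on a sphere. It rests on several assumptions that are not from the paper: a Fibonacci lattice stands in for the spherical lattice, neighbourhood strength is a Gaussian of the great-circle angle between nodes, and the function names and default parameters are hypothetical.

```python
import math
import random


def fibonacci_sphere(n):
    """Place n roughly evenly spaced unit vectors on a sphere
    (an assumed lattice, chosen here for brevity)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_spherical_som(data, n_nodes=32, epochs=20, lr=0.5, sigma=0.8, seed=0):
    """Return (lattice, weights) after unsupervised training on data."""
    rng = random.Random(seed)
    lattice = fibonacci_sphere(n_nodes)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrinks learning rate and radius
        width = sigma * decay + 1e-3
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Great-circle angle between the winner and node j.
                dot = sum(a * b for a, b in zip(lattice[bmu], lattice[j]))
                angle = math.acos(max(-1.0, min(1.0, dot)))
                h = math.exp(-(angle ** 2) / (2.0 * width ** 2))
                for k in range(dim):
                    w[k] += lr * decay * h * (x[k] - w[k])
    return lattice, weights
```

After training, `best_matching_unit(weights, x)` assigns an input vector to a node, and per-node statistics (for example, the number of assigned vectors) could then drive surface elevation or facet colour in a renderer of one's choice.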
Objective: Wide-scale adoption of electronic medical records (EMRs) has created an unprecedented opportunity for the implementation of Rapid Learning Systems (RLSs) that leverage primary clinical data for real-time decision support. In cancer, where large variations among patient features leave gaps in traditional forms of medical evidence, the potential impact of an RLS is particularly promising. We developed the Melanoma Rapid Learning Utility (MRLU), a component of the RLS, providing an analytical engine and user interface that enables physicians to gain clinical insights by rapidly identifying and analyzing cohorts of patients similar to their own. Materials and methods: A new approach for clinical decision support in melanoma was developed and implemented, in which patient-centered cohorts are generated from practice-based evidence and used to power on-the-fly stratified survival analyses. A database to underlie the system was generated from clinical, pharmaceutical, and molecular data from 237 patients with metastatic melanoma from two academic medical centers. The system was assessed in two ways: (1) ability to rediscover known knowledge and (2) potential clinical utility and usability through a user study of 13 practicing oncologists. Results: The MRLU enables physician-driven cohort selection and stratified survival analysis. The system successfully identified several known clinical trends in melanoma, including frequency of BRAF mutations, survival rate of patients with BRAF mutant tumors in response to BRAF inhibitor therapy, and sex-based trends in prevalence and survival. Surveyed physician users expressed great interest in using such on-the-fly evidence systems in practice (mean response from relevant survey questions 4.54/5.0), and generally found the MRLU in particular to be both useful (mean score 4.2/5.0) and usable (4.42/5.0). Discussion: The MRLU is an RLS analytical engine and user interface for melanoma treatment planning that presents design p
The unsupervised estimation problem has been conveniently formulated in terms of a mixture density. It has been shown that a criterion naturally arises whose maximum defines the Bayes minimum risk solution. This criterion is the expected value of the natural log of the mixture density. Under the assumption that the component densities in the mixture are truncated Gaussian, the criterion takes a greatly simplified form. It can be used to resolve mixtures when the number of classes as well as the class covariances are unknown. In this paper a technique is presented in which an assumed test covariance is supplied by an experimenter, who uses a test function as a "portable magnifying glass" to examine data. Because the experimenter supplies the covariance and thus the test function, the technique is especially suited for interactive data analysis.
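For orientation, the criterion named in the abstract above can be written out as follows. The notation is an assumption made here for illustration and is not taken from the paper: $M$ components with mixing weights $P_i$ and parameters $\theta_i$, and $N$ observed samples $x_k$. The sample average on the right is the usual empirical estimate of the expectation.

```latex
\begin{align}
  p(x \mid \theta) &= \sum_{i=1}^{M} P_i \, p(x \mid \theta_i),
  \qquad \sum_{i=1}^{M} P_i = 1, \\
  J(\theta) &= \mathbb{E}\bigl[\ln p(x \mid \theta)\bigr]
  \approx \frac{1}{N} \sum_{k=1}^{N} \ln p(x_k \mid \theta).
\end{align}
```

Maximizing $J(\theta)$ over the mixture parameters then corresponds to the solution the abstract refers to; the simplified form under the truncated-Gaussian assumption is not reproduced here because the abstract does not state it.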
This paper proposes using picture processing techniques in the automatic analysis of bathythermograph records. It is shown that by treating and displaying this type of data as a digital picture a much more extensive and useful analysis is possible than with conventional techniques. A description is given of results obtained with an interactive program developed to implement such processing. This program, amongst other things, allows the operator to suppress noise, sharpen edges, enhance various structures and detect events of particular interest.
Visualization can help a lot to understand the huge amounts of data created in computer simulations and benchmark experiments. In Eugster et al. (Technical report 30, Institut für Statistik, Ludwig-Maximilians-Universität München, Germany 2008) we presented a comprehensive toolbox for exploration and inference on benchmark data, including the bench plot. This plot visualizes the behavior of the algorithms on the individual drawn learning and test samples according to a specific performance measure. In this paper we show that an interactive version of the bench plot can help to uncover details and relations unseen with the static version.
We describe concepts and software for the auditing of data analyses. Auditing begins with the record of a data analysis session. The record tells what statements were executed and what objects were accessed or changed, and can be processed to recreate chosen statements in the analysis for purposes of verification. It can also be the starting point for asking a variety of questions about the analysis, through an interactive, exploratory interface, as a data analyst's assistant.
ISBN: (Print) 9783642247996
An important competence of human data analysts is to interpret and explain the meaning of the results of data analysis to end-users. However, existing automatic solutions for intelligent data analysis provide limited help to interpret and communicate information to non-expert users. In this paper we present a general approach to generating explanatory descriptions about the meaning of quantitative sensor data. We propose a type of web application: a virtual newspaper with automatically generated news stories that describe the meaning of sensor data. This solution integrates a variety of techniques from intelligent data analysis into a web-based multimedia presentation system. We validated our approach in a real world problem and demonstrate its generality using data sets from several domains. Our experience shows that this solution can facilitate the use of sensor data by general users and, therefore, can increase the utility of sensor network infrastructures.
Mulreg is an interactive computing environment for data exploration, regression modeling, and the visualization and use of regression results. It is designed to allow both statisticians and nonstatisticians easy acces...
ISBN: (Print) 9781509022397
The distribution of multiclass discrete data in geographic space is a research hotspot in geography-related visualization. As a basic visual presentation of such data, dot maps offer perceptual intuitiveness and abundant detail, but they suffer from poor readability when points overlap. This paper proposes density estimation by resolution to optimize dot maps and to flexibly adjust the sampling parameters at the current resolution, so as to show as much detail as possible while maintaining the relative density characteristics of the different classes. To compensate for the discrete features lost through sampling, a series of interactive tools is used to improve the accuracy of visual analysis and assist the overall visual representation. Finally, the effectiveness of this approach is demonstrated through case analysis and a user study.