Highly automated modern manufacturing processes are yielding large databases with records on hundreds of process variables and product characteristics. This large amount of information calls for new approaches to production process analysis. In this paper, we discuss why a data mining framework can be appropriate for this goal, and we propose a visual data mining strategy to mine large, high-dimensional off-line data sets. The strategy allows users to achieve a deeper process understanding through a set of linked interactive graphical devices, and is illustrated within an industrial process case study. Copyright (C) 2003 John Wiley & Sons, Ltd.
ISBN: (print) 0769520006
The relationship between protein mutations and conformational change can potentially decipher the language relating sequence to structure. Elsewhere, we presented the Protein Mutant Resource (PMR), an online tool that systematically identified related mutants in the Protein Data Bank (PDB), inferred mutant Gene Ontology classifications using data mining, and allowed intuitive exploration of relationships between mutant structures. Here, we perform a comprehensive statistical analysis of PMR mutants. Although the PMR contains spectacular conformational changes, generally there is a counter-intuitive inverse relationship between conformational change and the number of mutations. That is, PDB mutations contrast with naturally evolved mutations. We compare the frequencies of mutations in the PMR/PDB datasets against the PAM250 natural mutation frequencies to confirm this. We make available morph movies from PMR structure pairs, allowing visual analysis of conformational change and the ability to distinguish visually between conformational change due to motions (e.g., ligand binding) and conformational change due to mutations. The PMR is at http://***.
The Internet pervades many aspects of our lives and is becoming indispensable to critical functions in areas such as commerce, government, production and general information dissemination. To maintain the stability and efficiency of the Internet, every effort must be made to protect it against various forms of attack, malicious use, and error. A key component in the Internet security effort is the routine examination of Internet routing data, which unfortunately can be too large and complicated to browse directly. We have developed an interactive visualization process which proves to be very effective for the analysis of Internet routing data. In this application paper, we show how each step in the visualization process helps direct the analysis and glean insights from the data. These insights include the discovery of patterns, detection of faults and abnormal events, understanding of event correlations, formation of causation hypotheses, and classification of anomalies. We also discuss lessons learned in our visual analysis study.
ISBN: (print) 0780381203
Unstructured meshes are often used in simulations and imaging applications. They provide greater modeling flexibility but are more difficult to manipulate and analyze than regular data. This work provides a novel approach for the analysis of unstructured meshes using feature-space clustering and feature detection. Analyzing and revealing underlying structures in data involves operators on both spatial and functional domains. Slicing concentrates more on the spatial domain, while iso-surfacing and volume rendering concentrate more on the functional domain. Nevertheless, it is often the combination of the two domains that provides real insight into the structure of the data. In this work, a combined feature space is defined on top of unstructured meshes in order to search for structure in the data. A point in feature space includes the spatial coordinates of the point in the mesh domain and all chosen attributes defined on the mesh. A distance measure between points in feature space is defined, enabling clustering with the mean shift procedure (previously used for images) on unstructured meshes. Feature-space analysis is shown to be useful for feature extraction, data exploration, and partitioning.
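The feature-space clustering described in this abstract can be illustrated with a minimal mean shift sketch. It is not the authors' implementation: it assumes each mesh vertex has already been turned into a feature vector (spatial coordinates followed by attributes), that the features are pre-scaled so one Gaussian bandwidth suits all dimensions, and that plain Euclidean distance stands in for the paper's distance measure. The names `mean_shift` and `label_modes` are hypothetical.

```python
import math

def mean_shift(points, bandwidth, n_iter=50, tol=1e-6):
    """Move every feature vector uphill to a density mode (Gaussian kernel)."""
    modes = []
    for start in points:
        m = list(start)
        for _ in range(n_iter):
            # Weight every sample by its kernel distance to the current estimate.
            weights = [
                math.exp(-sum((a - b) ** 2 for a, b in zip(m, p)) / (2 * bandwidth ** 2))
                for p in points
            ]
            total = sum(weights)
            new = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(m))
            ]
            shift = math.dist(m, new)
            m = new
            if shift < tol:
                break
        modes.append(m)
    return modes

def label_modes(modes, bandwidth):
    """Assign the same label to modes that ended up close to each other."""
    labels, centers = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

# Hypothetical vertices: (x, y, attribute), assumed to be pre-scaled.
features = [
    (0.0, 0.0, 1.0), (0.1, 0.0, 1.1), (0.0, 0.1, 0.9),
    (5.0, 5.0, 10.0), (5.1, 5.0, 10.1), (5.0, 5.1, 9.9),
]
labels = label_modes(mean_shift(features, bandwidth=1.0), bandwidth=1.0)
```

Because position and attribute values share one vector, vertices are grouped only when they are close in both the spatial and the functional domain, which is the combination the abstract argues for.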
ISBN: (print) 0780377893
Multiparameter imaging techniques provide large amounts of high-dimensional image data in modern biomedical research. Besides algorithms for image registration, normalization and segmentation, new methods for interactive data exploration must be proposed and evaluated. We propose a new approach for auditory data representation, based on sonification. The approach is applied to a multiparameter image data set generated with immunofluorescence techniques, and is compared to a conventional visualization approach and to a combination of both. For comparison, a psychophysical experiment was conducted in which one standard evaluation procedure is modeled. Our results show that all three approaches lead to comparable evaluation accuracies for all subjects. We conclude that acoustical and visual approaches can be combined to display data sets of large dimensionality.
ISBN: (print) 1577351894
We introduce a probabilistic model that generalizes classical linear discriminant analysis and gives an interpretation for the components as informative or relevant components of data. The components maximize the predictability of the class distribution, which is asymptotically equivalent to (i) maximizing mutual information with the classes, and (ii) finding principal components in the so-called learning or Fisher metrics. The Fisher metric measures only distances that are relevant to the classes, that is, distances that cause changes in the class distribution. The components have applications in data exploration, visualization, and dimensionality reduction. In empirical experiments the method outperformed a Rényi entropy-based alternative and linear discriminant analysis.
ISBN: (print) 0780376633
In this paper, we show how any speaker recognition system can be adapted to provide its results according to the Bayesian approach for evidence analysis and forensic reporting. This approach, firmly established in other forensic areas such as fingerprint, DNA, or fiber analysis, suits the needs of both the court and the forensic scientist. We show the inadequacy of the classical approach to forensic reporting, which stems from the use of thresholds and the suppression of the prior probabilities related to the case. We also show how to assess the performance of such forensic systems through Tippett plots. Finally, an example is shown using NIST-Ahumada eval'2001 data, where the speaker recognition abilities of our system are assessed through DET plots. These raw scores are then used as evidence in the forensic system, where, relative to reference populations, we obtain the corresponding likelihood ratio values, which are assessed through Tippett plots.
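The Bayesian reporting idea in this abstract reduces to one identity: posterior odds equal the likelihood ratio times the prior odds, so the system reports the likelihood ratio and leaves the prior to the court. The sketch below is only an illustration under loud assumptions. It models same-speaker and different-speaker score populations as Gaussians, which the abstract does not state, and every score value is invented for the example. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def likelihood_ratio(score, same_speaker_scores, different_speaker_scores):
    """LR = p(score | same speaker) / p(score | different speaker).

    Assumption: both score populations are modeled as Gaussians fitted to
    reference scores. Extreme scores can underflow the denominator to zero.
    """
    p_same = gaussian_pdf(score, mean(same_speaker_scores), stdev(same_speaker_scores))
    p_diff = gaussian_pdf(
        score, mean(different_speaker_scores), stdev(different_speaker_scores)
    )
    return p_same / p_diff

def posterior_odds(lr, prior_odds):
    """Bayes' rule in odds form: the prior stays outside the system."""
    return lr * prior_odds

# Invented raw scores from a hypothetical recognizer.
same = [2.0, 2.5, 3.0, 3.5]
different = [-1.0, -0.5, 0.0, 0.5]
lr_supporting = likelihood_ratio(2.75, same, different)
lr_opposing = likelihood_ratio(-0.25, same, different)
```

No accept/reject threshold appears anywhere in the sketch, which is the point the abstract makes against the classical approach: the same likelihood ratio can lead to different posterior odds depending on the case-specific prior.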
ISBN: (print) 1853128066
Multidimensional data representation has been one of the greatest challenges of data mining. Visualization is very useful in the knowledge discovery process, since it enables an easier and faster understanding of the data distribution. This work deals with the development of a tool that supports data exploration through visualization, clustering and classification methods. The tool is called Starcluster, and it was implemented using Microsoft Excel and the Visual Basic programming language. Starcluster allows users to visualize and manipulate multidimensional data using the new Star Coordinates technique. Star Coordinates is a coordinate transformation that enables the plotting of multidimensional data in 2D space. Each variable is represented by an axis, and each multidimensional data element is represented by a point. By changing the size and angle of the axes, it is possible to integrate and separate dimensions, analyze correlations of multiple dimensions, and view clusters, trends and outliers in the distribution of data. Starcluster also provides a k-means clustering method and principal components analysis to complement the data understanding task. This paper presents the main features of Starcluster and an application example.
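The Star Coordinates transformation described here can be sketched in a few lines. This is a Python illustration, not Starcluster itself (which the abstract says was built with Excel and Visual Basic). It assumes the common formulation: axes start evenly spaced around the origin, each variable is min-max normalized, and a data element is plotted at the sum of its normalized values times the axis vectors. The function name and parameters are hypothetical.

```python
import math

def star_coordinates(rows, scales=None, angles=None):
    """Project multidimensional rows to 2D points using Star Coordinates.

    scales and angles are per-axis length and direction (radians); changing
    them corresponds to the interactive resizing and rotating of axes.
    """
    n_dims = len(rows[0])
    if angles is None:
        # Assumed default: axes evenly spaced around the circle.
        angles = [2 * math.pi * j / n_dims for j in range(n_dims)]
    if scales is None:
        scales = [1.0] * n_dims
    lows = [min(r[j] for r in rows) for j in range(n_dims)]
    highs = [max(r[j] for r in rows) for j in range(n_dims)]
    points = []
    for r in rows:
        x = y = 0.0
        for j in range(n_dims):
            span = highs[j] - lows[j]
            # Min-max normalize; a constant column contributes nothing.
            t = (r[j] - lows[j]) / span if span else 0.0
            x += scales[j] * t * math.cos(angles[j])
            y += scales[j] * t * math.sin(angles[j])
        points.append((x, y))
    return points

# Three hypothetical records with three variables each.
data = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
projected = star_coordinates(data)
```

Setting an entry of `scales` to zero removes that variable from the picture, and pointing two axes in the same direction merges them, which is one way to read the abstract's "integrate and separate dimensions".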
ISBN: (print) 0127224424
The data cube is the core operator in data warehousing and OLAP. Its efficient computation, maintenance, and utilization for query answering and advanced analysis have been the subjects of numerous studies. However, for many applications, the huge size of the data cube limits its applicability as a means for semantic exploration by the user. Recently, we have developed a systematic approach to achieve efficacious data cube construction and exploration by semantic summarization and compression. Our approach is based on the notion of a quotient cube, which groups structurally related data cube cells with common (aggregate) measure values into equivalence classes. The equivalence relation used to partition the cube lattice preserves the roll-up/drill-down semantics of the data cube, in that the same kind of explorations can be conducted in the quotient cube as in the original cube, between classes instead of between cells. We have also developed compact data structures for representing a quotient cube, and efficient algorithms for answering queries using a quotient cube and for its incremental maintenance against updates. We have implemented SOCQET, a prototype data warehousing system making use of our results on quotient cubes. In this demo, we will demonstrate (1) the critical techniques of building a quotient cube; (2) use of a quotient cube to answer various queries and to support advanced OLAP; (3) an empirical study on the effectiveness and efficiency of quotient cube-based data warehouses and OLAP; (4) a user interface for visual and interactive OLAP; and (5) SOCQET, a research prototype data warehousing system integrating all the techniques. The demo reflects our latest research results and may stimulate some interesting future studies.
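The idea of collapsing cube cells into equivalence classes can be made concrete with a small sketch. It is not SOCQET's data structure or algorithm. It assumes one simple equivalence: two cells belong to the same class when they cover exactly the same base tuples, which guarantees they share every aggregate value. The abstract does not specify the relation the authors use, so treat this choice, the function name `quotient_classes`, and the `"*"` wildcard as assumptions.

```python
from collections import defaultdict
from itertools import product

def quotient_classes(rows):
    """Group data cube cells that cover the same set of base tuples.

    rows: list of (dimension_values, measure) pairs. "*" marks a rolled-up
    dimension, so it must not occur as a real dimension value.
    Returns {covered_row_ids: (sorted_cells, sum_of_measure)}.
    """
    n_dims = len(rows[0][0])
    cover = defaultdict(set)
    for idx, (dims, _) in enumerate(rows):
        # Each base tuple contributes to every cell that generalizes it.
        for mask in product([False, True], repeat=n_dims):
            cell = tuple(d if keep else "*" for d, keep in zip(dims, mask))
            cover[cell].add(idx)
    classes = defaultdict(list)
    for cell, ids in cover.items():
        classes[frozenset(ids)].append(cell)
    return {
        ids: (sorted(cells), sum(rows[i][1] for i in ids))
        for ids, cells in classes.items()
    }

# Hypothetical base table: (product, region) with a sales measure.
base = [(("a", "x"), 1), (("a", "y"), 2), (("b", "x"), 3)]
classes = quotient_classes(base)
```

On this toy table the eight non-empty cells collapse into six classes. For example, `("b", "*")` and `("b", "x")` fall into one class because only the third row has product `b`, so drilling down from one to the other reveals nothing new.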