ISBN: 9789897583063 (print)
Visual analytics of multidimensional data suffers from the curse of dimensionality, i.e., even large numbers of data points are sparsely scattered in a high-dimensional space. The curse of dimensionality prohibits the proper use of clustering algorithms in the high-dimensional space. Projecting the space before clustering causes a loss of information and a possible mixing of separated clusters. We present an approach that overcomes the curse of dimensionality for a particular type of multidimensional data, namely attribute spaces of multivariate volume data. For multivariate volume data, it is possible to interpolate between the data points in the high-dimensional attribute space based on their spatial relationship in the volumetric domain (or physical space). We apply this idea to a histogram-based clustering algorithm. We partition the attribute space uniformly into multidimensional bins and compute a histogram that records the number of data samples belonging to each bin. Only non-empty bins are stored for efficiency. Without interpolation, the analysis is highly sensitive to the cell size and yields inaccurate clustering for improper choices: large cells result in no cluster separation, while clusters fall apart for small cells. Using tri-linear interpolation in physical space, we can refine the data by generating additional samples. The refinement scheme can adapt to the data point distribution in attribute space and to the histogram's bin size. As a consequence, we obtain a density computation in which clusters stay connected even when using very small cell sizes. We exploit this result to create a robust hierarchical cluster tree. It can be visually explored using coordinated views linked to physical-space visualizations and parallel coordinates plots. We apply our technique to several datasets and compare the results against those obtained without interpolation.
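The sparse histogram step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sparse_histogram`, the uniform `bin_size` shared by all attributes, and the sample values are assumptions made purely for illustration.

```python
from collections import Counter
from math import floor

def sparse_histogram(samples, bin_size):
    """Count samples per multidimensional bin, storing only non-empty bins.

    Assumption: every attribute axis uses the same bin size.
    """
    counts = Counter()
    for point in samples:
        # The bin key is the integer grid coordinate of the sample in attribute space.
        key = tuple(floor(value / bin_size) for value in point)
        counts[key] += 1
    return counts

# Hypothetical 3-attribute samples: two close together, one far away.
samples = [(0.10, 0.20, 0.90), (0.12, 0.22, 0.95), (0.80, 0.75, 0.10)]

for size in (1.0, 0.25, 0.01):
    hist = sparse_histogram(samples, size)
    print(f"bin size {size}: {len(hist)} non-empty bin(s)")
```

With these made-up samples, the largest bin size merges everything into one bin, while the smallest puts even the two neighboring samples into separate bins, which mirrors the sensitivity to cell size that the abstract describes.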
ISBN: 9781728128504 (print)
Today, abundant data are collected on cases of various diseases in the medical sciences. Physicians can gain new findings about diseases and about procedures for dealing with them by probing these data. Clinical data are large, complex datasets that commonly appear in multidimensional formats. Analyzing them has been recognized as a major challenge in modern data analysis. Therefore, there is an urgent need for new and effective techniques for dealing with such huge datasets. This paper presents an application of a new visual data mining platform to the visual analysis of stroke data, with the aim of predicting the level of risk for people who share characteristics with stroke patients. The visualization platform uses a hierarchical clustering algorithm to aggregate the data and maps coherent groups of data points to the same visual elements, curved 'super-polylines', which significantly reduces the visual complexity of the visualization. In addition, to enable users to interactively manipulate data items (super-polylines) in the parallel coordinates geometry through mouse rollover and clicking, we created many 'virtual nodes' along the axes of the visualization based on the hierarchical structure of the value ranges of selected data attributes. The experimental results show that, by using this visual platform, researchers can easily verify research hypotheses and reach conclusions on research questions through human-data and human-algorithm interaction, with fully transparent data processing.
ISBN: 9783319467719; 9783319467702 (print)
This paper proposes a Space-Optimized Scatter Plot Matrix for the presentation of multi-dimensional datasets. The technique optimizes display space utilization in a 2D geometrical space. Our strategy is to maximize the use of screen space by optimizing the distribution of the plots in the geometrical plane of a display screen. We also apply interaction mechanisms, namely user queries and visual cues, to support users' communication with variables and the discovery of deeper content.
ISBN: 9781665496209 (digital); 9781665496209 (print)
The industrial revolution has elevated science and engineering to foster the development of Image Processing and Artificial Intelligence (AI) and put the visualization of information on an even higher pedestal. Yet, the demands of the industrial age have contributed to an ever-growing wildfire of climate change, sparking a revolution in energy efficiency research. With the aim of advancing energy efficiency research from an AI standpoint, a novel transformation of raw-formatted data repositories, known as data lakes, into multi-dimensional visualization data, coupled with computationally lightweight, edge-based AI implementations, is proposed as a means to understand energy consumption patterns in buildings. As a novel method of understanding energy data visually, current results comprise a multi-dimensional Gramian Angular Field (GAF) representation of energy data in both 2D and 3D interactive forms. Moreover, a case study on deep learning classification deployed on an ODROID-XU4 yields approximately 90% accuracy and a classification rate of 17.5 ms/image.
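For readers unfamiliar with the Gramian Angular Field, the following is a minimal sketch of the commonly used summation variant for a single one-dimensional series. The abstract does not specify which variant or which multi-dimensional extension the authors use, so the function name, the min-max rescaling to [-1, 1], and the sample readings are assumptions.

```python
from math import acos, cos

def gramian_angular_summation_field(series):
    """Encode a 1D series as a matrix with entries cos(phi_i + phi_j)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Rescale to [-1, 1] so that arccos is defined for every value.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against tiny floating-point overshoots before arccos.
    angles = [acos(max(-1.0, min(1.0, s))) for s in scaled]
    return [[cos(a + b) for b in angles] for a in angles]

# Hypothetical hourly energy readings (arbitrary units).
readings = [1.0, 2.0, 3.0]
for row in gramian_angular_summation_field(readings):
    print([round(value, 3) for value in row])
```

The resulting square matrix can be rendered as an image, which is what makes the representation usable as input to an image classifier.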
The study of aerosol composition for air quality research involves the analysis of high-dimensional single particle mass spectrometry data. We describe, apply, and evaluate a novel interactive visual framework for dimensionality reduction of such data. Our framework is based on non-negative matrix factorization with specifically defined regularization terms that aid in resolving mass spectrum ambiguity. Visualization thereby assumes a key role in providing insight into, and allowing active control of, a heretofore elusive data processing step, thus enabling rapid analysis that is meaningful to domain scientists. In extending existing black box schemes, we explore design choices for visualizing, interacting with, and steering the factorization process to produce physically meaningful results. A domain-expert evaluation of our system, performed by the air quality research experts involved in this effort, has shown that our method and prototype enable finding unambiguous and physically correct lower-dimensional basis transformations of mass spectrometry data at significantly increased speed and with greater ease.
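As a rough illustration of regularized non-negative matrix factorization, the sketch below uses multiplicative updates with an L1 sparsity penalty on the coefficient matrix. The abstract does not state which regularization terms the framework defines, so the L1 penalty is a stand-in, and the function names, parameters, and input matrix are hypothetical. It is written without external dependencies for portability; a practical version would use an array library.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def objective(v, w, h, l1):
    """0.5 * squared reconstruction error plus the L1 penalty on h."""
    wh = matmul(w, h)
    fit = sum((a - b) ** 2 for vr, whr in zip(v, wh) for a, b in zip(vr, whr))
    return 0.5 * fit + l1 * sum(sum(row) for row in h)

def nmf_l1(v, rank, l1=0.1, iterations=200, seed=0, eps=1e-9):
    """Factor v (n x m) into non-negative w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h; the l1 term in the denominator shrinks entries toward zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + l1 + eps) for j in range(m)]
             for k in range(rank)]
        # Update w with the plain (unregularized) multiplicative rule.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical non-negative "spectra": 4 particles x 3 mass channels.
v = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 1.0], [0.0, 6.0, 2.0]]
w, h = nmf_l1(v, rank=2)
print("objective:", objective(v, w, h, 0.1))
```

Because the factors start positive and are only ever multiplied by non-negative ratios, they stay non-negative throughout, which is the property that makes the basis vectors interpretable as spectra.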
Clustering algorithms in the high-dimensional space require many data points to perform reliably and robustly. For multivariate volume data, it is possible to interpolate between the data points in the high-dimensional attribute space based on their spatial relationship in the volumetric domain (or physical space). Thus, a sufficiently high number of data points can be generated, overcoming the curse of dimensionality for this particular type of multidimensional data. We applied this idea to a histogram-based clustering algorithm. We created a uniform partition of the attribute space into multidimensional bins and computed a histogram indicating the number of data samples belonging to each bin. Without interpolation, the analysis was highly sensitive to the histogram cell sizes, yielding inaccurate clustering for improper choices: large histogram cells resulted in no cluster separation, while clusters fell apart for small cells. Using interpolation in physical space, we could refine the data by generating additional samples. The depth of the refinement scheme was chosen according to the local data point distribution in attribute space and the histogram's bin size. In the case of field discontinuities representing sharp material boundaries in the volume data, the interpolation can be adapted locally to use a nearest-neighbor interpolation scheme that avoids averaging values across the sharp boundary. Consequently, we could generate a density computation in which clusters stay connected even when using very small bin sizes. We exploited this result to create a robust hierarchical cluster tree, applied our technique to several datasets, and compared the cluster trees before and after interpolation.
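The refinement idea can be sketched for a single volume cell as follows. This is an illustrative sketch under stated assumptions, not the authors' scheme: the function names, the fixed subdivision count `n` (the abstract describes an adaptive refinement depth), and the `sharp` flag that switches to nearest-neighbor interpolation are all hypothetical.

```python
def trilinear(corners, x, y, z):
    """Interpolate attribute vectors stored at the 8 corners of a cell.

    corners[i][j][k] is the attribute tuple at corner (i, j, k) with
    i, j, k in {0, 1}; (x, y, z) are local coordinates in [0, 1]^3.
    """
    dims = len(corners[0][0][0])
    result = [0.0] * dims
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                weight = ((x if i else 1 - x)
                          * (y if j else 1 - y)
                          * (z if k else 1 - z))
                for d in range(dims):
                    result[d] += weight * corners[i][j][k][d]
    return tuple(result)

def nearest(corners, x, y, z):
    """Return the attributes of the closest corner (no averaging)."""
    return tuple(corners[int(x >= 0.5)][int(y >= 0.5)][int(z >= 0.5)])

def refine_cell(corners, n, sharp=False):
    """Generate (n + 1)^3 attribute samples on a regular sub-grid of one cell."""
    interpolate = nearest if sharp else trilinear
    steps = [s / n for s in range(n + 1)]
    return [interpolate(corners, x, y, z) for x in steps for y in steps for z in steps]

# Hypothetical cell with two attributes per corner.
corners = [[[(i + 2 * j + 4 * k, 10 * i) for k in (0, 1)] for j in (0, 1)] for i in (0, 1)]
print(trilinear(corners, 0.5, 0.5, 0.5))
print(len(refine_cell(corners, 2)))
```

The generated samples would then be added to the attribute-space histogram, filling bins that lie between the original data points so that a cluster does not fragment when bins are small.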