Researchers have derived many theoretical models for specifying users' insights as they interact with a visualization system. These representations are essential for understanding the insight discovery process, such as when inferring user interaction patterns that lead to insight or assessing the rigor of reported insights. However, theoretical models can be difficult to apply to existing tools and user studies, often due to discrepancies in how insight and its constituent parts are defined. This article calls attention to the consistent structures that recur across the visualization literature and describes how they connect multiple theoretical representations of insight. We synthesize a unified formalism for insights using these structures, enabling a wider audience of researchers and developers to adopt the corresponding models. Through a series of theoretical case studies, we use our formalism to compare and contrast existing theories, revealing interesting research challenges in reasoning about a user's domain knowledge and leveraging synergistic approaches in data mining and data management research.
Purpose: This paper aims to curate the Open Research Knowledge Graph (ORKG) with papers related to ontology learning and to define an approach using ORKG as a computer-assisted tool to organize key insights extracted from research *** Design/methodology/approach: Action research was used to explore, test and evaluate the use of the Open Research Knowledge Graph as a computer-assisted tool for knowledge acquisition from scientific *** extract, structure and describe research contributions, the granularity of information should be decided; to facilitate the comparison of scientific papers, one should design a common template that will be used to describe the state of the art of a *** Originality/value: This approach is currently used to document the "food information engineering," "tabular data to knowledge graph matching" and "question answering" research problems and the "neurosymbolic AI" domain. More than 200 papers have been ingested into ORKG. From these papers, more than 800 contributions are documented, and these contributions are used to build over 100 comparison tables. At the end of this work, we found that ORKG is a valuable tool that can reduce the working curve of state-of-the-art research.
Despite being commonly used in big-data analytics, the outcome of dimensionality reduction remains a black box to most of its users. Understanding the quality of a low-dimensional embedding is important because it not only enables trust in the transformed data, but can also help to select the most appropriate dimensionality reduction algorithm in a given scenario. As existing research primarily focuses on the visual exploration of embeddings, there is still a need for enhancing the interpretability of such algorithms. To bridge this gap, we propose two novel interactive explanation techniques for low-dimensional embeddings obtained from any dimensionality reduction algorithm. The first technique, LAPS, produces a local approximation of the neighborhood structure to generate interpretable explanations of the preserved locality for a single instance. The second method, GAPS, explains the retained global structure of a high-dimensional dataset in its embedding by combining non-redundant local approximations from a coarse discretization of the projection space. We demonstrate the applicability of the proposed techniques using 16 real-life tabular, text, image, and audio datasets. Our extensive experimental evaluation shows the utility of the proposed techniques in interpreting the quality of low-dimensional embeddings, as well as in selecting the most suitable dimensionality reduction algorithm for any given dataset.
Although popularly used in big-data analytics, dimensionality reduction is a complex, black-box technique whose outcome is difficult to interpret and evaluate. In recent years, a number of quantitative and visual methods have been proposed for analyzing low-dimensional embeddings. On the one hand, quantitative methods associate numeric identifiers with qualitative characteristics of these embeddings; on the other hand, visual techniques allow users to interactively explore these embeddings and make decisions. However, in the former case, users do not have control over the analysis, while in the latter case, assessment decisions are entirely dependent on the user's perception and expertise. To bridge the gap between the two, in this article we present VisExPreS, a visual interactive toolkit that enables a user-driven assessment of low-dimensional embeddings. VisExPreS is based on three novel techniques, namely PG-LAPS, PG-GAPS, and RepSubset, that generate interpretable explanations of the preserved local and global structures in embeddings. In the first two techniques, the VisExPreS system proactively guides users during every step of the analysis. We demonstrate the utility of VisExPreS in interpreting, analyzing, and evaluating embeddings from different dimensionality reduction algorithms using multiple case studies and an extensive user study.
Visual data mining with virtual reality spaces is used for the representation of data and symbolic knowledge. High-quality, structure-preserving, and maximally discriminative visual representations can be obtained using a combination of neural networks (SAMANN and NDA) and rough sets techniques, so that a proper subsequent analysis can be made. The approach is illustrated with two types of data: for gene expression cancer data, an improvement in classification performance with respect to the original spaces was obtained; for geophysical prospecting data for cave detection, a cavity was successfully predicted. Crown (C) 2012 and Elsevier Ltd. All rights reserved.
Time series data are usually collected through instruments equipped with sensors. For multi-sensor time series (MSTS), it is crucial to identify the factors that affect tasks such as classification and regression. To better understand the mechanisms affecting the task, we extracted the correlation patterns between sensors and represented them as symmetric positive definite matrices. The correlation patterns were then transformed into vector form to serve as input features for regression or classification models, and the models were trained for each sensor. Finally, we leveraged the interpretability of the models to analyze and visualize the correlation patterns at both micro and macro scales. By integrating the explanatory power of the models with correlation patterns, we could interpret the task in terms of time and space, providing valuable insights for exploring the underlying data rules. We evaluated our proposed method using both synthetic and real data, and the simulation results confirmed its effectiveness.
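The step of turning sensor correlation patterns into model input features can be illustrated with a minimal sketch. It assumes Pearson correlation over one window and a plain upper-triangle flattening; the abstract does not state which correlation measure or vectorization the authors use, and the function names `correlation_matrix` and `upper_triangle_vector` are hypothetical. Note that a sample correlation matrix is only guaranteed to be positive semi-definite, so a small regularization would be needed to make it strictly positive definite.

```python
import math


def correlation_matrix(window):
    """Pearson correlation between sensors over one time window.

    window: list of sensor series, each a list of floats of equal length.
    Assumes no series is constant (a constant series has zero norm).
    """
    n = len(window[0])
    means = [sum(series) / n for series in window]
    centered = [[x - m for x in series] for series, m in zip(window, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    k = len(window)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def upper_triangle_vector(matrix):
    """Flatten the strictly upper triangle of a symmetric matrix into a
    feature vector (an assumed, simple vectorization)."""
    k = len(matrix)
    return [matrix[i][j] for i in range(k) for j in range(i + 1, k)]


# Hypothetical window with three sensors and four time steps.
window = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
features = upper_triangle_vector(correlation_matrix(window))
```

Each entry of `features` corresponds to one sensor pair, so a per-pair importance score from an interpretable model can be mapped back to the sensors involved.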
The effective visual exploration of dynamic networks has been one of the toughest challenges and remains an unsolved problem; however, it is very important for understanding network evolution. Although many developments have been achieved in modeling evolutionary networks, the closely related task of visualizing them remains a major concern. Therefore, in this study, quantitative analysis is used to assign node attributes in the network topology, and then the evolutionary process of networks is analyzed. By fixing the position of nodes, the possibility of a stationary shape of the network is suggested, and a more intuitive and comprehensive explanation of the enumeration of the types of evolutionary process is provided. Further, a large amount of information is presented in an economical and accessible way by incorporating a circular layout and evolution laws, which offers a new approach for the estimation and evaluation of network evolution. Finally, this three-pronged approach (network analysis, quantitative method, and topological modeling) is expected to provide insight into the principle of network evolution.
Topological data analysis (TDA) is a powerful method for reducing data dimensionality, mining underlying data relationships, and intuitively representing the data structure. The Mapper algorithm is one such tool that projects high-dimensional data to 1-dimensional space by using a filter function that is subsequently used to reconstruct the data topology relationships. However, domain context information and prior knowledge have not been considered in current TDA modeling frameworks. Here, we report the development and evaluation of a semi-supervised topological analysis (STA) framework that incorporates discrete or continuously labeled data points and selects the most relevant filter functions accordingly. We validate the proposed STA framework with simulation data and then apply it to samples from Genotype-Tissue Expression data and ovarian cancer transcriptome datasets. The graphs generated by STA for these 2 datasets, based on gene expression profiles, are consistent with prior knowledge, thereby supporting the effectiveness of the proposed framework.
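For readers unfamiliar with Mapper, the following is a minimal sketch of the basic (unsupervised) pipeline described above: project points to one dimension with a filter function, cover the filter range with overlapping intervals, cluster the points inside each interval, and connect clusters that share points. It is not the STA framework. The function names, the parameters `n_intervals`, `overlap`, and `eps`, and the choice of single-linkage clustering at a fixed distance threshold are all assumptions made for illustration.

```python
from itertools import combinations


def _dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def _single_linkage(points, members, eps):
    """Connected components of the eps-neighborhood graph on `members`."""
    unvisited = set(members)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            cur = frontier.pop()
            near = {m for m in unvisited
                    if _dist(points[cur], points[m]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.25, eps=1.0):
    """Return (nodes, edges): nodes are sets of point indices, edges link
    nodes whose point sets intersect."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals or 1.0  # guard against a flat filter
    nodes = []
    for i in range(n_intervals):
        # Widen each interval on both sides so neighbors overlap.
        start = lo + i * length - overlap * length
        end = lo + (i + 1) * length + overlap * length
        members = [idx for idx, v in enumerate(values) if start <= v <= end]
        nodes.extend(_single_linkage(points, members, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


# Hypothetical data: five points on a line, filtered by x-coordinate.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
nodes, edges = mapper_graph(pts, lambda p: p[0], n_intervals=2)
```

The choice of filter function drives the shape of the resulting graph, which is why a framework that selects filters using labeled data points, as the abstract describes, changes the outcome.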
Data science is a field that has developed to enable efficient integration and analysis of increasingly large data sets in many domains. In particular, big data in genetics, neuroimaging, mobile health, and other subfields of biomedical science promises new insights, but also poses challenges. To address these challenges, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, including a Training Coordinating Center (TCC) tasked with developing a resource for personalized data science training for biomedical researchers. The BD2K TCC web portal is powered by ERuDIte, the Educational Resource Discovery Index, which collects training resources for data science, including online courses, videos of tutorials and research talks, textbooks, and other web-based materials. While the availability of so many potential learning resources is exciting, they are highly heterogeneous in quality, difficulty, format, and topic, making the field intimidating to enter and difficult to navigate. Moreover, data science is rapidly evolving, so there is a constant influx of new materials and concepts. We leverage data science techniques to build ERuDIte itself, using data extraction, data integration, machine learning, information retrieval, and natural language processing to automatically collect, integrate, describe, and organize existing online resources for learning data science.
Interactive data visualization tools for residential energy data are instrumental indicators for analyzing end-user behavior. These visualizations can be used as continuous home feedback systems and can be accessed from mobile devices using touch-based applications. Visualizations have to be carefully selected in order for them to contribute to the behavioral transformation that end users are encouraged to adopt. In this paper, six energy data visualizations are evaluated in a randomized controlled trial to determine the optimal data visualization tool. Conventional visualizations, namely bar, line, and stacked area, are compared against enhanced charts, namely spiral, heatmap, and stacked bar, in terms of effectiveness, aesthetic, understandability, and three analysis questions. The study is conducted through a questionnaire in a mobile application. The application, created with React Native, was circulated to participants in multiple countries, collecting 133 responses. In the received responses, conventional plots scored higher on understandability (by 22.74%), effectiveness (by 13.44%), and aesthetic (by 10.54%) than the enhanced visualizations. On the other hand, enhanced plots yielded more correct responses to the analysis questions (by 8%) than their conventional counterparts. Based on the 133 collected responses, and after applying the unpaired t-test, conventional energy data visualization plots are considered superior in terms of understandability, effectiveness, and aesthetic.
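As an illustration of the unpaired t-test mentioned above, here is a minimal sketch that computes the test statistic and degrees of freedom for two independent groups of scores. It assumes Welch's unequal-variance form, which the abstract does not specify; the function name `welch_t` and the sample ratings are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unpaired t statistic and approximate degrees of freedom
    (Welch-Satterthwaite) for two independent samples."""
    va = variance(a) / len(a)  # squared standard error of group a's mean
    vb = variance(b) / len(b)  # squared standard error of group b's mean
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical understandability ratings for two groups of charts.
conventional = [4.0, 5.0, 4.0, 5.0, 3.0, 4.0]
enhanced = [3.0, 4.0, 2.0, 3.0, 3.0, 4.0]
t_stat, dof = welch_t(conventional, enhanced)
```

Converting `t_stat` and `dof` to a p-value requires the Student t distribution, which the Python standard library does not provide; a statistics package would be used for that step.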