Genomics is at the core of precision medicine, and there are high expectations for genomics-enabled improvements in patient outcomes in the years to come. Around the world, initiatives to increase the use of DNA sequencing in clinical routine are being deployed, such as the use of broad panels in the standard care of oncology patients. This development comes at the cost of increased throughput demands on genomic data analysis. In this paper, we use the task of copy number variant (CNV) analysis as a context for exploring visualization concepts for clinical genomics. CNV calls are generated algorithmically, but time-consuming manual intervention is needed to separate relevant findings from irrelevant ones in the resulting large lists of candidate calls. We present a visualization environment, named Copycat, to support this review task in a clinical scenario. Its key components are a scatter-glyph plot that replaces the traditional list visualization, and a glyph representation designed for at-a-glance relevance assessment. Moreover, we present results from a formative evaluation of the prototype by domain specialists, from which we elicit insights to guide both prototype improvements and visualization for clinical genomics in general.
With the advancement of satellite communication technology, the maritime Internet of Things (IoT) has made significant progress. As a result, vast amounts of Automatic Identification System (AIS) data from vessels worldwide are transmitted to various maritime stakeholders through maritime IoT systems. AIS data contain a large amount of dynamic and static information that requires effective and intuitive visualization for comprehensive analysis. However, current visualization models suffer from two major deficiencies: they do not account for interactions between distant pixels, and they are inefficient. To address these issues, we developed a large-scale vessel trajectory visualization algorithm, called the Non-local Kernel Density Estimation (NLKDE) algorithm, which incorporates a non-local convolution process. It accurately calculates the density distribution of vessel trajectories by considering correlations between distant pixels. Additionally, we implemented the NLKDE algorithm in a Graphics Processing Unit (GPU) framework to enable parallel computing and improve efficiency. Comprehensive experiments on multiple vessel trajectory datasets show that the NLKDE algorithm excels at vessel trajectory density visualization, and that the GPU-accelerated framework significantly shortens execution time, achieving real-time results. From both theoretical and practical perspectives, GPU-accelerated NLKDE provides technical support for real-time monitoring of vessel dynamics in complex waters and contributes to building intelligent maritime transportation systems. The code for this paper can be accessed at: https://***/maohliang/GPU-NLKDE.
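To make the difference between local and non-local density estimation concrete, here is a minimal Python sketch. It is not the authors' NLKDE: the Gaussian kernel, the pixel grid, and the functions `kde_grid` and `non_local_smooth` are illustrative assumptions. The first function computes a classic kernel density estimate of AIS positions on a pixel grid; the second replaces each pixel by a weighted mean over all pixels, with weights driven by the similarity of density values rather than by spatial proximity, which is one simple way to let distant pixels interact.

```python
import math


def kde_grid(points, width, height, bandwidth):
    """Classic Gaussian KDE of (x, y) points evaluated at integer pixel centres.

    x maps to the column index and y to the row index of the returned grid.
    """
    grid = [[0.0] * width for _ in range(height)]
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * max(len(points), 1))
    for row in range(height):
        for col in range(width):
            total = 0.0
            for x, y in points:
                d2 = (col - x) ** 2 + (row - y) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            grid[row][col] = total * norm
    return grid


def non_local_smooth(grid, h):
    """Replace each pixel by a weighted mean over ALL pixels (hypothetical step).

    Weights depend only on how similar two density values are, so pixels far
    apart on the map can still reinforce each other. The self-weight is 1,
    so the denominator is never zero.
    """
    width = len(grid[0])
    flat = [v for row in grid for v in row]
    out = []
    for v in flat:
        weights = [math.exp(-((v - u) ** 2) / (h ** 2)) for u in flat]
        out.append(sum(w * u for w, u in zip(weights, flat)) / sum(weights))
    # Reshape the flat list back into rows of the original width.
    return [out[i:i + width] for i in range(0, len(out), width)]
```

Because every output pixel in both functions is computed independently, the work maps naturally onto one GPU thread per pixel, which is the kind of parallelism the abstract attributes to its GPU framework; this pure-Python version is quadratic in the number of pixels and only suitable for toy grids.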
Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However, the lack of a comprehensive and reliable benchmark hinders our understanding of LLMs' capabilities in visualization generation. In this paper, we address this gap by proposing a new NL2VIS benchmark called VisEval. First, we introduce a high-quality, large-scale dataset that includes 2,524 representative queries covering 146 databases, paired with accurately labeled ground truths. Second, we advocate a comprehensive automated evaluation methodology covering multiple dimensions, including validity, legality, and readability. By systematically scanning for potential issues with a number of heterogeneous checkers, VisEval provides reliable and trustworthy evaluation outcomes. We run VisEval on a series of state-of-the-art LLMs. Our evaluation reveals prevalent challenges and delivers essential insights for future advancements.
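Checker-based evaluation along the three named dimensions can be pictured with the following minimal Python sketch. The chart-spec dictionary, the three checker functions, and the rules inside them (a set of supported marks, column existence, a 20-bar readability limit) are hypothetical stand-ins chosen for illustration; they are not VisEval's actual checkers or API.

```python
from dataclasses import dataclass, field

# Hypothetical set of chart marks the toy renderer understands.
SUPPORTED_MARKS = {"bar", "line", "point", "arc"}


@dataclass
class Report:
    issues: list = field(default_factory=list)  # (dimension, message) pairs

    @property
    def passed(self):
        return not self.issues


def check_validity(spec, rows, query, report):
    """Can the spec be rendered at all?"""
    if spec.get("mark") not in SUPPORTED_MARKS:
        report.issues.append(("validity", f"unsupported mark {spec.get('mark')!r}"))
    for channel in ("x", "y"):
        if channel not in spec:
            report.issues.append(("validity", f"missing channel {channel}"))


def check_legality(spec, rows, query, report):
    """Does the chart use real columns and honour what the query asked for?"""
    columns = set(rows[0]) if rows else set()
    for channel in ("x", "y"):
        name = spec.get(channel)
        if name is not None and name not in columns:
            report.issues.append(("legality", f"{channel} uses unknown column {name!r}"))
    wanted = query.get("chart_type")
    if wanted and spec.get("mark") != wanted:
        report.issues.append(("legality", f"query asked for {wanted}, got {spec.get('mark')}"))


def check_readability(spec, rows, query, report, max_categories=20):
    """Is the chart likely to be legible? (toy heuristic: too many bars)"""
    x = spec.get("x")
    if spec.get("mark") == "bar" and x is not None:
        categories = {row.get(x) for row in rows}
        if len(categories) > max_categories:
            report.issues.append(("readability", f"{len(categories)} bars is too many"))


CHECKERS = [check_validity, check_legality, check_readability]


def evaluate(spec, rows, query):
    report = Report()
    for checker in CHECKERS:
        checker(spec, rows, query, report)
    return report
```

Keeping each dimension in its own function mirrors the idea of heterogeneous checkers: a new check can be appended to `CHECKERS` without touching the others.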
The use of natural language interfaces (NLIs) to create charts is becoming increasingly popular due to the intuitiveness of natural language interaction. One key challenge in this approach is to accurately capture user intents and transform them into proper chart specifications. This obstructs the wide use of NLIs in chart generation, as users' natural language inputs are generally abstract (i.e., ambiguous or under-specified) and lack a clear specification of visual encodings. Recently, pre-trained large language models (LLMs) have exhibited superior performance in understanding and generating natural language, demonstrating great potential for downstream tasks. Inspired by this trend, we propose ChartGPT, which generates charts from abstract natural language inputs. However, LLMs struggle with complex logical problems. To enable the model to accurately specify the complex parameters and perform the operations involved in chart generation, we decompose the generation process into a step-by-step reasoning pipeline, so that the model only needs to reason about a single, specific sub-task in each run. Moreover, LLMs are pre-trained on general datasets, which might be biased for the task of chart generation. To provide adequate visualization knowledge, we create a dataset of abstract utterances paired with charts and improve model performance through fine-tuning. We further design an interactive interface for ChartGPT that allows users to check and modify the intermediate outputs of each step. The effectiveness of the proposed system is evaluated through quantitative evaluations and a user study.
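The idea of decomposing chart generation into single-purpose steps, with every intermediate output kept for inspection, can be sketched in Python as below. In ChartGPT each step is handled by a fine-tuned LLM; here, purely as an assumption to obtain a runnable example, each step is a keyword rule, and the step names (`select_columns`, `choose_aggregation`, `choose_chart_type`, `map_encodings`) are hypothetical.

```python
# Each step takes the current state dict and returns a new, extended state.
# In the real system an LLM would perform each step; these keyword rules are
# hypothetical stand-ins so the pipeline structure can run on its own.

def select_columns(state):
    text = state["utterance"].lower()
    cols = [c for c in state["columns"] if c.lower() in text]
    return {**state, "selected": cols}


def choose_aggregation(state):
    text = state["utterance"].lower()
    agg = "mean" if "average" in text else "sum" if "total" in text else None
    return {**state, "aggregate": agg}


def choose_chart_type(state):
    text = state["utterance"].lower()
    mark = "line" if ("trend" in text or "over time" in text) else "bar"
    return {**state, "mark": mark}


def map_encodings(state):
    sel = state["selected"]
    enc = {"x": sel[0] if sel else None, "y": sel[1] if len(sel) > 1 else None}
    return {**state, "encoding": enc}


PIPELINE = [
    ("select columns", select_columns),
    ("choose aggregation", choose_aggregation),
    ("choose chart type", choose_chart_type),
    ("map encodings", map_encodings),
]


def run(utterance, columns, pipeline=PIPELINE):
    """Run every step in order and record a snapshot of the state after each."""
    state = {"utterance": utterance, "columns": columns}
    trace = []
    for name, step in pipeline:
        state = step(state)
        trace.append((name, dict(state)))
    return state, trace
```

Because `run` returns a trace of states, an interface could display each intermediate result and let the user edit it before re-running the remaining steps.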
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increasingly being applied to chart understanding. This survey provides a comprehensive overview of recent developments, challenges, and future directions in chart understanding within the context of these foundation models. We review the fundamental building blocks for studying chart understanding tasks. Additionally, we explore various tasks, their evaluation metrics, and the sources of both charts and textual inputs. We then examine modeling strategies, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. Furthermore, we discuss the state-of-the-art performance on each task and how it can be improved. Finally, we address challenges and future directions, highlighting the importance of topics such as domain-specific charts, the lack of effort in developing evaluation metrics, and agent-oriented settings. This survey aims to provide valuable insights and directions for future research on chart understanding with large foundation models.
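Evaluation metrics are one of the building blocks the survey reviews. As an illustration only, the Python sketch below implements a relaxed-accuracy score of the kind often reported for chart question answering: numeric answers count as correct within a relative tolerance (5% here, an assumed default), and all other answers require a case-insensitive exact match. The function names are hypothetical.

```python
def _to_float(text):
    """Parse answers like '1,200' or '35%' as numbers; return None otherwise."""
    try:
        return float(text.strip().rstrip("%").replace(",", ""))
    except ValueError:
        return None


def relaxed_match(prediction, target, tolerance=0.05):
    p, t = _to_float(prediction), _to_float(target)
    if p is not None and t is not None:
        if t == 0:
            return p == 0  # relative error is undefined for a zero target
        return abs(p - t) / abs(t) <= tolerance
    # Non-numeric answers: case-insensitive exact match.
    return prediction.strip().lower() == target.strip().lower()


def relaxed_accuracy(predictions, targets, tolerance=0.05):
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(relaxed_match(p, t, tolerance) for p, t in zip(predictions, targets))
    return hits / len(targets)
```

The tolerance absorbs small reading errors when a model estimates values from rendered bars or lines, while categorical answers such as series names still have to match exactly.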
Scalar field comparison is a fundamental task in scientific visualization. In topological data analysis, we compare topological descriptors of scalar fields, such as persistence diagrams and merge trees, because they provide succinct and robust abstract representations. Several similarity measures for topological descriptors appear to be both asymptotically and practically efficient, with polynomial-time algorithms, but they do not scale well when handling large-scale, time-varying scientific data and ensembles. In this paper, we propose a new framework, inspired by tools from locality-sensitive hashing (LSH), to facilitate the comparative analysis of merge trees. LSH hashes similar objects into the same hash buckets with high probability. We propose two new similarity measures for merge trees that can be computed via LSH, using new extensions to Recursive MinHash and subpath signature, respectively. Our similarity measures are extremely efficient to compute and closely resemble the results of existing measures such as merge tree edit distance or geometric interleaving distance. Our experiments demonstrate the utility of our LSH framework in applications such as shape matching, clustering, key event detection, and ensemble summarization.
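To show how LSH turns similarity estimation into cheap signature comparison, here is a minimal Python sketch of classic MinHash applied to a tree represented as the set of its root-to-leaf label paths. This is plain MinHash with a linear hash family over a Mersenne prime, not the paper's extensions to Recursive MinHash or subpath signatures; the node labels, the dictionary tree encoding, and the function names are assumptions.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus


def make_hash_family(k, seed=0):
    """k random linear hashes h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash(items, family):
    """Signature: for each hash function, the minimum hash over a non-empty set."""
    # crc32 gives a deterministic base integer for each item across runs.
    base = [zlib.crc32(repr(item).encode("utf-8")) for item in items]
    return [min((a * x + b) % PRIME for x in base) for a, b in family]


def estimated_jaccard(sig1, sig2):
    """Fraction of positions where the signatures agree."""
    if len(sig1) != len(sig2):
        raise ValueError("signatures must have the same length")
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)


def root_to_leaf_paths(tree, root):
    """tree: dict mapping a node label to its list of child labels."""
    children = tree.get(root, [])
    if not children:
        return {(root,)}
    return {(root,) + path for c in children for path in root_to_leaf_paths(tree, c)}
```

The fraction of agreeing signature entries estimates the Jaccard similarity of the two path sets; in an LSH index, signatures would additionally be split into bands so that similar trees collide in the same bucket with high probability.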
The semantic similarity between documents of a text corpus can be visualized using map-like metaphors based on two-dimensional scatterplot layouts. These layouts result from a dimensionality reduction on the document-term matrix or on a representation within a latent embedding, including topic models. The resulting layout therefore depends on the input data and on the hyperparameters of the dimensionality reduction, and it changes when either of them changes. However, such changes to the layout require additional cognitive effort from the user. In this work, we present a sensitivity study that analyzes the stability of these layouts with respect to (1) changes in the text corpora, (2) changes in the hyperparameters, and (3) randomness in the initialization. Our approach has two stages: data measurement and data analysis. First, we derived layouts for combinations of three text corpora, six text embeddings, and a grid-search-inspired selection of dimensionality-reduction hyperparameters. Afterward, we quantified the similarity of the layouts through ten metrics concerning local and global structure and class separation. Second, we analyzed the resulting 42,817 tabular data points in a descriptive statistical analysis. From this, we derived guidelines for making informed decisions about the layout algorithm and highlight specific hyperparameter settings. We provide our implementation as a Git repository at (sic) hpicgs/Topic-Models-and-Dimensionality-Reduction-Sensitivity-Study and the results as a Zenodo archive at DOI:10.5281/zenodo.12772898.
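The abstract does not name its ten layout-similarity metrics. As one plausible example of a local-structure metric, the Python sketch below scores two layouts of the same documents by the mean Jaccard overlap of each point's k nearest neighbors; the function names and the default k are illustrative assumptions.

```python
import math


def knn_sets(points, k):
    """For each 2-D point, the index set of its k nearest other points."""
    result = []
    for i, p in enumerate(points):
        # Ties in distance are broken by index so the result is deterministic.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in dists[:k]})
    return result


def knn_overlap(layout_a, layout_b, k=5):
    """Mean Jaccard overlap of k-NN sets; 1.0 means identical neighborhoods."""
    if len(layout_a) != len(layout_b):
        raise ValueError("layouts must contain the same documents")
    na, nb = knn_sets(layout_a, k), knn_sets(layout_b, k)
    scores = [len(a & b) / len(a | b) for a, b in zip(na, nb)]
    return sum(scores) / len(scores)
```

A score of 1.0 means every document keeps exactly the same neighbors, so the metric is unaffected by rotations, reflections, translations, and uniform scaling (up to floating-point ties), none of which change what a user perceives as clusters.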
Differential privacy protects individual privacy but poses challenges to data exploration, because the limited privacy budget restricts the flexibility of exploration and the noisy responses to data requests introduce confusing uncertainty. In this study, we take the lead in characterizing the corresponding exploration scenarios, including the underlying requirements and available exploration strategies. To facilitate practical applications, we propose a visual analysis approach to formulating exploration strategies. Our approach applies a reinforcement learning model to provide diverse suggestions for exploration strategies according to users' exploration intent. A novel visual design for representing uncertainty in correlation patterns is integrated into our prototype system to support the proposed approach. Finally, we conducted a user study and two case studies. The results verified that our approach can help develop strategies that satisfy users' exploration intent.
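The budget and noise constraints described above can be made concrete with the standard Laplace mechanism. The Python sketch below is an illustration under stated assumptions, not the authors' system: it answers counting queries with Laplace noise of scale 1/epsilon (a count has sensitivity 1) and deducts each query's epsilon from a fixed total under sequential composition, refusing queries once the budget is spent. The class and method names are hypothetical.

```python
import random


class PrivacyBudgetExceeded(Exception):
    pass


class PrivateQueryEngine:
    """Hypothetical engine answering noisy counts under a total epsilon budget."""

    def __init__(self, total_epsilon, seed=None):
        self.remaining = total_epsilon
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1).
        return scale * (self._rng.expovariate(1.0) - self._rng.expovariate(1.0))

    def count(self, rows, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining + 1e-12:
            raise PrivacyBudgetExceeded(
                f"requested {epsilon}, only {self.remaining} left")
        # Sequential composition: spent epsilons add up.
        self.remaining -= epsilon
        true_count = sum(1 for r in rows if predicate(r))
        # A counting query has L1 sensitivity 1, so the noise scale is 1/epsilon.
        return true_count + self._laplace(1.0 / epsilon)
```

A smaller per-query epsilon preserves budget for more queries but adds more noise, which is the trade-off an exploration strategy has to plan around.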
In this study, we address the growing issue of misleading charts, a prevalent problem that undermines the integrity of information dissemination. Misleading charts can distort the viewer's perception of data, leading to misinterpretations and decisions based on false information. Developing effective methods for automatically detecting misleading charts is an urgent research problem. Recent advances in multimodal large language models (LLMs) have opened a promising direction for addressing this challenge. We explored the capabilities of these models in analyzing complex charts and assessed the impact of different prompting strategies on the models' analyses. Using a dataset of misleading charts collected from the internet by prior research, we crafted nine distinct prompts, ranging from simple to complex, to test the ability of four different multimodal LLMs to detect over 21 different chart issues. Through three experiments, from initial exploration to detailed analysis, we progressively gained insights into how to effectively prompt LLMs to identify misleading charts, and we developed strategies to address the scalability challenges encountered as we expanded our detection range from five issues initially to 21 issues in the final experiment. Our findings reveal that multimodal LLMs possess a strong capability for chart comprehension and critical thinking in data interpretation. There is significant potential in employing multimodal LLMs to counter misleading information by supporting critical thinking and enhancing visualization literacy. This study demonstrates the applicability of LLMs to the pressing concern of misleading charts.
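The abstract does not describe its prompts in detail, so the following Python sketch is purely hypothetical: it shows one way to keep prompts manageable as the list of issues grows from 5 to 21, by splitting issue definitions into fixed-size batches (one prompt per batch) and parsing numbered yes/no verdicts from a model's reply. No model is called; the prompt wording, batch size, and reply format are all assumptions.

```python
import re

# Matches reply lines such as "1. Yes - the y-axis starts at 40".
ANSWER = re.compile(r"^\s*(\d+)\.\s*(yes|no)\b", re.IGNORECASE | re.MULTILINE)


def build_prompts(issues, batch_size=5):
    """issues: list of (name, definition) pairs; returns one prompt per batch."""
    prompts = []
    for start in range(0, len(issues), batch_size):
        batch = issues[start:start + batch_size]
        lines = [f"{i}. {name}: {definition}"
                 for i, (name, definition) in enumerate(batch, 1)]
        prompts.append(
            "You are reviewing the attached chart for misleading design.\n"
            "For each issue below, answer 'yes' or 'no' and give a one-sentence reason.\n"
            + "\n".join(lines)
        )
    return prompts


def parse_answers(reply, batch):
    """Map each issue name in the batch to True (present) or False (absent)."""
    found = {}
    for num, verdict in ANSWER.findall(reply):
        idx = int(num) - 1
        if 0 <= idx < len(batch):
            found[batch[idx][0]] = verdict.lower() == "yes"
    return found
```

Batching keeps each prompt short enough for the model to address every listed issue, at the cost of one model call per batch for each chart.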
ISBN: (print) 9783031785375; 9783031785382
Concerns about the veracity and originality of content on social networks continue to rise. Considerable work has been done on information spreading, and tools have been built, but approaches based on provenance analysis are rare. We believe that provenance-based analysis and visualization tools can make the analysis of (mis)information spreading more efficient. We therefore study provenance and present a provenance pipeline for data analytics, in which users interact with multiple network analysis modules through a graphical user interface, and we describe a proof-of-concept system. Although provenance visualization alone can capture all the necessary metadata, integrating it with other network visualization modules suited to the same data enhanced our analysis of results and our conclusions. Having designed distinct provenance models, we captured and analysed the lineage of information on community dynamics. We tested our prototype on a real-world dataset of more than 10 million filtered tweets focused on COVID-19 vaccinations, and analysed community dynamics with network science metrics and NLP.
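Information lineage of the kind described above can be modeled as a derivation graph in which each retweet or quote points to the post it was derived from, loosely following the `wasDerivedFrom` relation of the W3C PROV model. The Python sketch below is a hypothetical illustration, not the authors' provenance models: it traces a post back to its origin, lists its downstream cascade breadth-first, and reports which communities the cascade reached, given an assumed author-to-community mapping.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Hypothetical derivation graph over post ids."""

    def __init__(self):
        self.derived_from = {}                # post -> the post it was derived from
        self.derivations = defaultdict(list)  # post -> posts derived from it
        self.author = {}                      # post -> author id

    def add(self, post_id, author, source=None):
        self.author[post_id] = author
        if source is not None:
            self.derived_from[post_id] = source
            self.derivations[source].append(post_id)

    def lineage(self, post_id):
        """Chain from a post back to its origin (post first, origin last)."""
        chain, seen = [post_id], {post_id}
        while chain[-1] in self.derived_from:
            parent = self.derived_from[chain[-1]]
            if parent in seen:  # guard against malformed cyclic records
                break
            chain.append(parent)
            seen.add(parent)
        return chain

    def cascade(self, post_id):
        """All posts derived (directly or indirectly) from post_id, breadth-first."""
        order, queue, seen = [], deque([post_id]), {post_id}
        while queue:
            node = queue.popleft()
            order.append(node)
            for child in self.derivations[node]:
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return order

    def communities_reached(self, post_id, community_of):
        """Set of communities whose members appear in the cascade."""
        return {community_of[self.author[p]]
                for p in self.cascade(post_id)
                if self.author.get(p) in community_of}
```

Separating upstream lineage from downstream cascade keeps both questions, "where did this come from?" and "where did it go?", answerable from the same records.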