Optimizing the performance of large-scale parallel codes is critical for efficient utilization of computing resources. Code developers often explore various execution parameters, such as hardware configurations, system software choices, and application parameters, and are interested in detecting and understanding bottlenecks in different executions. They often collect hierarchical performance profiles represented as call graphs, which combine performance metrics with their execution contexts. The crucial task of exploring multiple call graphs together is tedious and challenging because of the many structural differences in the execution contexts and significant variability in the collected performance metrics (e.g., execution runtime). In this paper, we present Ensemble CallFlow to support the exploration of ensembles of call graphs using new types of visualizations, analysis, graph operations, and features. We introduce ensemble-Sankey, a new visual design that combines the strengths of resource-flow (Sankey) and box-plot visualization techniques. Whereas the resource-flow visualization can easily and intuitively describe the graphical nature of the call graph, the box plots overlaid on the nodes of Sankey convey the performance variability within the ensemble. Our interactive visual interface provides linked views to help explore ensembles of call graphs, e.g., by facilitating the analysis of structural differences, and identifying similar or distinct call graphs. We demonstrate the effectiveness and usefulness of our design through case studies on large-scale parallel codes.
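The ensemble-Sankey design overlays box plots on Sankey nodes to convey per-callsite runtime variability across the ensemble. A minimal sketch of the underlying statistics, assuming each ensemble member's profile is a flat callsite-to-runtime mapping (function names and data layout are hypothetical, not from the paper):

```python
from statistics import median

def box_stats(values):
    """Five-number summary (min, Q1, median, Q3, max) for one callsite's
    runtimes across the ensemble -- the data behind one box plot."""
    v = sorted(values)
    n = len(v)
    lower, upper = v[:n // 2], v[(n + 1) // 2:]
    return {
        "min": v[0],
        "q1": median(lower),
        "median": median(v),
        "q3": median(upper),
        "max": v[-1],
    }

def ensemble_node_stats(profiles):
    """profiles: list of {callsite: runtime} dicts, one per ensemble member.
    Returns a five-number summary for every callsite present in all runs,
    i.e., the statistics overlaid on each Sankey node."""
    common = set.intersection(*(set(p) for p in profiles))
    return {c: box_stats([p[c] for p in profiles]) for c in common}
```

A real call graph is hierarchical, so these summaries would be computed per node only after the graph alignment and ensemble operations the paper describes.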
Room-scale immersive data visualisations provide viewers a wide-scale overview of a large dataset, but to interact precisely with individual data points they typically have to navigate to change their point of view. In traditional screen-based visualisations, focus-and-context techniques allow visualisation users to keep a full dataset in view while making detailed selections. Such techniques have been studied extensively on desktop to allow precise selection within large data sets, but they have not been explored in immersive 3D modalities. In this paper we develop a novel immersive focus-and-context technique based on a "magic portal" metaphor adapted specifically for data visualisation scenarios. An extendable-hand interaction technique is used to place a portal close to the region of interest. The other end of the portal then opens comfortably within the user's physical reach such that they can reach through to precisely select individual data points. Through a controlled study with 12 participants, we find strong evidence that portals reduce overshoots in selection and overall hand trajectory length, reducing arm and shoulder fatigue compared to ranged interaction without the portal. The portals also enable us to use a robot arm to provide haptic feedback for data within the limited volume of the portal region. In a second study with another 12 participants, we found that haptics provided a positive experience (qualitative feedback) but did not significantly reduce fatigue. We demonstrate applications for portal-based selection through two use-case scenarios.
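The abstract does not spell out the portal mapping; as a hedged geometric sketch, a portal can be modeled as a rigid translation (plus optional uniform scale) that brings points near the far opening into the user's reach. All names here are illustrative, not from the paper:

```python
def through_portal(point, far_center, near_center, scale=1.0):
    """Map a 3-D data point near the far portal opening into the user's
    reachable volume at the near opening. A rigid translation plus an
    optional uniform scale; rotations are omitted for brevity."""
    return tuple(n + scale * (p - f)
                 for p, f, n in zip(point, far_center, near_center))
```

Selection then operates on the transformed points within arm's reach, which is why overshoots and hand-trajectory length shrink: the motor task happens in a small, comfortable volume.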
The semantic similarity between documents of a text corpus can be visualized using map-like metaphors based on two-dimensional scatterplot layouts. These layouts result from a dimensionality reduction on the document-term matrix or a representation within a latent embedding, including topic models. The resulting layout therefore depends on the input data and the hyperparameters of the dimensionality reduction and shifts whenever they change; such layout changes demand additional cognitive effort from the user. In this work, we present a sensitivity study that analyzes the stability of these layouts concerning (1) changes in the text corpora, (2) changes in the hyperparameters, and (3) randomness in the initialization. Our approach has two stages: data measurement and data analysis. First, we derived layouts for the combination of three text corpora, six text embeddings, and a grid-search-inspired hyperparameter selection of the dimensionality reductions. Afterward, we quantified the similarity of the layouts through ten metrics concerning local and global structures and class separation. Second, we analyzed the resulting 42 817 tabular data points in a descriptive statistical analysis. From this, we derived guidelines for informed decisions on the layout algorithm and highlight specific hyperparameter settings. We provide our implementation as a Git repository at hpicgs/Topic-Models-and-Dimensionality-Reduction-Sensitivity-Study and our results as a Zenodo archive at DOI:10.5281/zenodo.12772898.
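The ten similarity metrics are not enumerated in the abstract; as one illustrative, simplified example of a local-structure metric, the sketch below scores how well two 2-D layouts of the same documents preserve k-nearest-neighbor sets (a stand-in for this class of metrics, not the paper's exact formulation):

```python
from math import dist

def knn(points, i, k):
    """Indices of the k nearest neighbors of point i (Euclidean),
    excluding the point itself."""
    order = sorted(range(len(points)),
                   key=lambda j: dist(points[i], points[j]))
    return set(order[1:k + 1])

def neighborhood_preservation(layout_a, layout_b, k=2):
    """Mean Jaccard overlap of k-nearest-neighbor sets between two 2-D
    layouts of the same documents; 1.0 means locally identical layouts,
    values near 0 mean local structure was not preserved."""
    n = len(layout_a)
    scores = []
    for i in range(n):
        a, b = knn(layout_a, i, k), knn(layout_b, i, k)
        scores.append(len(a & b) / len(a | b))
    return sum(scores) / n
```

Metrics of this kind are invariant to translation and uniform scaling of a layout, which matters because dimensionality reductions only preserve relative, not absolute, positions.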
ISBN (print): 9798350393811; 9798350393804
Generative models have received a lot of attention in many areas of academia and industry. Their capabilities span many areas, from the invention of images given a prompt to the generation of concrete code to solve a certain programming issue. These two paradigmatic cases fall within two distinct categories of requirements, ranging from "creativity" to "precision", as characterized by Bing Chat, which employs ChatGPT-4 as its backbone. Visualization practitioners and researchers have wondered to what extent such systems could accomplish our work in a more efficient way. Several works in the literature have utilized them for the creation of visualizations, and some tools, such as Lida, incorporate them as part of their pipeline. Nevertheless, to the authors' knowledge, no systematic approach for testing their capabilities, including both extensive and in-depth evaluation, has been published. Our goal is to fill that gap with a systematic approach that analyzes three elements: whether Large Language Models (LLMs) are capable of correctly generating a large variety of charts, which libraries they can deal with effectively, and how far we can go in configuring individual charts. To achieve this objective, we initially selected a diverse set of charts that are commonly utilized in data visualization. We then developed a set of generic prompts that could be used to generate them, and analyzed the performance of different LLMs and libraries. The results include both the set of prompts and the data sources, as well as an analysis of the performance with different configurations.
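A grid of generic prompts over chart types and libraries, as the evaluation describes, might be generated along these lines. The chart list, library list, and template wording are illustrative assumptions, not the paper's actual prompts:

```python
# Hypothetical sketch of a "generic prompt" grid: one prompt per
# (chart type, library) pair, to be sent to each LLM under test.
CHART_TYPES = ["bar chart", "scatter plot", "choropleth map"]
LIBRARIES = ["matplotlib", "d3.js", "vega-lite"]
TEMPLATE = ("Write complete {lib} code that renders a {chart} "
            "from the CSV file 'data.csv'. Return only code.")

def build_prompts():
    """Cross product of chart types and libraries, each with a filled-in
    prompt ready to submit to a model."""
    return [{"chart": c, "library": l,
             "prompt": TEMPLATE.format(lib=l, chart=c)}
            for c in CHART_TYPES for l in LIBRARIES]
```

Keeping the template fixed across the grid is what makes the comparison systematic: any difference in output quality can be attributed to the model, the library, or the chart type rather than to prompt phrasing.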
Choropleth maps are widely used geovisualizations due to their simplicity, especially for applications involving political, climate, and other geospatial data for contiguous regions. There is a need for automated data extraction from such maps to aid the human-in-the-loop in handling cognitive overload from large-scale visualization generation and visual impairments. There are gaps in generalizing such a system for choropleth maps with different types of color legends. We propose the choropleth map analytics (CMA) system to address these gaps using a six-step workflow involving deep learning (DL) architectures and tools. We propose a novel method for color-to-data mapping for different color legend types. We finally demonstrate the usability of CMA for a set of choropleth images in climate research for a text summarization application. Our work is a step toward reverse engineering choropleth visualizations. Our code and curated datasets are at: https://***/GVCL/Choropleth-CMA
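The core of color-to-data mapping for a discrete color legend can be sketched as a nearest-swatch lookup. This is a simplification of the paper's method; the function names and the plain RGB distance are assumptions (a production system would likely match in a perceptual color space):

```python
def nearest_legend_value(pixel, legend):
    """Map a map pixel's RGB color to the data value of the closest legend
    swatch. legend: list of ((r, g, b), value) pairs; distance is squared
    Euclidean in RGB space."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(legend, key=lambda entry: d2(pixel, entry[0]))[1]
```

Continuous (gradient) legends need interpolation between sampled swatch colors instead of a discrete lookup, which is one reason generalizing across legend types is the hard part.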
ISBN (digital): 9781665491563
ISBN (print): 9781665491563
With the continuous increase in the computational power and resources of modern high-performance computing (HPC) systems, large-scale ensemble simulations have become widely used in various fields of science and engineering, especially in meteorological and climate science. The simulation outputs are large, time-varying, multivariate, and multivalued datasets, which pose a particular challenge to visualization and analysis tasks. In this work, we focused on the widely used Parallel Coordinates Plot (PCP) to analyze the interrelations between different parameters, such as variables, among the ensemble members. However, PCP suffers from visual clutter and degraded drawing performance as the size of the data to be analyzed, that is, the number of polylines, increases. To overcome this problem, we present an extension to the PCP that adds Bézier curves connecting angular distribution plots, which represent the mean and variance of the inclination of the line segments between parallel axes. The proposed Angular-based Parallel Coordinates Plot (APCP) is capable of presenting a simplified overview of the entire ensemble dataset while maintaining the correlation information between adjacent variables. To verify its effectiveness, we developed a visual analytics prototype system and evaluated it using a meteorological ensemble simulation output from the supercomputer Fugaku.
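The angular statistics behind APCP can be sketched as follows: each polyline segment between two adjacent axes has an inclination angle, and the angular distribution plot summarizes their mean and variance. A minimal illustration, not the paper's implementation (axis spacing and normalization are assumed):

```python
from math import atan2, degrees

def segment_angles(values_left, values_right, axis_gap=1.0):
    """Inclination in degrees of each polyline segment between two adjacent
    parallel axes, given the members' normalized values on each axis."""
    return [degrees(atan2(r - l, axis_gap))
            for l, r in zip(values_left, values_right)]

def angle_stats(angles):
    """Mean and (population) variance of the segment inclinations --
    the summary that replaces the individual polylines in APCP."""
    m = sum(angles) / len(angles)
    var = sum((a - m) ** 2 for a in angles) / len(angles)
    return m, var
```

Replacing thousands of polylines with one mean/variance pair per axis gap is what removes the clutter while keeping the sign and strength of adjacent-variable correlation visible.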
ISBN (digital): 9781665491563
ISBN (print): 9781665491563
In parallel ray tracing, techniques fall into one of two camps: image-parallel techniques aim at increasing frame rate by replicating scene data across nodes and splitting the rendering work across different ranks, and data-parallel techniques aim at increasing the size of the model that can be rendered by splitting the model across multiple ranks, but typically cannot scale much in frame rate. We propose and evaluate a hybrid approach that combines the advantages of both by splitting a set of N x M ranks into M islands of N ranks each and using data-parallel rendering within each island and image parallelism across islands. We discuss the integration of this concept into four wildly different parallel renderers and evaluate the efficacy of this approach based on multiple different data sets.
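The island decomposition of ranks can be sketched with simple integer arithmetic, assuming a flat rank numbering; this is a hypothetical layout, not necessarily the renderers' actual mapping:

```python
def island_layout(rank, n_per_island):
    """Split a flat MPI-style rank into (island, rank-within-island).
    Ranks in the same island cooperate data-parallel on one copy of the
    scene partition; corresponding ranks across islands split the image."""
    return rank // n_per_island, rank % n_per_island
```

With N x M ranks total, each of the M islands holds one full data-parallel decomposition of the model, so memory capacity scales with N while frame rate scales with M.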
ISBN (digital): 9781665491563
ISBN (print): 9781665491563
Scientific simulations executed on supercomputers produce massive amounts of data. Visualizing this data is essential to discovery and dissemination, but methods for transforming and displaying such large data visualizations for use in Extended Reality (XR) devices are not commonly supported. We investigated the viability of existing XR applications (i.e., ParaView VR, SummitVR, and Omniverse XR) to display large data visualizations. Our investigations led us to create a proof-of-concept Virtual Reality (VR) application with Unity using Universal Scene Description (USD) files exported from Houdini to display and interact with large time-varying scientific data visualizations. We present our investigations as a basis for future work to display and interact with scientific data visualizations in XR.
During the current data era, data analysis across multiple disciplines has become a critical task for researchers to obtain meaningful insights and solve complex problems that are intractable using traditional technologies. Big data has led to the development of state-of-the-art technologies that have revolutionized the process of experimentation. These innovations span from automating the setup of the infrastructure required for data analysis to providing user-friendly interfaces that simplify coding and result visualization. However, managing and scaling these resources for large-scale data processing remains a challenge. In this work, we introduce a novel framework called DataLab as a Service (DLaaS), which integrates cutting-edge, open-source technologies to offer an online platform designed for both resource providers and researchers. The platform enables users to easily and automatically deploy interactive environments tailored for data analysis, thereby streamlining the management of computational resources. Through DLaaS, users gain access to cloud-based infrastructures and distributed computing resources, which are essential for performing compute-intensive tasks on massive datasets. The framework ensures scalability, resource management and optimization, and high availability, all within an accessible and user-friendly platform. Furthermore, this paper presents several use cases where researchers have successfully utilized DLaaS resources, demonstrating its practical applications in real-world scenarios.
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models. Constructing a large-scale labeled image captioning dataset is expensive in terms of labor, time, and cost. In contrast to manually annotating all the training samples, separately collecting uni-modal datasets is immensely easier, e.g., a large-scale image dataset and a sentence dataset. We leverage such massive unpaired image and caption data upon standard paired data by learning to associate them. To this end, our novel semi-supervised learning method assigns pseudo-labels to unpaired images and captions in an adversarial learning fashion, where the joint distribution of image and caption is learned. This approach shows noticeable performance improvement even in challenging scenarios, including out-of-task data and web-crawled data. We also show that our proposed method is theoretically well-motivated and has a favorable global optimal property. Our extensive and comprehensive empirical results on captioning datasets, followed by a comprehensive analysis of the scarcely-paired COCO dataset, demonstrate the consistent effectiveness of our method compared to competing ones.
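The paper learns the image-caption association adversarially; as a heavily simplified stand-in, pseudo-labels could be assigned by thresholded nearest-neighbor matching in a shared embedding space. Toy embeddings and illustrative names only, not the paper's procedure:

```python
def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def pseudo_pair(image_embs, caption_embs, threshold=0.8):
    """Greedy stand-in for the paper's adversarial matcher: pair each
    unpaired image with its most similar caption if the similarity clears
    the threshold, otherwise leave the image unlabeled."""
    pairs = []
    for i, img in enumerate(image_embs):
        j, score = max(((j, cosine(img, cap))
                        for j, cap in enumerate(caption_embs)),
                       key=lambda t: t[1])
        if score >= threshold:
            pairs.append((i, j))
    return pairs
```

The threshold is the usual semi-supervised trade-off: raising it yields fewer but cleaner pseudo-pairs, lowering it yields more training signal at the cost of noisier labels.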