With the advancement of satellite communication technology, the maritime Internet of Things (IoT) has made significant progress. As a result, vast amounts of Automatic Identification System (AIS) data from global vessels are transmitted to various maritime stakeholders through maritime IoT systems. AIS data contains a large amount of dynamic and static information that requires effective and intuitive visualization for comprehensive analysis. However, current visualization models suffer from two major deficiencies: they do not consider interactions between distant pixels, and they are inefficient. To address these issues, we developed a large-scale vessel trajectory visualization algorithm, called the Non-local Kernel Density Estimation (NLKDE) algorithm, which incorporates a non-local convolution process. It accurately calculates the density distribution of vessel trajectories by considering correlations between distant pixels. Additionally, we implemented the NLKDE algorithm within a Graphics Processing Unit (GPU) framework to enable parallel computing and improve operational efficiency. Comprehensive experiments on multiple vessel trajectory datasets show that the NLKDE algorithm excels at vessel trajectory density visualization, and the GPU-accelerated framework shortens execution time enough to achieve real-time results. From both theoretical and practical perspectives, GPU-accelerated NLKDE provides technical support for real-time monitoring of vessel dynamics in complex waters and contributes to the construction of maritime intelligent transportation systems. The code for this paper can be accessed at: https://***/maohliang/GPU-NLKDE.
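As context for how a non-local density estimate differs from a purely local one, the sketch below renders a density map in which every trajectory point contributes to every output pixel through a Gaussian kernel, rather than only to a small neighborhood. It is a minimal NumPy illustration of the general idea, not the authors' NLKDE or its GPU implementation; the grid size, bandwidth, and function names are assumptions.

```python
import numpy as np

def nonlocal_density_map(points, grid_w=256, grid_h=256, bandwidth=8.0):
    """Density map where every trajectory point influences every pixel via a
    Gaussian kernel with no cutoff radius (illustrative sketch only)."""
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]              # pixel centers
    density = np.zeros((grid_h, grid_w), dtype=np.float64)
    for px, py in points:                              # points in pixel coordinates
        # Squared distance from this trajectory point to every pixel.
        d2 = (xs - px) ** 2 + (ys - py) ** 2
        # Long-range (non-local) Gaussian contribution.
        density += np.exp(-d2 / (2.0 * bandwidth ** 2))
    return density / (2.0 * np.pi * bandwidth ** 2 * max(len(points), 1))

# Toy usage: a short synthetic trajectory.
traj = [(40.0, 50.0), (60.0, 80.0), (90.0, 120.0)]
dmap = nonlocal_density_map(traj)
print(dmap.shape, dmap.max())
```

In practice the per-point loop would be replaced by a batched GPU kernel, which is where the parallel speedup described in the abstract comes from.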
Scene representation networks (SRNs) have recently been proposed for compression and visualization of scientific data. However, state-of-the-art SRNs do not adapt the allocation of available network parameters to the complex features found in scientific data, leading to a loss in reconstruction quality. We address this shortcoming with an adaptively placed multi-grid SRN (APMGSRN) and propose a domain decomposition training and inference technique for accelerated parallel training on multi-GPU systems. We also release an open-source neural volume rendering application that allows plug-and-play rendering with any PyTorch-based SRN. Our proposed APMGSRN architecture uses multiple spatially adaptive feature grids that learn where to be placed within the domain, dynamically allocating more neural network resources where the error in the volume is high. This improves the state-of-the-art reconstruction accuracy of SRNs for scientific data without requiring the expensive octree refinement, pruning, and traversal of previous adaptive models. In our domain decomposition approach for representing large-scale data, we train a set of APMGSRNs in parallel on separate bricks of the volume, reducing training time while avoiding the out-of-core overhead otherwise necessary for volumes too large to fit in GPU memory. After training, the lightweight SRNs are used for real-time neural volume rendering in our open-source renderer, where arbitrary view angles and transfer functions can be explored.
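To make the domain decomposition idea concrete, the sketch below fits one small coordinate-based MLP per brick of a volume, each trained independently on samples drawn from its own sub-domain. It is a simplified, single-process PyTorch illustration of brick-wise training, not the APMGSRN architecture or the paper's multi-GPU pipeline; the network size, brick layout, and training loop are assumptions.

```python
import torch
import torch.nn as nn

def make_srn(hidden=64):
    # Tiny coordinate-based MLP: (x, y, z) in [-1, 1]^3 -> scalar value.
    return nn.Sequential(
        nn.Linear(3, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1),
    )

def train_brick(volume_fn, brick_min, brick_max, steps=200, batch=4096):
    """Fit one SRN to the sub-domain [brick_min, brick_max] of the volume.
    volume_fn maps an (N, 3) coordinate tensor to (N, 1) scalar values."""
    model = make_srn()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    lo = torch.tensor(brick_min, dtype=torch.float32)
    hi = torch.tensor(brick_max, dtype=torch.float32)
    for _ in range(steps):
        coords = lo + (hi - lo) * torch.rand(batch, 3)   # samples inside the brick
        target = volume_fn(coords)
        loss = nn.functional.mse_loss(model(coords), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Toy analytic "volume" and a 2-brick decomposition; in practice each brick
# would be trained by its own process or GPU.
toy_volume = lambda c: torch.sin(4.0 * c).prod(dim=1, keepdim=True)
bricks = [((-1, -1, -1), (0, 1, 1)), ((0, -1, -1), (1, 1, 1))]
models = [train_brick(toy_volume, lo, hi) for lo, hi in bricks]
```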
The large-scale motions in 3D turbulent channel flows, known as Turbulent Superstructures (TSS), play an essential role in the dynamics of small-scale structures within the turbulent boundary layer. However, as of today, there is no common agreement on the spatial and temporal relationships between these multiscale structures. We propose a novel space-time visualization technique for analyzing the temporal evolution of these multiscale structures in their spatial context, and thus for shedding further light on the conceptually different explanations of their dynamics. Since the temporal dynamics of TSS are believed to influence the structures in the turbulent boundary layer, we propose a combination of a 2D space-time velocity plot with an orthogonal 2D plot of projected 3D flow structures, which can interactively span the time and space axes. Besides flow structures indicating the fluid motion, we propose showing the variations in derived fields as an additional source of explanation. The relationships between structures at different spatial and temporal scales can be resolved more effectively by using various filtering operations and image registration algorithms. To reduce the information loss due to the non-injective nature of projection, spatial information is encoded into transparency or color. Since the proposed visualization places heavy demands on computational resources and memory bandwidth to stream unsteady flow fields and instantly compute derived 3D flow structures, the implementation exploits data compression, parallel computation capabilities, and high memory bandwidth on recent GPUs via CUDA.
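As a small illustration of the 2D space-time view described above, the sketch below reduces a time series of 3D streamwise-velocity snapshots to an x-t diagram by averaging over the wall-normal and spanwise directions. It is a plain NumPy sketch of the general construction, not the paper's GPU/CUDA pipeline; the array layout, synthetic data, and averaging choice are assumptions.

```python
import numpy as np

def space_time_plot(u, wall_normal_axis=1, spanwise_axis=2):
    """Build a 2D space-time (x-t) map from snapshots u[t, y, z, x] of the
    streamwise velocity by averaging out the y and z directions."""
    return u.mean(axis=(wall_normal_axis, spanwise_axis))  # shape (T, X)

# Synthetic stand-in for channel-flow data: a slowly convecting large-scale
# streamwise pattern plus small-scale noise.
T, Y, Z, X = 64, 16, 16, 128
t = np.arange(T)[:, None, None, None]
x = np.arange(X)[None, None, None, :]
u = np.sin(2 * np.pi * (x - 0.5 * t) / X) + 0.1 * np.random.randn(T, Y, Z, X)

xt = space_time_plot(u)
print(xt.shape)  # (64, 128): rows are time, columns are streamwise position
```

The inclined bands that appear in such an x-t map are the footprint of convecting large-scale structures, which is what the combined space-time view is meant to relate to the projected 3D structures.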
Diagnosing the cluster-based performance of large-scale deep neural network (DNN) models during training is essential for improving training efficiency and reducing resource consumption. However, it remains challenging due to the incomprehensibility of the parallelization strategy and the sheer volume of complex data generated during training. Prior works visually analyze performance profiles and timeline traces to identify anomalies from the perspective of individual devices in the cluster, which is not well suited to studying the root cause of anomalies. In this article, we present a visual analytics approach that empowers analysts to visually explore the parallel training process of a DNN model and interactively diagnose the root cause of a performance issue. A set of design requirements is gathered through discussions with domain experts. We propose an enhanced execution flow of model operators for illustrating parallelization strategies within the computational graph layout. We design and implement an enhanced Marey's graph representation, which introduces the concept of time-span and a banded visual metaphor to convey training dynamics and help experts identify inefficient training processes. We also propose a visual aggregation technique to improve visualization efficiency. We evaluate our approach through case studies, a user study, and expert interviews on two large-scale models run in a cluster, namely the PanGu-alpha 13B model (40 layers) and the ResNet model (50 layers).
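For readers unfamiliar with Marey-style charts, the sketch below plots, for a handful of operators, the time at which each device reaches that operator; each operator becomes a line across the device axis, so a spread-out line hints at a straggler. This is a generic matplotlib toy, not the enhanced Marey's graph, time-span concept, or banded metaphor from the paper; the operator names and timestamps are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical per-device start times (ms) for the same operators.
devices = ["gpu0", "gpu1", "gpu2", "gpu3"]
op_starts = {
    "embedding": [0, 2, 3, 5],
    "attention": [10, 12, 18, 20],   # the spread suggests a lagging device
    "mlp":       [22, 24, 30, 33],
}

fig, ax = plt.subplots(figsize=(6, 3))
for name, starts in op_starts.items():
    ax.plot(starts, range(len(devices)), marker="o", label=name)
ax.set_yticks(range(len(devices)), devices)
ax.set_xlabel("time (ms)")
ax.legend()
fig.savefig("marey_sketch.png", dpi=150)
```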
Voxel-based segmentation volumes often store a large number of labels and voxels, and the resulting amount of data can make storage, transfer, and interactive visualization difficult. We present a lossless compression technique that addresses these challenges. It processes individual small bricks of a segmentation volume and compactly encodes the labelled regions and their boundaries by an iterative refinement scheme. The result for each brick is a list of labels and a sequence of operations to reconstruct the brick, which is further compressed using rANS entropy coding. As the relative frequencies of operations are very similar across bricks, the entropy coding can use global frequency tables for an entire data set, which enables efficient and effective parallel (de)compression. Our technique achieves high throughput (up to gigabytes per second for both compression and decompression) and strong compression ratios of about 1% to 3% of the original data set size while remaining applicable to GPU-based rendering. We evaluate our method on various data sets from different fields and demonstrate GPU-based volume visualization with on-the-fly decompression, level-of-detail rendering (with optional on-demand streaming of detail coefficients to the GPU), and a caching strategy for decompressed bricks for further performance improvement.
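To illustrate why a single global frequency table can work across bricks, the sketch below splits a labelled volume into bricks, derives a toy per-brick "operation" stream, aggregates the symbol frequencies over all bricks, and estimates the entropy-coded size. The RUN/NEW stream is a made-up stand-in for the paper's reconstruction operations, and the entropy estimate is only a rough proxy for what an rANS coder using these frequencies would achieve; brick size and volume contents are assumptions.

```python
import numpy as np
from collections import Counter

def brick_iter(volume, brick=8):
    """Yield non-overlapping brick views of a 3D labelled volume."""
    zs, ys, xs = volume.shape
    for z in range(0, zs, brick):
        for y in range(0, ys, brick):
            for x in range(0, xs, brick):
                yield volume[z:z+brick, y:y+brick, x:x+brick]

def toy_op_stream(b):
    """Stand-in operation stream: whether each voxel repeats its predecessor
    in scan order (RUN) or introduces a label (NEW)."""
    flat = b.ravel()
    return ["RUN" if i > 0 and flat[i] == flat[i-1] else "NEW"
            for i in range(flat.size)]

# Synthetic segmentation volume with a few large labelled regions.
vol = (np.random.rand(32, 32, 32) * 3).astype(np.uint8)
vol[:16] = 0  # make one half a single region so RUN ops dominate

global_freq = Counter()
for b in brick_iter(vol):
    global_freq.update(toy_op_stream(b))   # one shared table for all bricks

total = sum(global_freq.values())
probs = np.array([c / total for c in global_freq.values()])
print(global_freq, f"{-(probs * np.log2(probs)).sum():.2f} bits/op")
```

Because the table is shared, every brick can be encoded and decoded independently and in parallel, which is the property the abstract highlights.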
Contour trees describe the topology of level sets in scalar fields and are widely used in topological data analysis and visualization. A main challenge of utilizing contour trees for large-scale scientific data is their computation at scale using high-performance computing. To address this challenge, recent work has introduced distributed hierarchical contour trees for distributed computation and storage of contour trees. However, effective use of these distributed structures in analysis and visualization requires subsequent computation of geometric properties and branch decomposition to support contour extraction and exploration. In this work, we introduce distributed algorithms for augmentation, hypersweeps, and branch decomposition that enable parallel computation of geometric properties, and support the use of distributed contour trees as query structures for scientific exploration. We evaluate the parallel performance of these algorithms and apply them to identify and extract important contours for scientific visualization.
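To give a flavor of what a sweep over a contour tree computes, the sketch below accumulates a simple geometric property (a per-node vertex count) from the leaves toward the root of a small merge tree. It is a serial toy version of the idea behind hypersweep-style property computation, not the distributed augmentation, hypersweep, or branch decomposition algorithms described above; the tree encoding and the chosen property are assumptions.

```python
from collections import defaultdict, deque

def sweep_subtree_counts(parent, local_count):
    """Accumulate per-node counts toward the root of a tree given as a
    child -> parent map; returns the total count in the subtree below each node."""
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)
    total = dict(local_count)
    # Order nodes so every child is processed before its parent (reverse BFS).
    order, queue = [], deque([root])
    while queue:
        n = queue.popleft()
        order.append(n)
        queue.extend(children[n])
    for n in reversed(order):
        for c in children[n]:
            total[n] += total[c]
    return total

# Toy merge tree: leaves a, b join at s, which joins leaf c at the root r.
parent = {"a": "s", "b": "s", "s": "r", "c": "r", "r": None}
local = {"a": 5, "b": 3, "s": 2, "c": 7, "r": 1}
print(sweep_subtree_counts(parent, local))
# {'a': 5, 'b': 3, 's': 10, 'c': 7, 'r': 18}
```

Properties accumulated this way (counts, volumes, persistence-related measures) are what make the tree usable as a query structure for selecting and extracting important contours.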
ISBN: (Print) 9798331516932; 9798331516925
We propose and discuss a paradigm that allows for expressing data-parallel rendering with the classically non-parallel ANARI API. We propose this as a new standard for data-parallel rendering, describe two different implementations of this paradigm, and use multiple sample integrations into existing applications to show how easy it is to adopt, and what can be gained from doing so.
Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high performance computing contexts, where even minor performance tweaks can translate into large savings in computational resource use. To aid performance analysis, developers may collect an execution trace, a chronological log of program activity during execution. As traces represent the full history, developers can discover a wide array of possibly previously unknown performance issues, making them an important artifact for exploratory performance analysis. However, interactive trace visualization is difficult due to data size and the complexity of interpreting the data. Traces record nanosecond-level events across many parallel processes, so the collected data is often large and difficult to explore. The rise of asynchronous task-parallel programming paradigms further complicates the relation between events and their probable causes. To address these challenges, we conduct a continuing design study in collaboration with high performance computing researchers. We develop diverse and hierarchical ways to navigate and represent execution trace data in support of their trace analysis tasks. Through an iterative design process, we developed Traveler, an integrated visualization platform for task-parallel traces. Traveler provides multiple linked interfaces to help navigate trace data from multiple contexts. We evaluate the utility of Traveler through user feedback and a case study, finding that integrating multiple modes of navigation in our design supported performance analysis tasks and led to the discovery of previously unknown behavior in a distributed array library.
ISBN: (Print) 9798331516932; 9798331516925
This paper describes the adaptation of a well-scaling parallel algorithm for computing Morse-Smale segmentations based on path compression to a distributed computational setting. Additionally, we extend the algorithm to efficiently compute connected components in distributed structured and unstructured grids, based either on the connectivity of the underlying mesh or on a feature mask. Our implementation is seamlessly integrated with the distributed extension of the Topology ToolKit (TTK), ensuring robust performance and scalability. To demonstrate the practicality and efficiency of our algorithms, we conducted a series of scaling experiments on large-scale datasets with sizes of up to 4096³ vertices, on up to 64 nodes and 768 cores.
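For readers unfamiliar with path compression, the sketch below uses a path-compressing union-find to label connected components of a feature mask on a small structured grid, which is the serial core of the kind of connected-component computation extended here to distributed grids. The 6-connectivity choice, grid setup, and helper names are assumptions for illustration, not the distributed TTK implementation.

```python
import numpy as np

def find(parent, i):
    # Path compression: point every visited node directly at the root.
    root = i
    while parent[root] != root:
        root = parent[root]
    while parent[i] != root:
        parent[i], i = root, parent[i]
    return root

def connected_components(mask):
    """Label 6-connected components of a boolean mask on a structured 3D grid."""
    idx = np.arange(mask.size).reshape(mask.shape)
    parent = np.arange(mask.size)
    for axis in range(3):
        a = idx.take(range(0, mask.shape[axis] - 1), axis=axis)
        b = idx.take(range(1, mask.shape[axis]), axis=axis)
        for i, j in zip(a.ravel(), b.ravel()):
            if mask.flat[i] and mask.flat[j]:          # both voxels in the feature
                parent[find(parent, i)] = find(parent, j)
    labels = np.full(mask.size, -1)
    for i in range(mask.size):
        if mask.flat[i]:
            labels[i] = find(parent, i)
    return labels.reshape(mask.shape)

# Two separate blobs in a small grid should receive two distinct labels.
m = np.zeros((4, 4, 4), dtype=bool)
m[0, 0, 0:2] = True
m[3, 3, 3] = True
print(np.unique(connected_components(m)))  # -1 (background) plus two root ids
```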
The promotion of large-scale applications of reinforcement learning (RL) requires efficient training computation. While existing parallel RL frameworks encompass a variety of RL algorithms and parallelization techniques, their excessively burdensome communication frameworks prevent them from reaching the hardware's throughput limit and full training performance on a single desktop. In this article, we propose Spreeze, a lightweight parallel framework for RL that efficiently utilizes the hardware resources of a single desktop to approach the throughput limit. We asynchronously parallelize the experience sampling, network update, performance evaluation, and visualization operations, and employ multiple efficient data transmission techniques to transfer various types of data between processes. The framework can automatically adjust the parallelization hyperparameters based on the computing ability of the hardware device in order to perform efficient large-batch updates. Based on the characteristics of the "Actor-Critic" family of RL algorithms, our framework uses dual GPUs to update the actor and critic networks independently, further improving throughput. Simulation results show that our framework can achieve up to 15,000 Hz experience sampling and a 370,000 Hz network update frame rate using only a personal desktop computer, an order of magnitude higher than other mainstream parallel RL frameworks, resulting in a 73% reduction in training time. Our work on fully utilizing the hardware resources of a single desktop computer is fundamental to enabling efficient large-scale distributed RL training.
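The sketch below shows the general shape of the asynchronous split between experience sampling and network updates, using Python multiprocessing and a shared queue. It is a generic illustration of that producer/consumer pattern under assumed process roles, queue size, and batch size, not the Spreeze framework or its specific data-transmission techniques; the gradient step is stubbed out.

```python
import multiprocessing as mp
import random

def sampler(queue, n_transitions=200):
    """Experience-sampling process: push (state, action, reward) tuples."""
    for _ in range(n_transitions):
        transition = (random.random(), random.randint(0, 3), random.random())
        queue.put(transition)
    queue.put(None)  # sentinel: sampling finished

def learner(queue, batch_size=32):
    """Update process: drain the queue and perform a (stubbed) batch update."""
    batch, updates = [], 0
    while True:
        item = queue.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            updates += 1          # placeholder for a gradient step
            batch.clear()
    print(f"performed {updates} batch updates")

if __name__ == "__main__":
    q = mp.Queue(maxsize=1024)
    procs = [mp.Process(target=sampler, args=(q,)),
             mp.Process(target=learner, args=(q,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Because sampling and updating run in separate processes, neither blocks the other except through the bounded queue, which is the basic mechanism a framework like the one described above builds on.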