ISBN: (print) 1581132379
The proceedings contain 12 papers. The topics discussed include: parallel lumigraph reconstruction; parallel visualization of large-scale aerodynamics calculations: a case study on the Cray T3E; hybrid scheduling for parallel rendering using coherent ray tasks; exploiting frame coherence with the temporal depth buffer in a distributed computing environment; transparent distributed processing for rendering; web-based collaborative visualization of distributed and parallel simulation; scalable distributed visualization using off-the-shelf components; and interactive volume segmentation with the PAVLOV architecture.
ISBN: (digital) 9798331522124
ISBN: (print) 9798331522131
Several tools were introduced by the Versatile Video Coding (VVC) standard to enhance compression, and the Adaptive Loop Filter (ALF) is one such tool, significantly improving visual quality. Although it provides coding-efficiency gains, the ALF also imposes a substantial computational burden. To address this issue, this paper evaluates the processing time of the classification step of the ALF in VVC encoders under different programming paradigms: a sequential (fully scalar) CPU implementation, a Single Instruction Multiple Data (SIMD) implementation, and a customized parallel CUDA implementation executed on GPUs. The results show that the SIMD-optimized implementation significantly outperforms the fully scalar implementation. Although the GPU implementation is faster than the fully scalar one, it remains slower than the SIMD-optimized one due to CPU-GPU communication overhead. With more tasks to process, the GPU implementation could potentially surpass the SIMD-optimized processing time.
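The ALF classification step assigns each 4×4 luma block a class derived from directional 1-D Laplacian gradients computed over a small surrounding window. The Python sketch below shows only that gradient accumulation, written as the kind of scalar loop that SIMD or CUDA versions parallelize. It is a simplified illustration under stated assumptions, not the paper's code or the normative VVC process: it skips VVC's gradient subsampling, handles picture edges by clamping, omits the mapping from gradients to a class index, and the function name `laplacian_gradients` is hypothetical.

```python
def laplacian_gradients(img, top, left, size=4, margin=2):
    """Sum 1-D Laplacian magnitudes around one block (hypothetical helper).

    Returns (vertical, horizontal, diagonal 0, diagonal 1) sums over a
    (size + 2 * margin) square window. Simplified: no subsampling, and
    out-of-picture samples are clamped to the nearest edge pixel.
    """
    h, w = len(img), len(img[0])

    def px(r, c):
        # Clamp coordinates so the window may extend past the picture edge.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    g_v = g_h = g_d0 = g_d1 = 0
    for r in range(top - margin, top + size + margin):
        for c in range(left - margin, left + size + margin):
            centre = 2 * px(r, c)
            g_v += abs(centre - px(r - 1, c) - px(r + 1, c))            # up/down
            g_h += abs(centre - px(r, c - 1) - px(r, c + 1))            # left/right
            g_d0 += abs(centre - px(r - 1, c - 1) - px(r + 1, c + 1))   # 135 degrees
            g_d1 += abs(centre - px(r - 1, c + 1) - px(r + 1, c - 1))   # 45 degrees
    return g_v, g_h, g_d0, g_d1


if __name__ == "__main__":
    stripes = [[10 if r % 2 else 0 for c in range(12)] for r in range(12)]
    print(laplacian_gradients(stripes, 4, 4))  # horizontal stripes: no horizontal gradient
```

Every window position is computed independently of the others, which is why the loop body maps naturally onto SIMD lanes or GPU threads; only the four per-block sums need a reduction.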
ISBN: (digital) 9798331502812
ISBN: (print) 9798331502829
Matrix transposition is a classic operation in machine learning and scientific applications. HBM-enabled FPGAs, with their high-bandwidth capabilities, are increasingly deployed in data centers. However, achieving high bandwidth utilization on HBM is challenging because the large strided access patterns of matrix transposition significantly degrade bandwidth utilization. Additionally, saturating HBM bandwidth requires a large number of parallel accesses, which further complicates the design. In this paper, we present a high-throughput matrix transposition design for HBM-enabled FPGAs. Our design performs small strided accesses to HBM, ensuring efficient bandwidth utilization. We use on-chip SRAMs to reorganize data from HBM into the access pattern needed for matrix transposition. Inspired by Latin squares, we propose a novel data layout for storing the matrix tiles in SRAM. This data layout is paired with a customized scheduling strategy to eliminate SRAM bank conflicts. We develop a fully pipelined architecture with multiple processing elements (PEs) to enable parallel HBM accesses. Our design is highly scalable, supporting various configurations of HBM channels and arbitrary matrix dimensions. We implement the proposed design on the AMD Alveo U280 FPGA. Experimental results show that our design achieves a matrix transposition throughput of up to 415 GB/s, more than 90% of the peak HBM bandwidth of the target FPGA platform. Our design outperforms state-of-the-art GPU implementations, delivering up to 1.44× higher HBM bandwidth utilization.
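The abstract does not spell out its Latin-square layout, so the Python sketch below assumes the simplest one: element (r, c) of an n×n tile is stored in bank (r + c) mod n at address r. Because that assignment is a Latin square, every tile row and every tile column touches each bank exactly once, so a row-wise write and a column-wise read can each fetch one element per bank at a time. All function names are hypothetical, and the sketch models a single tile whose size equals the bank count; it ignores HBM channels, PEs, and the paper's scheduling strategy.

```python
def bank_of(row: int, col: int, num_banks: int) -> int:
    """Bank that holds tile element (row, col) under a cyclic Latin-square layout."""
    return (row + col) % num_banks


def store_tile(tile):
    """Write an n x n tile row by row into n banks; banks[b][addr] holds one element."""
    n = len(tile)
    banks = [[None] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            # Within one row, the n elements land in n distinct banks (same address r).
            banks[bank_of(r, c, n)][r] = tile[r][c]
    return banks


def read_column(banks, col: int):
    """Read tile column `col`; each of its n elements comes from a different bank."""
    n = len(banks)
    return [banks[bank_of(r, col, n)][r] for r in range(n)]


def transpose_tile(tile):
    """Transpose a tile by writing rows into the banks and reading columns back out."""
    banks = store_tile(tile)
    return [read_column(banks, c) for c in range(len(tile))]


def is_conflict_free(num_banks: int) -> bool:
    """True if every tile row and every tile column maps to all banks exactly once."""
    n = num_banks
    every_bank = set(range(n))
    rows_ok = all({bank_of(r, c, n) for c in range(n)} == every_bank for r in range(n))
    cols_ok = all({bank_of(r, c, n) for r in range(n)} == every_bank for c in range(n))
    return rows_ok and cols_ok


if __name__ == "__main__":
    tile = [[4 * r + c for c in range(4)] for r in range(4)]
    print(transpose_tile(tile))  # rows of the result are the columns of `tile`
    print(is_conflict_free(8))
```

By contrast, a plain layout that puts column c in bank c would place an entire tile column in one bank and serialize the column read; the skew is what removes that conflict.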
ISBN: (digital) 9798331520526
ISBN: (print) 9798331520533
The Segment Anything Model (SAM) is a large general-purpose segmentation model proposed by Meta that has shown impressive performance in many natural image segmentation tasks. Recently, MA-SAM was proposed to adapt SAM to medical applications. However, such models still fall short in exploiting the three-dimensional information of medical images. In this paper, we propose a new structure, called VSS-SAM, that fully exploits the 3D information in medical images to enhance the performance of SAM. Specifically, the proposed method uses two branches: 1) SAM as an encoder to learn the basic topological relations among slices, and 2) a parallel Mamba branch to effectively capture long-range spatial dependencies. Finally, a new decoder integrates the multi-view feature representations extracted from the two branches and outputs the final prediction, aiming to achieve optimal segmentation performance. We validated our method on three publicly available datasets. Experimental results show that VSS-SAM achieves significantly better segmentation performance than existing methods on multiple datasets.
Discusses solutions to the problem of large-scale data visualization: data management and reduction; scalable parallel visualization; high-resolution displays; and user interfaces.
We describe two highly scalable, parallel software volume-rendering algorithms: one renders unstructured-grid volume data and the other renders isosurfaces.
Parallel coordinates is a popular and well-known multivariate data visualization technique. However, one of its inherent limitations is the rendering of very large data sets. This often causes overplotting, and the goal of the visual information-seeking mantra is hampered by a cluttered overview and non-interactive update rates. In this paper, we propose two novel solutions, namely angular histograms and attribute curves. These techniques are frequency-based approaches to large, high-dimensional data visualization. They convey both the density of the underlying polylines and their slopes. Angular histograms and attribute curves offer an intuitive way for the user to explore clusters, linear correlations, and outliers in large data sets without the overplotting and clutter problems associated with traditional parallel coordinates. We demonstrate the results on a wide variety of data sets, including real-world, high-dimensional biological data. Finally, we compare our methods with other popular frequency-based algorithms.
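The key quantity behind angular histograms is the slope of each polyline segment between two adjacent axes. The Python sketch below counts segments by where they start on the left axis and by the angle they make, which shows how density and slope can be captured together without drawing every polyline. It is an illustration under stated assumptions, not the authors' algorithm: min-max normalization, unit axis spacing, the bin counts, and the function names are hypothetical choices, and rendering is left out entirely.

```python
import math
from collections import Counter


def normalize(values):
    """Min-max normalize one attribute to [0, 1]; a constant attribute maps to 0."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def angular_histogram(records, axis_a, axis_b, value_bins=4, angle_bins=6, spacing=1.0):
    """Count polyline segments between two adjacent axes (hypothetical helper).

    Keys are (start bin on axis_a, angle bin); angle bins split (-pi/2, pi/2).
    """
    starts = normalize([rec[axis_a] for rec in records])
    ends = normalize([rec[axis_b] for rec in records])
    counts = Counter()
    for start, end in zip(starts, ends):
        # Segment angle relative to the horizontal; positive means the value rises.
        angle = math.atan2(end - start, spacing)
        a_bin = min(int((angle + math.pi / 2) / math.pi * angle_bins), angle_bins - 1)
        v_bin = min(int(start * value_bins), value_bins - 1)
        counts[(v_bin, a_bin)] += 1
    return counts


if __name__ == "__main__":
    correlated = [(x, 2 * x + 1) for x in range(4)]  # perfectly correlated pair
    print(angular_histogram(correlated, 0, 1))       # every segment is horizontal
```

For a positively correlated pair, all segments fall in the same (horizontal) angle bin; a negatively correlated pair would instead concentrate in a descending angle bin, which is the kind of slope pattern a plain density histogram cannot show.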
We present an architectural approach based on parallel data streaming to enable visualizations on a parallel cluster. Our approach requires less memory than other visualization approaches while achieving high code reuse.
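Parallel data streaming bounds memory by splitting a data set into partitions, one per process, and then streaming each partition through the pipeline in smaller pieces. The Python sketch below computes a scalar field's range that way and reports the largest piece that was ever resident. It is a generic illustration, not the authors' architecture: processes are simulated with a loop rather than real cluster ranks, and `read_piece` is a hypothetical stand-in for a reader that can load an arbitrary index range.

```python
import math
from typing import Callable, Iterator, Sequence, Tuple


def split(count: int, parts: int) -> Iterator[Tuple[int, int]]:
    """Yield `parts` contiguous, near-equal (start, stop) ranges covering [0, count)."""
    base, extra = divmod(count, parts)
    start = 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        yield start, stop
        start = stop


def streamed_range(
    read_piece: Callable[[int, int], Sequence[float]],
    count: int,
    num_procs: int,
    pieces_per_proc: int,
) -> Tuple[float, float, int]:
    """Return (min, max, largest resident piece) of a field streamed in pieces."""
    partials = []
    peak = 0
    for p_start, p_stop in split(count, num_procs):  # one partition per simulated process
        lo, hi = math.inf, -math.inf
        for s, e in split(p_stop - p_start, pieces_per_proc):
            if s == e:
                continue  # more pieces than values: skip empty pieces
            piece = read_piece(p_start + s, p_start + e)  # only this piece is loaded
            peak = max(peak, len(piece))
            lo, hi = min(lo, min(piece)), max(hi, max(piece))
        partials.append((lo, hi))
    # Final reduction across processes, as a gather step would do on a cluster.
    return min(p[0] for p in partials), max(p[1] for p in partials), peak


if __name__ == "__main__":
    field = [(i * 37) % 101 for i in range(1000)]
    print(streamed_range(lambda s, e: field[s:e], len(field), 4, 5))
```

Peak memory here is set by the piece size (50 values for 1,000 values over 4 processes and 5 pieces each) rather than by the size of the whole field, which is the memory saving that streaming trades against extra passes through the pipeline.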
ISBN: (print) 0780372239
In this case study we present an open-source visualization application with a novel data-parallel application architecture. The architecture is unique because it uses the Tcl scripting language to synchronize the user interface with the VTK parallel visualization pipeline and parallel-rendering module. The resulting application shows scalable performance and is easily extensible because of its simple modular architecture. We demonstrate the application with a 9.8-gigabyte structured-grid ocean model.