Collaborative exploration of scientific data sets across large high-resolution displays requires both high visual detail and low-latency transfer of image data (oftentimes inducing the need to trade one for the other). In this work, we present a system that dynamically adapts the encoding quality in such systems in a way that reduces the required bandwidth without impacting the details perceived by one or more observers. Humans perceive sharp, colourful details in the small foveal region around the centre of the field of view, while information in the periphery is perceived blurred and colourless. We account for this by tracking the gaze of observers and adapting the quality parameter of each macroblock used by the H.264 encoder accordingly, considering the so-called visual acuity fall-off. This allows us to substantially reduce the required bandwidth with barely noticeable changes in visual quality, which is crucial for collaborative analysis across display walls at different locations. We demonstrate the reduced overall required bandwidth and the high quality inside the foveated regions using particle rendering and parallel coordinates.
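As a rough illustration of the per-macroblock quality adaptation, the following sketch assigns an H.264 quantization parameter (QP) to each 16x16 macroblock based on its angular distance from the tracked gaze point. The linear fall-off, the pixels-per-degree value, and the QP range are illustrative assumptions, not the acuity model used in the paper.

```python
import numpy as np

def macroblock_qp_map(frame_w, frame_h, gaze_px, base_qp=22, max_qp=40,
                      mb_size=16, px_per_degree=40.0, fovea_deg=2.0):
    """One QP per 16x16 macroblock, increasing with eccentricity from the gaze
    point. The linear fall-off below is a placeholder for the paper's acuity
    fall-off function."""
    mbs_x, mbs_y = frame_w // mb_size, frame_h // mb_size
    qp = np.empty((mbs_y, mbs_x), dtype=np.int32)
    for my in range(mbs_y):
        for mx in range(mbs_x):
            cx = (mx + 0.5) * mb_size
            cy = (my + 0.5) * mb_size
            ecc_deg = np.hypot(cx - gaze_px[0], cy - gaze_px[1]) / px_per_degree
            if ecc_deg <= fovea_deg:
                qp[my, mx] = base_qp                     # full quality in the fovea
            else:
                t = min((ecc_deg - fovea_deg) / 30.0, 1.0)
                qp[my, mx] = int(base_qp + t * (max_qp - base_qp))
    return qp

# Example: 1920x1080 frame, observer looking near the centre of the display
qp_map = macroblock_qp_map(1920, 1080, gaze_px=(960, 540))
```

For several observers, one would take the minimum QP over all gaze points per macroblock, so that every foveated region stays at full quality.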
ISBN (print): 9781665432832
Both visual detail and low-latency transfer of image data are required for collaborative exploration of scientific data sets across large high-resolution displays. In this work, we present an approach that reduces the resolution before encoding and uses temporal upscaling to reconstruct the full-resolution image, reducing the overall latency and the required bandwidth without significantly impacting the details perceived by observers. Our approach exploits the fact that humans do not perceive the full details of moving objects: static parts of the image are reconstructed perfectly, while non-static parts are reconstructed at lower quality. This strategy enables a substantial reduction of the encoding latency and the required bandwidth with barely noticeable changes in visual quality, which is crucial for collaborative analysis across display walls at different locations. Additionally, our approach can be combined with other techniques that aim to reduce the required bandwidth while keeping the quality as high as possible, such as foveated encoding. We demonstrate the reduced overall latency, the reduced bandwidth requirements, and the high image quality using different visualisations.
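A minimal sketch of the temporal-upscaling idea, assuming a jittered sub-pixel sampling pattern: the sender transmits a reduced-resolution frame with a different phase each frame, and the receiver scatters the samples into a full-resolution accumulation buffer. Static regions are reconstructed exactly after a few frames; moving regions retain some stale samples, i.e. lower quality. The paper's actual reconstruction and motion handling may differ.

```python
import numpy as np

def downsample(full, frame_idx, scale=2):
    """Sender side: pick a different sub-pixel phase every frame so that
    consecutive reduced-resolution frames together cover all full-res pixels."""
    ox = frame_idx % scale
    oy = (frame_idx // scale) % scale
    return full[oy::scale, ox::scale], (ox, oy)

def reconstruct(accum, low_res, offset, scale=2):
    """Receiver side: scatter the received samples into the full-resolution
    accumulation buffer. Static regions converge to an exact reconstruction
    after scale*scale frames; regions that changed in the meantime are covered
    by a mix of fresh and stale samples, i.e. lower quality."""
    ox, oy = offset
    accum[oy::scale, ox::scale] = low_res
    return accum

# Toy usage: a 2x-reduced stream of a static 8x8 image converges to the original
full = np.random.rand(8, 8)
accum = np.zeros_like(full)
for f in range(4):
    low, off = downsample(full, f)
    accum = reconstruct(accum, low, off)
assert np.allclose(accum, full)
```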
ISBN (print): 9781665483322
In this paper, we propose a Hyper-Dimensional genome analysis platform. Instead of working with the original sequences, our method maps genome sequences into high-dimensional space and performs sequence matching with simple, parallel similarity searches. At the algorithm level, we revisit sequence searching with the brain-like memorization that Hyper-Dimensional computing natively supports. By mapping all data points into high-dimensional space, the main sequence searching operations can be processed in a hardware-friendly way. We accordingly design a density-aware FPGA implementation. Our solution searches for the similarity between an encoded query and a large-scale genome library in chunks. We exploit the holographic representation of patterns to stop search operations on libraries with a lower chance of a match. This turns our computation from dense to highly sparse after just a few chunk-based searches. Our large-scale evaluation shows that our accelerator provides a 46x speedup and a 188x improvement in energy efficiency compared to a state-of-the-art Hyper-Dimensional computing GPU implementation.
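The following sketch illustrates the general Hyper-Dimensional approach under stated assumptions: k-mers are encoded by binding cyclically shifted random bipolar base vectors, sequences are bundled into a single hypervector, and a chunked library is searched with simple dot-product similarity, pruning low-scoring chunks so that the remaining search becomes sparse. The paper's concrete encoder and pruning rule are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10_000                                              # hypervector dimensionality
BASE = {b: rng.choice([-1, 1], DIM) for b in "ACGT"}      # random bipolar base vectors

def encode(seq, k=8):
    """Encode a sequence as the bundled (summed) hypervectors of its k-mers;
    position within the k-mer is encoded by a cyclic shift."""
    hv = np.zeros(DIM)
    for i in range(len(seq) - k + 1):
        kmer = np.ones(DIM)
        for j, b in enumerate(seq[i:i + k]):
            kmer *= np.roll(BASE[b], j)
        hv += kmer
    return hv

def search(query_hv, library_chunks, keep=0.25):
    """Chunk-wise similarity search: score every chunk, then keep only the
    best-scoring fraction for further search, making the rest of the
    computation highly sparse."""
    qn = query_hv / np.linalg.norm(query_hv)
    sims = np.array([qn @ (c / np.linalg.norm(c)) for c in library_chunks])
    order = np.argsort(sims)[::-1]
    return order[: max(1, int(keep * len(library_chunks)))], sims

library = [encode(s) for s in ("ACGTACGTACGTACGT", "TTTTCCCCGGGGAAAA", "ACGTACGTTTTTACGT")]
top_chunks, scores = search(encode("ACGTACGTACGT"), library)
```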
ISBN (digital): 9798331516925
ISBN (print): 9798331516932
The rise of heterogeneous resources in modern High Performance Computing (HPC) systems has propelled the scientific community beyond the exascale threshold. To maximize simulation performance on HPC systems, applications increasingly rely on device resources, such as GPUs, leading to under-utilization of host resources, particularly CPUs. In situ analysis and visualization techniques minimize data movement by operating on data in memory, but this still involves blocking operations that incur a small penalty on simulation performance. We explore a novel instrumentation approach where GPU-based time step data is copied from device memory to host memory, enabling CPUs to concurrently perform visualization and analysis tasks. This strategy allows simulations to continue uninterrupted by an in situ library's analysis and visualization processes.
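A toy host-side sketch of the producer/consumer pattern described above, using a Python thread as the analysis worker. In the real system the time step data resides in GPU memory and is copied to the host (e.g. with an asynchronous memcpy), and analysis runs on otherwise idle CPU cores; this sketch only mimics that decoupling.

```python
import threading
import queue
import numpy as np

def simulation_step(step, n=1_000_000):
    """Stand-in for one GPU time step; in the real setting the result lives in
    device memory and would be copied to the host asynchronously."""
    return np.sin(np.linspace(0, 1, n) * step)

def analysis_worker(q):
    """Consumes snapshots and performs in situ analysis/visualization without
    blocking the simulation loop."""
    while True:
        step, data = q.get()
        if data is None:
            break
        print(f"step {step}: mean={data.mean():.4f}")   # placeholder analysis

snapshots = queue.Queue(maxsize=2)          # bounded, so host memory stays in check
worker = threading.Thread(target=analysis_worker, args=(snapshots,))
worker.start()

for step in range(10):
    field = simulation_step(step)
    # Copy the time step out (device->host in the real system) and hand it to
    # the CPU-side worker; the simulation immediately continues with step+1.
    snapshots.put((step, field.copy()))
snapshots.put((None, None))
worker.join()
```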
ISBN (digital): 9798331516925
ISBN (print): 9798331516932
Functional approximation as a high-order continuous representation provides more accurate value and gradient queries than the traditional discrete volume representation. Volume visualization rendered directly from a functional approximation generates high-quality results without the artifacts caused by trilinear interpolation. However, querying an encoded functional approximation is computationally expensive, especially when the input dataset is large, making functional approximation impractical for interactive visualization. In this paper, we propose a novel functional approximation multi-resolution representation, Adaptive-FAM, which is lightweight and fast to query. We also design a GPU-accelerated out-of-core multi-resolution volume visualization framework that directly utilizes the Adaptive-FAM representation to generate high-quality renderings with interactive responsiveness. Our method not only dramatically decreases the caching time, one of the main contributors to input latency, but also effectively improves the cache hit rate through prefetching. Our approach significantly outperforms the traditional functional approximation method in terms of input latency while maintaining comparable rendering quality.
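A minimal sketch of the out-of-core caching and prefetching idea, assuming a block-keyed LRU cache; load_block and the (block, level) keying are hypothetical stand-ins for Adaptive-FAM's actual multi-resolution representation.

```python
from collections import OrderedDict

class BlockCache:
    """Multi-resolution blocks are loaded on demand into an LRU cache, and
    blocks predicted to be needed next (here simply supplied neighbour ids)
    are prefetched to raise the hit rate."""
    def __init__(self, load_block, capacity=256):
        self.load_block = load_block            # expensive out-of-core loader
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, block_id, level):
        key = (block_id, level)
        if key in self.cache:
            self.cache.move_to_end(key)         # cache hit
            return self.cache[key]
        data = self.load_block(block_id, level) # cache miss: load from storage
        self._insert(key, data)
        return data

    def prefetch(self, neighbour_ids, level):
        for nid in neighbour_ids:
            if (nid, level) not in self.cache:
                self._insert((nid, level), self.load_block(nid, level))

    def _insert(self, key, data):
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least recently used
```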
While k-d trees are known to be effective for spatial indexing of sparse 3-d volume data, full reconstruction, e.g. due to changes to the alpha transfer function during rendering, is usually a costly operation with this hierarchical data structure. In a recent publication we showed how to port a clever state-of-the-art k-d tree construction algorithm to a multi-core CPU architecture, and by means of thorough optimization we were able to obtain interactive reconstruction rates for moderately sized to large data sets. The construction scheme is based on maintaining partial summed-volume tables that fit in the L1 cache of the multi-core CPU and that allow for fast occupancy queries. In this work we propose a GPU implementation of the parallel k-d tree construction algorithm and compare it with the original multi-core CPU implementation. We conduct a thorough comparative study that outlines the performance and scalability of our implementation.
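A small sketch of the occupancy query the construction relies on: a summed-volume table (the 3-D analogue of a summed-area table) answers the number of non-empty voxels in any axis-aligned box with eight lookups. For simplicity this sketch builds a single table over the whole volume, whereas the paper maintains partial, L1-cache-sized tables.

```python
import numpy as np

def build_svt(occupancy):
    """Summed-volume table over a binary occupancy volume (inclusive prefix
    sums, padded with a zero border so boxes touching the origin work)."""
    nx, ny, nz = occupancy.shape
    svt = np.zeros((nx + 1, ny + 1, nz + 1), dtype=np.int64)
    svt[1:, 1:, 1:] = occupancy.cumsum(0).cumsum(1).cumsum(2)
    return svt

def box_count(svt, x0, x1, y0, y1, z0, z1):
    """Occupied voxels in the half-open box [x0,x1) x [y0,y1) x [z0,z1),
    via 3-D inclusion-exclusion over eight table lookups."""
    return (  svt[x1, y1, z1] - svt[x0, y1, z1] - svt[x1, y0, z1] - svt[x1, y1, z0]
            + svt[x0, y0, z1] + svt[x0, y1, z0] + svt[x1, y0, z0] - svt[x0, y0, z0])

vol = (np.random.rand(64, 64, 64) > 0.9)            # sparse binary volume
svt = build_svt(vol.astype(np.int64))
assert box_count(svt, 8, 24, 0, 64, 16, 48) == vol[8:24, :, 16:48].sum()
```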
We present a parallel, distributed-memory technique that enhances traditional ray-casting volume rendering of large data sets to improve the depth perception of interesting volumetric features. The technique introduces a lighting system that accounts for global shadows across distributed MPI nodes while using shared-memory parallelism within each node to compute shading information efficiently. The first stage of the approach estimates the energy attenuation from a point light source through the global volume, using a reduced-spatial-resolution representation of the volume, with minimal global communication between nodes. This estimate is then used in the second stage, during volume rendering, to shade the sample points captured during ray casting, generating a high-quality image. In this work, we study the technique's performance across varying spatial resolutions of the estimated light attenuation, using synthetic and real-world volumetric data sets on distributed systems.
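A simplified, single-node sketch of the two stages, assuming a point light given in coarse-grid coordinates: the first function estimates per-cell transmittance by marching towards the light through a downsampled opacity volume, and the second darkens ray-casting samples using that coarse grid. The distributed MPI exchange of the coarse estimates is omitted.

```python
import numpy as np

def attenuation_grid(opacity, light_pos, reduce=4, steps=64):
    """First stage: estimate how much light from a point source reaches each
    cell of a reduced-resolution copy of the volume by marching from the cell
    towards the light and accumulating opacity."""
    coarse = opacity[::reduce, ::reduce, ::reduce]
    nx, ny, nz = coarse.shape
    trans = np.ones((nx, ny, nz))
    for idx in np.ndindex(nx, ny, nz):
        p = np.array(idx, dtype=float)
        seg = (np.asarray(light_pos, dtype=float) - p) / steps
        t = 1.0
        for s in range(1, steps):
            q = np.round(p + seg * s).astype(int)
            if np.all((q >= 0) & (q < coarse.shape)):
                t *= 1.0 - coarse[tuple(q)]     # light attenuated by opacity
        trans[idx] = t
    return trans

def shade(sample_rgb, sample_idx, trans, reduce=4):
    """Second stage: during ray casting, darken each sample by the transmittance
    looked up (nearest lookup here) in the coarse attenuation grid."""
    q = tuple(np.minimum(np.array(sample_idx) // reduce, np.array(trans.shape) - 1))
    return sample_rgb * trans[q]
```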
ISBN (print): 9781665481069
Subgraph enumeration is an important problem in the field of Graph Analytics with numerous applications. The problem is provably NP-complete and requires sophisticated heuristics and highly efficient implementations to be feasible on problem sizes of realistic scales. Parallel solutions have shown a lot of promise on CPUs and in distributed environments. Recently, GPU-based parallel solutions have also been proposed to take advantage of the massive execution resources in modern GPUs. Subgraph enumeration involves traversing a search tree for each vertex of the data graph to find matches of a query graph. Most GPU-based solutions traverse the tree in a breadth-first manner, which exploits parallelism at the cost of high memory requirements and presents a formidable challenge for processing large graphs with high-degree vertices, since the memory capacity of GPUs is significantly lower than that of CPUs. In this work, we propose a novel GPU solution based on a hybrid BFS/DFS approach in which the top level(s) of the search trees are traversed in a fully parallel, breadth-first manner, while each subtree is traversed in a more space-efficient, depth-first manner. The depth-first traversal of subtrees requires less memory but presents more challenges for parallel execution. To overcome the less parallel nature of depth-first traversal, we exploit fine-grained parallelism in each step of the depth-first traversal of subtrees. We further identify and implement various optimizations to efficiently utilize the memory and compute resources of the GPUs. We evaluate our performance against state-of-the-art GPU and CPU implementations, outperforming them with geometric mean speedups of 9.47x (up to 92.01x) and 2.37x (up to 12.70x), respectively. We also show that the proposed approach can efficiently process graphs that previously could not be processed by state-of-the-art GPU solutions due to their excessive memory requirements.
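A sequential sketch of the hybrid traversal order, under the assumption of a connected query graph whose matching order gives every vertex (after the first) an already-matched neighbour: the first query vertex is matched breadth-first against every data vertex (the fully parallel top level), and each partial match is then extended depth-first. The GPU kernels, fine-grained parallelism, and further optimizations are not modelled.

```python
def enumerate_matches(data_adj, query_adj, order):
    """Count matches of the query graph in the data graph (adjacency sets)."""
    matches = 0

    def extend(mapping, depth):                 # depth-first subtree traversal
        nonlocal matches
        if depth == len(order):
            matches += 1
            return
        u = order[depth]
        mapped_nbrs = [mapping[w] for w in query_adj[u] if w in mapping]
        cands = set(data_adj[mapped_nbrs[0]])   # candidates must connect to all
        for v in mapped_nbrs[1:]:               # already-mapped neighbours of u
            cands &= data_adj[v]
        for v in cands:
            if v not in mapping.values():       # keep the mapping injective
                mapping[u] = v
                extend(mapping, depth + 1)
                del mapping[u]

    for v in data_adj:                          # breadth-first top level
        extend({order[0]: v}, 1)
    return matches

# Triangle query on a small graph; each triangle is reported once per query
# automorphism (6x), as is usual without symmetry breaking.
data = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
query = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(enumerate_matches(data, query, order=[0, 1, 2]))  # -> 12 (two triangles)
```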
ISBN (print): 9781665440660
Large-scale simulations on nonuniform particle distributions that evolve over time are widely used in cosmology, molecular dynamics, and engineering. Such data are often saved in an unstructured format that neither preserves spatial locality nor provides metadata for accelerating spatial or attribute subset queries, leading to poor performance of visualization tasks. Furthermore, the parallel I/O strategy typically used writes a file per process or a single shared file, neither of which is portable or scalable across different HPC systems. We present a portable technique for scalable, spatially aware adaptive aggregation that preserves spatial locality in the output. We evaluate our approach on two supercomputers, Stampede2 and Summit, and demonstrate that it outperforms prior approaches at scale, achieving up to 2.5x faster writes and reads for nonuniform distributions. Furthermore, the layout written by our method is directly suitable for visual analytics, supporting low-latency reads and attribute-based filtering with little overhead.
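One common way to preserve spatial locality in the written layout is to sort particles by a space-filling-curve key before aggregation; the sketch below uses Morton (Z-order) codes as an illustrative stand-in, while the paper's adaptive aggregation layers further machinery on top of such an ordering.

```python
import numpy as np

def morton_key(ix, iy, iz, bits=10):
    """Interleave the bits of the quantised x/y/z coordinates into one key;
    sorting by this key places spatially close particles close together."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key

def locality_preserving_order(positions, bounds_min, bounds_max, bits=10):
    """Return the permutation that writes particles in Morton order."""
    span = np.asarray(bounds_max, dtype=float) - np.asarray(bounds_min, dtype=float)
    grid = ((positions - bounds_min) / span * ((1 << bits) - 1)).astype(np.int64)
    keys = [morton_key(x, y, z, bits) for x, y, z in grid]
    return np.argsort(keys)

pos = np.random.rand(1000, 3)
order = locality_preserving_order(pos, [0, 0, 0], [1, 1, 1])
sorted_particles = pos[order]        # layout to aggregate and write
```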
ISBN (digital): 9798350387179
ISBN (print): 9798350387186
Homomorphic encryption (HE) algorithms, particularly the Cheon-Kim-Kim-Song (CKKS) scheme, offer significant potential for secure computation on encrypted data, making them valuable for privacy-preserving machine learning. However, the high latency of large-integer operations in the CKKS algorithm hinders the processing of large datasets and complex computations. This paper proposes a novel strategy that combines lossless data compression techniques with the parallel processing power of graphics processing units to address these challenges. Our approach demonstrably reduces data size by 90% and achieves speedups of up to 100 times compared to conventional approaches. This method ensures data confidentiality while mitigating performance bottlenecks in CKKS-based computations, paving the way for more efficient and scalable HE applications.
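A generic sketch of the compress-before-transfer pattern, using zlib and a flat uint64 layout as stand-ins: coefficient data is losslessly compressed on the host before being shipped to the GPU and decompressed prior to computation. The paper's compression scheme is tailored to CKKS data and reaches far better ratios than a generic compressor would on high-entropy ciphertexts.

```python
import zlib
import numpy as np

def pack(coeffs: np.ndarray) -> bytes:
    """Losslessly compress a coefficient array before transfer."""
    return zlib.compress(np.ascontiguousarray(coeffs, dtype=np.uint64).tobytes())

def unpack(blob: bytes, shape) -> np.ndarray:
    """Decompress on the receiving side before computing."""
    return np.frombuffer(zlib.decompress(blob), dtype=np.uint64).reshape(shape)

coeffs = np.arange(1 << 14, dtype=np.uint64)     # toy polynomial coefficients
blob = pack(coeffs)
assert np.array_equal(unpack(blob, coeffs.shape), coeffs)
print(f"{coeffs.nbytes} bytes -> {len(blob)} bytes after compression")
```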