Commodity workstations with multiple GPUs have been built by engineers and scientists for real-time rendering applications. As a result, a high display resolution can be achieved by connecting each GPU to a display mo...
Acceleration structures are key to high performance parallel ray tracing. Maximizing performance requires configuring the degrees of freedom (e.g., construction parameters) these data structures expose. Whether a para...
In this paper, we introduce a modified Morse potential as an alternative to the existing spring models within a massively parallel extended Position Based Dynamics (XPBD) algorithm. To date, stretching is one of the m...
ISBN: (print) 9783030337230; 9783030337223
This paper presents a 3D widget user interface (UI), super-ellipsoid shape primitives and a customized volume rendering algorithm that together create an effective system for generating contextual views in 3D medical images. The widget UI supports the fast and precise positioning of a super-ellipsoid "paint blob". The paint blob can be deposited and automatically blended with previously deposited blobs to form an arbitrarily complex-shaped region of interest (ROI) enclosing target image features. The rendering of these "focus" regions can be controlled separately from the surrounding contextual region, allowing medical experts to examine and measure image features relative to the surrounding structures, regardless of the level of occlusion. The system's core algorithms execute in parallel on graphics processing units, resulting in real-time interaction and high-quality visualizations. The focus plus context visualization system is validated via a user study and a series of experiments.
ISBN: (print) 9781450358903
Classic synchronization problems are often used to introduce students to the subtleties of concurrency and synchronization mechanisms, such as semaphores, monitors, locks, and condition variables. The Dining Philosophers, Producers-Consumers, and Readers-Writers are all classic problems in which a correct solution requires the actions of multiple processes or threads to be synchronized. In this paper, we present visualizations for these three problems and describe their use as pedagogical tools to help students build accurate mental models of concurrency abstractions such as starvation, deadlock, livelock, and correct execution. We also present the results of an experiment indicating that students find these visualizations significantly more engaging than reading a textbook, with no significant difference in learning. We do not claim that our visualizations should replace a course text; rather, we present them as engaging pedagogical tools to complement the textbook in courses on Operating Systems, Programming Languages, and other courses where concurrency and synchronization are covered.
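As a small illustration of the kind of problem the abstract refers to, the following is a minimal sketch of the Dining Philosophers using Python threads. It is not taken from the paper: the function name `dine` and its parameters are hypothetical, and the deadlock-avoidance strategy (always acquiring the lower-numbered fork first) is one well-known textbook solution chosen here as an assumption.

```python
import threading


def dine(num_philosophers=5, meals_each=3):
    """Run the Dining Philosophers with ordered fork acquisition.

    Hypothetical helper for illustration. Each fork is a lock; every
    philosopher picks up the lower-numbered fork first, which rules out
    the circular wait that causes deadlock.
    """
    if num_philosophers < 2:
        raise ValueError("need at least two philosophers")

    forks = [threading.Lock() for _ in range(num_philosophers)]
    meals = [0] * num_philosophers  # slot i is written only by thread i

    def philosopher(i):
        left, right = i, (i + 1) % num_philosophers
        first, second = min(left, right), max(left, right)
        for _ in range(meals_each):
            with forks[first]:
                with forks[second]:
                    meals[i] += 1  # "eat" while holding both forks

    threads = [
        threading.Thread(target=philosopher, args=(i,))
        for i in range(num_philosophers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return meals
```

If every philosopher instead grabbed the left fork first, all of them could hold one fork and wait forever for the other, which is the deadlock scenario named in the abstract.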
ISBN: (print) 9781538668733
We present a novel partial reduction algorithm to aggregate sparsely distributed intermediate results that are generated by data-parallel analysis and visualization algorithms. Applications of partial reduction include flow trajectory analysis, big data online analytical processing, and volume rendering. Unlike traditional full parallel reduction that exchanges dense data across all processes, the purpose of partial reduction is to exchange only intermediate results that correspond to the same query, such as line segments of the same flow trajectory. To this end, we design a three-stage algorithm that minimizes the communication cost: (1) partitioning the result space into groups; (2) constructing and optimizing the reduction partners for each group; and (3) initiating collective reduction operations for all groups concurrently. Both theoretical and empirical analyses show that our algorithm outperforms the traditional methods when the intermediate results are sparsely distributed. We also demonstrate the effectiveness of our algorithm for flow visualization, big log data analysis, and volume rendering.
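The idea of reducing only among the processes that hold results for the same query can be sketched in a single-process simulation. This is not the authors' algorithm: the function `partial_reduce`, its arguments, and the simplification of treating each query key as its own group are all assumptions made for illustration, and no partner optimization or real communication is modeled.

```python
from collections import defaultdict


def partial_reduce(per_process_results, combine):
    """Simulate a partial reduction over sparse per-process results.

    per_process_results: list indexed by rank; each entry maps a query
        key (e.g. a trajectory id) to that rank's intermediate result.
    combine: associative function merging two intermediate results.

    Returns (participants, reduced), where participants maps each key
    to the ranks that hold data for it, and reduced maps each key to
    the merged result.
    """
    # Group step (simplified): one group per query key, listing only
    # the ranks that actually produced a result for that key.
    participants = defaultdict(list)
    for rank, results in enumerate(per_process_results):
        for key in results:
            participants[key].append(rank)

    # Reduce step: merge within each group, skipping ranks without data.
    reduced = {}
    for key, ranks in participants.items():
        acc = per_process_results[ranks[0]][key]
        for rank in ranks[1:]:
            acc = combine(acc, per_process_results[rank][key])
        reduced[key] = acc
    return dict(participants), reduced


# Hypothetical data: four ranks, two trajectories, rank 2 holds nothing.
segments = [{"a": [1]}, {"a": [2], "b": [3]}, {}, {"b": [4]}]
groups, merged = partial_reduce(segments, lambda x, y: x + y)
```

In this toy input, only two ranks take part in each group, whereas a full reduction would involve all four ranks for every key.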
ISBN: (digital) 9781728182865
ISBN: (print) 9781728182872
Real-Time Image Processing and Computer Vision systems are now in the mainstream of technologies enabling applications for Cyber-Physical Systems, Internet of Things, Augmented Reality, and Industry 4.0. These applications create a need for Smart Cameras that process images and videos locally in real time. However, the massive amount of data to be processed within short deadlines cannot be handled by most commercial cameras. In this work, we show the design and implementation of a many-core vision processor architecture to be used in Smart Cameras. Exploiting massive parallelism and application-specific characteristics, our architecture is composed of distributed Processing Elements and Memories connected through a Network-on-Chip. The architecture was implemented as an FPGA overlay, focusing on optimized hardware utilization. The parameterized architecture was characterized by its hardware occupation, maximum operating frequency, and processing frame rate. Different configurations ranging from one to four hundred Processing Elements were implemented and compared to several works from the literature. The results show that the proposed architecture successfully combines programmability and performance, making it a suitable alternative for future Smart Cameras.
ISBN: (print) 9781538655559
This paper makes a case for using low-power embedded GPUs to execute high-performance scientific visualization tasks. We compare the greenness (i.e., power, energy, and energy-delay product, or EDP) of an embedded GPU with a CPU for commonly encountered visualization tasks using two real-world applications: (1) Modeling for Prediction Across Scale Ocean (MPAS-O) and (2) Particular Ensembles (PE). Our preliminary results show that the low-power embedded GPU is capable of handling complex visualization tasks while consuming less than 50% of the energy consumed by a CPU server. In addition, we find that the embedded GPU outperforms the CPU with dynamic voltage-frequency scaling (DVFS) enabled in a majority of the cases.
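For readers unfamiliar with the metrics, energy is average power multiplied by runtime, and the energy-delay product multiplies that energy by the runtime once more, so it penalizes slow runs as well as power-hungry ones. The sketch below uses the standard definitions; the function names and the sample numbers are hypothetical and are not measurements from the paper.

```python
def energy_joules(avg_power_watts, runtime_seconds):
    """Energy consumed: average power (W) times runtime (s)."""
    return avg_power_watts * runtime_seconds


def energy_delay_product(avg_power_watts, runtime_seconds):
    """EDP in joule-seconds: energy times runtime."""
    return energy_joules(avg_power_watts, runtime_seconds) * runtime_seconds


# Hypothetical devices: a low-power part that runs longer versus a
# high-power part that finishes sooner.
low_power = energy_delay_product(avg_power_watts=10.0, runtime_seconds=4.0)
high_power = energy_delay_product(avg_power_watts=100.0, runtime_seconds=2.0)
```

With these made-up inputs the low-power device uses less energy (40 J versus 200 J) and also has the lower EDP, despite taking twice as long.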
ISBN: (print) 9781538668733
A key component of most large-scale rendering systems is a parallel image compositing algorithm, and the most commonly used compositing algorithms are binary swap and its variants. Although shown to be very efficient, one of the classic limitations of binary swap is that it only works on a number of processes that is a perfect power of 2. Multiple variations of binary swap have been independently introduced to overcome this limitation and handle process counts that have factors that are not 2. To date, few of these approaches have been directly compared against each other, making it unclear which approach is best. This paper presents a fresh implementation of each of these methods in a common software framework so that they can be compared directly. The results show that some simple compositing approaches work as well as or better than more complex algorithms that are more difficult to implement.
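The power-of-two restriction follows from how classic binary swap pairs processes. A minimal sketch of the pairing schedule is shown below; it covers only the classic algorithm, not the variants the paper compares, and the function name, the bit-based choice of which half to keep, and the flat pixel indexing are assumptions made for illustration.

```python
def binary_swap_schedule(rank, num_procs, num_pixels):
    """Return the classic binary swap schedule for one rank.

    Each round, the rank pairs with the process whose rank differs in
    one bit, gives away half of its current pixel range, and keeps the
    other half. Returns a list of (partner, (lo, hi)) per round, where
    [lo, hi) is the range the rank still owns after that round.
    """
    if num_procs < 1 or num_procs & (num_procs - 1):
        raise ValueError("classic binary swap needs a power-of-two process count")

    lo, hi = 0, num_pixels
    steps = []
    bit = 1
    while bit < num_procs:
        partner = rank ^ bit       # flip one bit to find this round's partner
        mid = (lo + hi) // 2
        if rank & bit:
            lo = mid               # keep the upper half
        else:
            hi = mid               # keep the lower half
        steps.append((partner, (lo, hi)))
        bit <<= 1
    return steps


# After log2(P) rounds, every rank owns a distinct 1/P slice of the image.
final_slices = sorted(binary_swap_schedule(r, 4, 16)[-1][1] for r in range(4))
```

With a process count such as 6, flipping a bit can name a partner that does not exist, which is why the sketch rejects non-power-of-two counts and why the variants discussed in the abstract are needed.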
As computer simulations progress to increasingly complex, non-linear, and three-dimensional systems and phenomena, intuitive and immediate visualization of their results is becoming crucial. While Virtual Reality (VR) and Natural User Interfaces (NUIs) have been shown to improve understanding of complex 3D data, their application to live in situ visualization and computational steering is hampered by performance requirements. Here, we present the design of a software framework for interactive VR in situ visualization of parallel numerical simulations, as well as a working prototype implementation. Our design is targeted towards meeting the performance requirements for VR, and our work is packaged in a framework that allows for easy instrumentation of simulations. Our preliminary results inform about the technical feasibility of the architecture, as well as the challenges that remain.