Volume rendering methods have been extensively studied in recent years because of their effectiveness and expressiveness in visualizing unstructured grid data. Existing volume rendering methods have achieved great success. However, we observe that they rely on extensive interpolation and integration operations, which can reduce their efficiency and prevent them from being used in interactive visualization. To improve efficiency and achieve interactive frame rates, we propose a novel multi-threaded parallel projected tetrahedra algorithm for multi-core architectures. By analyzing the parallelism of volume rendering methods, we find that visibility sorting and the classification/decomposition of projected polygons are the most time-consuming stages, and we design a parallel method for each of them. As a result, our method dramatically improves efficiency and enables user interaction for progressive analysis of unstructured grids. The visibility sorting stage consists of partial sorting and global sorting. In partial sorting, we partition the unordered tetrahedron depth array into several loosely coupled subarrays. In global sorting, we sort each subarray using multiple threads. In the classification/decomposition stage, we normalize the tetrahedral projection so that every tetrahedron produces the same number of triangles. We then store the resulting vertex data into a vertex array at precomputed offsets, which guarantees the correct order when multiple threads run concurrently. Experimental results show that the proposed multi-threaded projected tetrahedra algorithm achieves a 3.4× speedup on a 20-core CPU. It also outperforms the fastest VTK implementation by a factor of 2.5, which verifies the efficiency of our algorithm.
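The two parallel stages described above can be sketched roughly as follows. This is a minimal Python illustration under several assumptions, not the authors' implementation. "Loosely coupled subarrays" is read here as range partitioning by depth, so each subarray covers a disjoint depth interval and the sorted subarrays can simply be concatenated. Larger depth is assumed to mean farther from the viewer, so the order is back-to-front. `TRIS_PER_TET = 4` is an assumed normalization target. `decompose` is a hypothetical placeholder for the real projection classification. Python threads show the data layout and ordering logic, not the speedup, because of the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

TRIS_PER_TET = 4   # assumption: every projection is normalized to 4 triangles
VERTS_PER_TRI = 3


def parallel_depth_sort(depths, num_parts=4):
    """Return tetrahedron indices in back-to-front order (largest depth first)."""
    if not depths:
        return []
    lo, hi = min(depths), max(depths)
    width = (hi - lo) / num_parts or 1.0
    # Partial sorting: range-partition the depth array so that every bucket
    # covers a disjoint depth interval (the subarrays are loosely coupled).
    buckets = [[] for _ in range(num_parts)]
    for idx, d in enumerate(depths):
        buckets[min(int((d - lo) / width), num_parts - 1)].append(idx)
    # Global sorting: each bucket is sorted independently on its own thread.
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        sorted_buckets = list(pool.map(
            lambda b: sorted(b, key=lambda i: depths[i], reverse=True), buckets))
    # Buckets cover increasing depth ranges, so emit the farthest bucket first.
    return [i for b in reversed(sorted_buckets) for i in b]


def decompose(tet_id):
    """Hypothetical stand-in for classifying/decomposing one projected
    tetrahedron; always yields TRIS_PER_TET * VERTS_PER_TRI vertex records."""
    return [(tet_id, k) for k in range(TRIS_PER_TET * VERTS_PER_TRI)]


def parallel_decompose(order, num_threads=4):
    """Fill one vertex array concurrently while preserving the sorted order."""
    stride = TRIS_PER_TET * VERTS_PER_TRI
    vertex_array = [None] * (len(order) * stride)

    def work(slots):
        for slot in slots:
            start = slot * stride  # offset depends only on the sorted position
            vertex_array[start:start + stride] = decompose(order[slot])

    chunks = [range(t, len(order), num_threads) for t in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(work, chunks))  # list() re-raises any worker exception
    return vertex_array


depths = [0.5, 2.0, 1.2, 3.3, 0.1, 2.7]
order = parallel_depth_sort(depths, num_parts=3)
print(order)  # [3, 5, 1, 2, 0, 4]
print(parallel_decompose(order)[:2])  # [(3, 0), (3, 1)]
```

A fixed triangle count per tetrahedron is what makes the offset `slot * stride` computable without any coordination between threads. Each thread therefore writes to a disjoint slice, and the vertex array comes out in the same order as the visibility sort.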
State-of-the-art chip multiprocessor (CMP) proposals emphasize general optimizations designed to deliver computing power to many types of applications. This approach can miss significant performance improvements that leverage application-specific characteristics such as data access behavior. In this paper, we demonstrate how scalable, high-performance parallel systems can be built by classifying data accesses into different categories and treating each category differently. We develop a novel compiler-based approach that speculatively detects a data classification termed practically private, which we show is ubiquitous across a wide range of parallel applications. Leveraging this classification yields efficient ways to mitigate data access latency and coherence overhead in today's many-core architectures. The proposed classification scheme can be applied to many micro-architectural structures, including the TLB, the coherence directory, and the interconnect. We demonstrate its potential through an efficient cache coherence design. Specifically, we show that the compiler-assisted mechanism reduces coherence traffic by 46% on average. Depending on the scenario, it also achieves performance improvements of up to 12%, 8%, and 5% over shared, private, and state-of-the-art NUCA-based caching, respectively.
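To make the private/shared distinction concrete, here is a much simpler technique than the one described above. The paper's detection is compiler-based and speculative. The sketch below substitutes a trace-based classifier that labels a cache block private when only one thread touches it. It does not model the "practically private" relaxation. The `(thread_id, address)` trace format, the 64-byte block size, and both function names are hypothetical.

```python
from collections import defaultdict

BLOCK_SIZE = 64  # assumed cache-block size in bytes


def classify_blocks(trace, block_size=BLOCK_SIZE):
    """Label every cache block touched in a (thread_id, address) trace as
    'private' (one thread only) or 'shared' (two or more threads)."""
    owners = defaultdict(set)
    for thread_id, addr in trace:
        owners[addr // block_size].add(thread_id)
    return {blk: "private" if len(tids) == 1 else "shared"
            for blk, tids in owners.items()}


def directory_entries_needed(classes):
    """Blocks a coherence directory would still track if private blocks
    were allowed to bypass coherence."""
    return sum(1 for c in classes.values() if c == "shared")


trace = [(0, 0), (0, 8), (1, 64), (1, 72), (0, 128), (1, 130)]
classes = classify_blocks(trace)
print(classes)  # {0: 'private', 1: 'private', 2: 'shared'}
print(directory_entries_needed(classes))  # 1
```

In this toy trace, only one of three blocks needs coherence tracking. This is the kind of saving a classification-aware coherence design aims to exploit.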
At the conceptual design stage, a simplified finite element (FE) model of a body-in-white (BIW) structure is valued for its ability to provide early-stage predictions ahead of the detailed FE model. This paper employs a semi-rigid beam element (SRBE), consisting of a beam element with a semi-rigid connection at each end, to simulate joint flexibility. The Guyan reduction method condenses the SRBE into a super element. A dedicated finite element software package for structural modeling and analysis of BIW (SMAB) is developed on the .NET framework. The Unified Modeling Language (UML) is used to depict the classes and their relationships. Design patterns are identified and applied in the framework design to facilitate communication and system extension. Microsoft DirectX and GDI+ provide graphical display of the spatial BIW frame and the planar thin-walled cross sections, respectively. Using the multi-threading facilities of .NET, the subspace iteration method is parallelized to speed up modal analysis. The efficiency of the SRBE is demonstrated on a benchmark automotive body. Multi-threaded parallelization proves effective and useful, especially for frequency optimization. (C) 2011 Elsevier Ltd. All rights reserved.
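For reference, Guyan (static) reduction in its standard textbook form is shown below, as one way to read how the SRBE is condensed into a super element. The split into retained (master, m) and condensed (slave, s) degrees of freedom is an assumption here, as is the absence of external load on the condensed DOFs. For example, the outer end nodes could be retained and the internal nodes between each connection and the beam condensed.

```latex
% Partition DOFs into retained (m) and condensed (s) sets; no load on s
\begin{bmatrix} K_{mm} & K_{ms} \\ K_{sm} & K_{ss} \end{bmatrix}
\begin{bmatrix} u_m \\ u_s \end{bmatrix}
= \begin{bmatrix} f_m \\ 0 \end{bmatrix}
\quad\Longrightarrow\quad
u_s = -K_{ss}^{-1} K_{sm}\, u_m

% Transformation from retained DOFs to all DOFs
u = T\,u_m, \qquad
T = \begin{bmatrix} I \\ -K_{ss}^{-1} K_{sm} \end{bmatrix}

% Condensed (super-element) stiffness and mass matrices
K_R = T^{\mathsf T} K\, T = K_{mm} - K_{ms} K_{ss}^{-1} K_{sm}, \qquad
M_R = T^{\mathsf T} M\, T
```

$K_R$ is exact for static loads applied at the retained DOFs. $M_R$ is an approximation, because the condensed DOFs are assumed to follow the static deformation shape.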
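The parallel step in subspace iteration can be sketched as follows. In each iteration, the solves $K y_j = M x_j$ for the $p$ basis vectors are independent, so they can run on separate threads. This minimal Python sketch is not the paper's .NET code. To stay short and dependency-free, it replaces the Rayleigh-Ritz projection of the standard method with M-orthonormalization by modified Gram-Schmidt, which amounts to simultaneous inverse iteration. It also uses a small dense Gaussian-elimination solver. All function names are hypothetical. The eigenvalues are $\lambda = \omega^2$, and because of the GIL, Python threads illustrate the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def solve(K, b):
    """Solve K x = b by Gaussian elimination with partial pivoting (small dense K)."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(K)]  # augmented matrix copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def m_dot(A, u, v):
    """Weighted inner product u^T A v."""
    return sum(a * b for a, b in zip(u, matvec(A, v)))


def subspace_iteration(K, M, X, iters=50, threads=4):
    """Approximate the len(X) lowest eigenpairs of K phi = lam M phi.
    Simplified: Gram-Schmidt replaces the standard Rayleigh-Ritz step."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(iters):
            # Independent solves K y = M x, one per basis vector, run on threads
            Y = list(pool.map(lambda x: solve(K, matvec(M, x)), X))
            # M-orthonormalize the new basis (modified Gram-Schmidt)
            X = []
            for y in Y:
                for q in X:
                    c = m_dot(M, q, y)
                    y = [a - c * b for a, b in zip(y, q)]
                norm = m_dot(M, y, y) ** 0.5
                X.append([a / norm for a in y])
    # Rayleigh quotients; each x is M-normalized, so x^T M x = 1
    return [m_dot(K, x, x) for x in X], X


K = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
M = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lams, modes = subspace_iteration(K, M, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print([round(lam, 4) for lam in lams])  # [0.5858, 2.0]
```

The three-DOF chain has eigenvalues $2 - \sqrt{2}$, $2$, and $2 + \sqrt{2}$. The two-vector basis therefore converges to the lowest two modes. In a real BIW model, $K$ would be factorized once and the per-vector back-substitutions distributed across threads.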