The 3D triangle mesh is the dominant representation used in the parallel rendering of 3D geometric models. However, the explosive growth in the complexity of mesh-based 3D models overwhelms the communication bandwidth of existing parallel rendering systems. An effective solution to this problem is to use a compressed mesh representation. In recent years, researchers have shown a great deal of interest in developing highly efficient mesh compression algorithms. However, using a compressed mesh in a parallel rendering architecture to achieve the highest possible end-to-end performance is a largely unexplored area. We have previously (1998, 2000) developed an efficient mesh compression/decompression algorithm, called "breadth-first traversal" (BFT). In this paper, we design and implement a parallel rendering architecture that can use a BFT mesh representation. The enabling technology is a novel algorithm that can perform a compression-domain subdivision of the BFT mesh for bandwidth-efficient distribution of submeshes to parallel processors. Parallel rendering using a BFT mesh reduces the communication requirement to about one third of that of the uncompressed representation.
The convergence of iterative methods used to solve the linear systems arising in incompressible flow problems is sensitive to flow parameters such as the Reynolds number, the time step and the mesh width. This paper presen...
Multicast communication involves transmitting information from a single source to multiple destinations, and is a requirement in high-performance networks. Current trends in networking applications indicate an increas...
Scheduling computational tasks on processors is a key issue for high-performance computing. Although a large number of scheduling heuristics have been presented in the literature, most of them target only homogeneous resources. Moreover, these heuristics often rely on a model where the number of processors is bounded but where the communication capabilities of the target architecture are not restricted. In this paper, we deal with a more realistic model for heterogeneous networks of workstations, where each processor can send and/or receive at most one message at any given time-step. First, we state a complexity result that shows that the model is at least as difficult as the standard one. Then, we show how to modify classical list scheduling techniques to cope with the new model. Next, we introduce a new scheduling heuristic which incorporates load-balancing criteria into the decision process of scheduling and mapping ready tasks. Experiments conducted using six classical testbeds (LAPLACE, LU, STENCIL, FORK-JOIN, DOOLITTLE, and LDMt) show very promising results.
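As a rough illustration of the communication constraint described in this abstract, the following sketch applies a plain greedy list scheduler to a small fork-join graph while serializing messages on each processor's send and receive ports. It is not the heuristic proposed in the paper: the function name, the data layout, the rule of choosing the processor with the earliest finish time, and the rule of appending messages to the end of each port's timeline (no gap insertion) are all assumptions made for illustration.

```python
def one_port_list_schedule(tasks, deps, cost, comm, n_procs):
    """Greedy list scheduling under a one-port communication model (sketch).

    tasks: task ids in a topological (priority) order
    deps:  dict task -> list of predecessor tasks
    cost:  dict (task, proc) -> compute time (may differ per processor)
    comm:  dict (pred, task) -> transfer time when placed on different processors
    Returns dict task -> (proc, start, finish).
    """
    proc_free = [0.0] * n_procs  # when each processor's CPU becomes idle
    send_free = [0.0] * n_procs  # when each send port becomes idle
    recv_free = [0.0] * n_procs  # when each receive port becomes idle
    placement = {}

    for t in tasks:
        best = None
        for p in range(n_procs):
            # Tentatively route every incoming message for a placement on p.
            send = list(send_free)
            recv = recv_free[p]
            ready = 0.0
            for d in sorted(deps.get(t, []), key=lambda d: placement[d][2]):
                q, _, d_finish = placement[d]
                if q == p:
                    arrival = d_finish  # same processor: no message needed
                else:
                    # One-port rule: the sender's send port and the receiver's
                    # receive port must both be free for the whole transfer.
                    msg_start = max(d_finish, send[q], recv)
                    arrival = msg_start + comm[(d, t)]
                    send[q] = arrival
                    recv = arrival
                ready = max(ready, arrival)
            start = max(ready, proc_free[p])
            finish = start + cost[(t, p)]
            if best is None or finish < best[0]:
                best = (finish, start, p, send, recv)
        # Commit the placement with the earliest finish time.
        finish, start, p, send, recv = best
        send_free = send
        recv_free[p] = recv
        proc_free[p] = finish
        placement[t] = (p, start, finish)
    return placement


# Hypothetical fork-join example: A forks to B and C, which join in D.
tasks = ["A", "B", "C", "D"]
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
cost = {(t, p): 2 for t in tasks for p in range(2)}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}

schedule = one_port_list_schedule(tasks, deps, cost, comm, n_procs=2)
makespan = max(finish for _, _, finish in schedule.values())
```

The three per-processor timelines are what separate this from a scheduler with unrestricted communication: a message has to wait for both the sender's and the receiver's port, so two transfers sharing an endpoint cannot overlap.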
In this paper, an improved version of the BiCGStab method for the solution of large, sparse linear systems of equations with unsymmetric coefficient matrices is proposed. The method combines elements of numerical stability and parallel algorithm design without increasing the computational costs. The algorithm is derived such that all inner products of a single iteration step are independent and the communication time required for the inner products can be overlapped efficiently with the computation time of the vector updates. Therefore, the cost of global communication can be significantly reduced. The bulk synchronous parallel (BSP) model is used to design the proposed parallel algorithm to be efficient, scalable and portable, and to provide accurate performance prediction of the algorithm for a wide range of architectures including the Cray T3D, the Parsytec, and a cluster of workstations connected by an Ethernet. This performance model provides useful insight into the time complexity of the method using only a few system-dependent parameters based on a simple and accurate cost model. The theoretical performance predictions are compared with some preliminary measured timing results of a numerical application from ocean flow simulation.
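To give a sense of what a BSP cost prediction with "a few system-dependent parameters" can look like, the sketch below uses the standard BSP superstep cost w + h·g + l, where w is the maximum local work, h the maximum number of words sent or received by any processor, g the per-word communication cost and l the synchronization cost. The decomposition of an inner product into two supersteps and the parameter values are assumptions for illustration. They are not taken from the paper and do not model its overlap of communication with vector updates.

```python
def bsp_superstep_cost(w, h, g, l):
    """Standard BSP cost of one superstep, in flop-time units: w + h*g + l."""
    return w + h * g + l


def inner_product_cost(n, p, g, l):
    """Predicted BSP cost of a distributed inner product of two length-n vectors.

    Assumed decomposition (illustrative only):
      superstep 1: each processor forms a partial sum over ceil(n/p) elements
                   (2 flops per element) and sends it to the other p-1 processors;
      superstep 2: each processor adds the p partial sums (p-1 additions),
                   with no further communication.
    """
    local_n = -(-n // p)  # ceil(n / p) using integer arithmetic
    step1 = bsp_superstep_cost(w=2 * local_n, h=p - 1, g=g, l=l)
    step2 = bsp_superstep_cost(w=p - 1, h=0, g=g, l=l)
    return step1 + step2


# Hypothetical machine parameters (g and l in flop-time units), not measured values.
predicted = inner_product_cost(n=1_000_000, p=8, g=10, l=1000)
```

In this model the synchronization term l is paid once per superstep regardless of n, which is why the global reductions behind inner products become the scalability bottleneck as p grows.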
In modern resource management systems for supercomputers and HPC clusters, the job scheduler plays a major role in improving the performance and usability of the system. The performance of the used scheduling policies ...
The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we use a set of memory-intensive benchmarks to evaluate a mixed logic and DRAM processor called VIRAM as a building block for scientific computing. For each benchmark, we explore the fundamental hardware requirements of the problem as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. Results indicate that VIRAM is significantly faster than conventional cache-based machines for problems that are truly limited by the memory system and that it has a significant power advantage across all the benchmarks.
This paper addresses the parallelization of loops with irregular assignment computations on cc-NUMA multiprocessors. This loop pattern is distinguished by the existence of loop-carried output data dependences that can...
Our goal is to apply mobile agent technology to provide better scheduling for MPI applications executing in a cluster configuration. In a distributed cluster environment, this approach could improve the load balancing of the parallel processes. Running MPI on a cluster of heterogeneous machines can lead parallel programmers to frustrating results, mainly because the workload is not evenly distributed across the cluster. Therefore, before submitting an MPI application to a cluster, we use our JOTA mobile agent approach to acquire more precise information about each machine's workload. With more precise knowledge of the load and characteristics of each machine, we are ready to gather lightweight workstations to form a cluster. Our empirical results indicate that a parallel application can run in less elapsed time with the agent approach than in an ordinary MPI environment.