ISBN (print): 9781467345651; 9780769549033
In linear algebra, Cholesky factorization is useful for solving a system of equations with a symmetric positive definite coefficient matrix. Cholesky factorization is roughly twice as fast as LU factorization, which applies to general matrices. In recent years, with advances in technology, a Fermi GPU card can accommodate hundreds of cores, compared with the 8 or 16 cores on a CPU. There is therefore a trend to use the graphics card as a general-purpose graphics processing unit (GPGPU) for parallel computation. In this work, Volkov's hybrid implementation of Cholesky factorization is evaluated, along with others, on the new Fermi GPU, and some improvement strategies are proposed. In experiments, our proposed GPU improvement strategy achieves a speedup of 3.85x over the CPU version using the Intel Math Kernel Library (MKL) on Cholesky factorization of a square matrix of dimension 10,000.
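As a rough illustration of the technique this abstract builds on, the following pure-Python sketch factors a symmetric positive definite matrix as A = L·Lᵀ and then solves A·x = b by forward and back substitution. It is a plain sequential reference version, not the hybrid GPU implementation evaluated in the paper; the function names `cholesky` and `solve_spd` are hypothetical and chosen only for this example.

```python
import math


def cholesky(a):
    """Return the lower-triangular L with a = L * L^T.

    `a` is assumed to be a symmetric positive definite matrix given as a
    list of rows; only its lower triangle is read.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove the contribution of columns already computed.
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_spd(a, b):
    """Solve a x = b using the Cholesky factor of a."""
    L = cholesky(a)
    n = len(b)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

Because the factorization exploits symmetry, it touches only one triangle of the matrix, which is where the roughly twofold saving over LU factorization comes from.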
ISBN (print): 9780769533599
Visualization is one of the most important applications of computer graphics. Building a parallel infrastructure for visualization requires a number of enabling technologies. We identify the state-of-the-art technologies that have prepared the ground for building such an infrastructure and examine a collection of applications that would benefit from it. We consider a broad range of scientific and technological advances in visualization that are relevant to visual supercomputing. Mainly, we present the original abstracts from the cited papers.
Many parallel application areas that exploit massive parallelism, such as climate modeling, require massive storage systems for the archival and retrieval of data sets. As such, advances in massively parallel computation must be coupled with advances in mass storage technology in order to satisfy the I/O constraints of these applications. We demonstrate the effects of such I/O-computation disparity for a representative distributed information system, NASA's Earth Observing System Distributed Information System (EOSDIS). We use performance modeling to identify bottlenecks in EOSDIS for two representative user scenarios from climate change research.
There has recently been interest in introducing reconfigurable buses to existing parallel architectures. Among them, the Reconfigurable Mesh (RM) draws much attention because of its simplicity. This paper presents two O(1) time algorithms to compute the contour of the maximal elements of N planar points on the RM. The first algorithm employs an RM of size N×N, while the second uses a 3-D RM of size √N×√N×√N.
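To make the problem statement concrete, here is a minimal sequential sketch of computing the maximal elements of a planar point set. A point is taken to be maximal when no other point has both an x-coordinate and a y-coordinate at least as large. This is an ordinary sort-and-sweep reference, not either of the O(1) RM algorithms the paper presents; the function name `maximal_elements` and the assumption that the input points are distinct are introduced only for this example.

```python
def maximal_elements(points):
    """Return the maximal points of `points`, ordered by increasing x.

    `points` is an iterable of distinct (x, y) pairs. A point is kept when
    no other point dominates it, i.e. has x and y both at least as large.
    """
    best_y = float("-inf")
    maxima = []
    # Sweep from the largest x downward; among equal x, the largest y first.
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        # Every point seen so far has x >= this x, so this point is
        # undominated exactly when its y exceeds all y values seen so far.
        if y > best_y:
            maxima.append((x, y))
            best_y = y
    maxima.reverse()
    return maxima
```

Read left to right, the returned points have strictly decreasing y, so joining them with horizontal and vertical segments traces the staircase contour.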
ISBN (print): 1892512416
The unparalleled growth in the areas of microprocessor development, transmission media and computing models has given rise to a totally new environment in computing technology. The evolution of computing models, combined with the advances of the enabling technologies, gave way to the client/server perspective of computing. This paper describes the origin and economics of the client/server model. A general overview of client/server architecture is also included to give readers basic familiarity with the concept itself.
ISBN (print): 1892512459
Wireless technology provides the service of staying connected while on the move. As computer and wireless communication technology advances, handheld devices are entering the arena of mobile computing. Small wireless devices such as PDAs (Personal Digital Assistants) are expected to dominate the mobile computing environment in the future. However, because of their limited resources and high mobility, PDAs need integrated fault-tolerance mechanisms to ensure their computational progress. In this paper, we present our classification of the faults experienced by mobile devices, and then our architectural framework for providing availability to applications running on mobile devices. Our approach takes into account the nature of the application and provides a suitable fault-tolerance mechanism according to the requirements of each application.
Recent advances in parallel and distributed processing, and their application to database operations such as join, have prompted the investigation of parallel algorithms. Hash-based join algorithms involve a costly data partitioning phase prior to the join operation. This paper presents new parallel join algorithms for relations based on grid files, where no costly partitioning phase is involved and performance can therefore improve.
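For context, the sketch below shows the shape of a conventional partitioned hash join, so that the partitioning phase the abstract calls costly is visible as a separate pass over both relations. It is an illustrative single-process version under assumed conventions (relations as lists of dictionaries, equality join on one key), not the grid-file algorithms proposed in the paper; `partition` and `hash_join` are hypothetical names.

```python
from collections import defaultdict


def partition(relation, key, num_parts):
    """Partitioning phase: scatter every row into a bucket by hashed key."""
    parts = [[] for _ in range(num_parts)]
    for row in relation:
        parts[hash(row[key]) % num_parts].append(row)
    return parts


def hash_join(r, s, key, num_parts=4):
    """Equality join of r and s on `key`, one bucket pair at a time."""
    # Both relations are read and redistributed in full before any joining.
    r_parts = partition(r, key, num_parts)
    s_parts = partition(s, key, num_parts)
    out = []
    # Each bucket pair is independent, so a parallel system could hand
    # every pair to a different processor.
    for r_part, s_part in zip(r_parts, s_parts):
        table = defaultdict(list)
        for row in r_part:  # build
            table[row[key]].append(row)
        for row in s_part:  # probe
            for match in table.get(row[key], []):
                out.append({**match, **row})
    return out
```

When the relations are already stored in a structure that groups rows by key range, as the abstract says of grid files, the two `partition` calls are the work that can be avoided.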
ISBN (print): 9781424457700
We present FastTrans, a parallel, distributed-memory simulator for transportation networks that uses a queue-based event-driven approach to traffic microsimulation. Queue-based simulation models have been shown to be significantly faster than cellular-automata type approaches, sacrificing spatial granularity for speed, while preserving link and intersection dynamics with high fidelity. Significant advances over previous work include the size of the simulated network, support for dynamic responses to congestion and the absence of precomputed routes: all routing calculations are executed online. We present initial results from a scalability study using a real-world network from the North-East region of the United States comprising over 1.5 million network elements and over 25 million vehicular trips. Simulation of an entire day's worth of realistic vehicular itineraries involving approximately five billion simulated events executes in less than an hour of wall-clock time on a distributed computing cluster. Initial results suggest almost linear speed-ups with cluster size.
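The general idea of a queue-based, event-driven traffic model can be sketched in a few lines. In this toy version each road link is a FIFO queue with a free-flow travel time and a minimum headway between successive departures, and a priority queue of events advances the simulation. Everything here is an assumption made for illustration: the link parameters, the `simulate` function, and in particular the fixed per-trip routes, which the simulator described above replaces with online routing. It is a single-process sketch, not the paper's distributed implementation.

```python
import heapq


def simulate(links, trips):
    """Return the arrival time of every trip.

    links: {name: (travel_time, headway)}, where headway is the minimum
           time between two vehicles leaving the same link.
    trips: list of (departure_time, [link names in travel order]).
    """
    last_exit = {name: float("-inf") for name in links}
    arrivals = [None] * len(trips)
    # Event = (time, trip index, position in that trip's route).
    events = [(depart, i, 0) for i, (depart, _) in enumerate(trips)]
    heapq.heapify(events)
    while events:
        now, trip, pos = heapq.heappop(events)
        route = trips[trip][1]
        if pos == len(route):
            arrivals[trip] = now  # route finished
            continue
        link = route[pos]
        travel_time, headway = links[link]
        # The vehicle leaves after its free-flow time, but never sooner
        # than one headway behind the vehicle queued ahead of it.
        exit_time = max(now + travel_time, last_exit[link] + headway)
        last_exit[link] = exit_time
        heapq.heappush(events, (exit_time, trip, pos + 1))
    return arrivals
```

Only link entries and exits generate events, so positions inside a link are never tracked; that is the loss of spatial granularity traded for speed.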
With advances in processor and networking technologies, current distributed-memory machines can achieve hundreds of Giga Floating-Point Operations Per Second (GFLOPS) of performance. By using such machines, many application problems having regularly structured computations have been successfully parallelized using the explicit message passing paradigm. However, it is difficult to parallelize vision problems having irregularly structured computations. Parallel solutions to these problems are characterized by uneven distribution of symbolic features among the processors, unbalanced workload, and irregular interprocessor data dependency caused by the input image. It is therefore necessary to develop efficient algorithmic techniques to achieve large speed-ups. In this paper, we propose an algorithmic framework to design efficient and portable parallel algorithms for irregular vision problems on distributed-memory machines. Based on this algorithmic framework, we develop techniques for task scheduling, load balancing, and overlapping communication with computation.
ISBN (print): 0818680679
Estimating the communication cost involved in executing a program on distributed-memory machines is important for evaluating the overheads due to repartitioning. We present a scheme that works with reasonable efficiency for arrays with at most 3 dimensions. The hyperplane partitioning technique given by [10] is extended to complete programs, with the communication cost estimated by the scheme presented in this work.