Various researchers have realized the value of implementing loop fusion to evaluate dense (pointwise) array expressions. Recently, the method of template metaprogramming in C++ has been used to significantly speed up the evaluation of array expressions, allowing C++ programs to achieve performance comparable to or better than FORTRAN for numerical analysis applications. Unfortunately, the template metaprogramming technique suffers from several limitations in applicability, portability, and potential performance. We present a framework for evaluating dense array expressions in object-oriented programming languages. We demonstrate how this technique supports both common subexpression elimination and threaded implementation, and we compare its performance to object-library and hand-generated code.
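As a rough illustration of what loop fusion means for a pointwise array expression, the sketch below builds the expression lazily and evaluates it in a single pass, without allocating a temporary array per operator. It is not the framework described in the abstract and it does not use template metaprogramming; the class names `Expr`, `Array`, and `BinOp` and the methods `size`, `at`, and `evaluate` are hypothetical names chosen for this example.

```python
import operator


class Expr:
    """Node of a lazily evaluated pointwise array expression (hypothetical API)."""

    def __add__(self, other):
        return BinOp(operator.add, self, other)

    def __mul__(self, other):
        return BinOp(operator.mul, self, other)

    def evaluate(self):
        # Single fused loop: every output element is computed in one pass,
        # so (a + b) * c never materializes the intermediate array a + b.
        return [self.at(i) for i in range(self.size())]


class Array(Expr):
    """Leaf node wrapping concrete data."""

    def __init__(self, data):
        self.data = list(data)

    def size(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class BinOp(Expr):
    """Interior node: applies a binary operator element by element."""

    def __init__(self, op, left, right):
        if left.size() != right.size():
            raise ValueError("operands must have the same length")
        self.op, self.left, self.right = op, left, right

    def size(self):
        return self.left.size()

    def at(self, i):
        return self.op(self.left.at(i), self.right.at(i))


a = Array([1.0, 2.0, 3.0])
b = Array([4.0, 5.0, 6.0])
c = Array([2.0, 2.0, 2.0])
result = ((a + b) * c).evaluate()
```

An eager object library would instead compute `a + b` into a full temporary array before multiplying by `c`; the fused form trades that temporary for a tree walk per element.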
Computing the distribution of light in a given environment is an important problem in computer-aided photo-realistic image generation. The radiosity method has been proposed to address this problem, but it requires an enormous amount of calculation and memory. The hierarchical radiosity method is a recent approach that reduces these computational requirements through careful error analysis. It takes its idea from the solution methods of N-body problems. Although the hierarchical approach has greatly reduced the amount of calculation, satisfactory results still cannot be obtained in terms of processing time. Exploiting parallelism is a practical way to reduce the computation time further. In this thesis, we have designed and implemented a parallel hierarchical radiosity algorithm for distributed memory computers. Due to their highly irregular computational structure, hierarchical radiosity algorithms do not yield easily to parallelization on distributed memory machines. Dynamically changing computational patterns of the algorithm cause severe load imbalances. Therefore, we have developed a dynamic load balancing technique for the parallel hierarchical radiosity calculation.
Positron Emission Tomography (PET) images can be reconstructed using Fourier transform methods. This paper describes the performance of a fully 3-D Backprojection-Then-Filter (BPF) algorithm on the Cray T3E machine and on a cluster of workstations. PET reconstruction of small animals is a class of problems characterized by poor counting statistics. The low-count nature of these studies necessitates 3-D acquisition and reconstruction: by including axially oblique Lines Of Response (LORs), the sensitivity of the PET system can be significantly improved. The BPF method is widely used in clinical studies because of its speed and easy implementation. Moreover, the BPF method is suitable for on-line 3-D reconstruction, as it does not need any sinogram or rearranged data. In order to investigate the possibility of on-line processing, we reconstruct a phantom using the data stored in list-mode format by the data acquisition system. We show how the intrinsically parallel nature of the BPF method makes it suitable for on-line reconstruction on a MIMD system such as the Cray T3E. Lastly, we analyze the performance of this algorithm on a cluster of workstations.
The LHCb Level-0 trigger implementation with the 3D-Flow system offers full programmability, allowing it to adapt to unexpected operating conditions and enabling new, unpredicted physics. The implementation is described in detail and refers to components and technology available today. The 3D-Flow Processor system is a new, technology-independent concept in very fast, real-time system architectures. It is based on the replication of a single type of circuit of 100k gates, which communicates in six directions: bi-directionally with its North, East, West, and South neighbors, and unidirectionally from Top to Bottom. The system offers full programmability, modularity, ease of expansion and adaptation to the latest technology. A complete study of its applicability to the LHCb calorimeter triggers is presented. A full description is provided of the input data handling, either in digital or mixed digital-analog form, of the data processing, and of the transmission of results to the global level-0 trigger decision unit. Any level-0 trigger algorithm (2 x 2, 3 x 3, 4 x 4, etc.) with up to 20 steps can be implemented with zero dead-time, while sustaining the input data rate (up to 32 bits per input channel, per bunch crossing) at 40 MHz. For each step, each 3D-Flow processor can execute up to 26 operations, inclusive of compare, ranging, finding local maxima, and efficient data exchange with neighboring channels. (There is a one-to-one correspondence between input channel and trigger tower.) It is shown how the whole Level-0 calorimeter trigger, built from a single type of board populated with only two main types of components (front-end FPGAs and 3D-Flow processors), can be accommodated in six crates (9U), each containing 16 identical boards. All 3D-Flow inter-chip Bottom-to-Top port connections are contained on the board (data are multiplexed 2:1; PCB traces are shorter than 6 cm); all 3D-Flow inter-chip North, East, West, and South port connections, between boards and crates, are multiplexed (8 + 2):1 and are s
We study the problem of exploiting parallelism from search-based AI systems on distributed machines. We propose stack-splitting, a technique for implementing or-parallelism, which when coupled with appropriate scheduling strategies leads to (i) reduced communication during distributed execution and (ii) distribution of larger grain-sized work to processors. The modified technique can also be implemented on shared memory machines and should be quite competitive with existing methods. Indeed, an implementation has been carried out on shared memory machines, and the results are reported here.
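The general idea of splitting a stack of choice points between a busy worker and an idle one can be sketched as follows. The abstract does not say how the stack is divided, so the policy used here (alternating whole choice points between the two workers) is only an assumption made for illustration, and the function name `split_stack` is hypothetical.

```python
def split_stack(choice_points):
    """Divide untried alternatives between a busy worker and an idle worker.

    choice_points: list of lists, oldest choice point first; each inner list
    holds the untried alternatives of one choice point.
    Returns (kept, given): two stacks of the same depth as the input.
    """
    kept, given = [], []
    for depth, alternatives in enumerate(choice_points):
        # Assumed policy: alternate whole choice points, so each worker owns
        # every alternative of the choice points assigned to it.
        if depth % 2 == 0:
            kept.append(list(alternatives))
            given.append([])
        else:
            kept.append([])
            given.append(list(alternatives))
    return kept, given


stack = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]
kept, given = split_stack(stack)
```

Because every untried alternative ends up owned by exactly one worker, neither worker needs to consult the other when it backtracks, which is the kind of communication saving the abstract refers to.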
ISBN: 3540660682 (print)
If we consider the dimensions of natural neural computation as they are known from scientific research, we realize that those of us interested in neural computation have a long road ahead, for the simple reason that we can handle only a relatively low number of units and connections today. Throughout this century we have significantly improved our knowledge of natural neural nets, coming to appreciate the huge number of cells and connections involved and beginning to understand some of the brain's signal processing and the repetitive structures that support it. However, even in the most developed cases, such as auditory pathway modelling, there is no neural computational device that can deliver a real-time response and follow the facts already known or plausibly postulated about some brain processes (e.g. by McCulloch and Pitts) with the unavoidably large number of processing elements involved. Moreover, suitable models of such realistic nets have not been designed, nor have the corresponding simulations under real conditions been carried out. This means there is a lack of connectionistically computable models, and of reduction methods by which we can obtain a connectionistic implementation design from a knowledge-level model. Therefore, we would like to ask: what is within reach? To answer this question we present a restricted auditory pathway modelling case, in which we can see the realistic challenges we are facing. By trying to propose a consistent implementation for it, based on parallel, modular, distributed and self-programming computation, we shall see the kind of methods, equipment, software and simulations that are required and desirable.
ISBN: 0769503500 (print)
In the sort-last-sparse parallel volume rendering system on distributed memory multicomputers, the rendering phase achieves a good speedup as the number of processors increases, because each processor renders images locally without communicating with other processors. However, in the compositing phase, a processor has to exchange local images with other processors. When the number of processors exceeds a threshold, the image compositing time becomes a bottleneck. In this paper, we propose three compositing methods to reduce the compositing time in the sort-last-sparse parallel volume rendering system on distributed memory multicomputers: the binary-swap with bounding rectangle method, the binary-swap with run-length encoding and static load-balancing method, and the binary-swap with bounding rectangle and run-length encoding method. The proposed methods were implemented on an SP2 parallel machine along with the binary-swap compositing method. The experimental results show that the binary-swap with bounding rectangle and run-length encoding method has the best performance among the four methods.
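To give a feel for why a bounding rectangle and run-length encoding shrink the data exchanged during compositing, the sketch below packs a sparse local image before it would be sent. It covers only the packing step, not the binary-swap exchange itself, and the details are assumptions: a pixel value of 0 stands for a blank pixel, runs are stored as `[value, count]` pairs, and the function names are hypothetical rather than taken from the paper.

```python
def bounding_rectangle(image):
    """Return (top, left, bottom, right), inclusive, of the non-blank pixels.

    Returns None when the image is entirely blank. Blank is assumed to be 0.
    """
    rows = [r for r, row in enumerate(image) if any(p != 0 for p in row)]
    if not rows:
        return None
    cols = [c for row in image for c, p in enumerate(row) if p != 0]
    return rows[0], min(cols), rows[-1], max(cols)


def run_length_encode(pixels):
    """Encode a flat pixel sequence as a list of [value, count] runs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([p, 1])  # start a new run
    return runs


def pack(image):
    """Crop to the bounding rectangle, then run-length encode the crop."""
    rect = bounding_rectangle(image)
    if rect is None:
        return None, []
    top, left, bottom, right = rect
    flat = [p for row in image[top:bottom + 1] for p in row[left:right + 1]]
    return rect, run_length_encode(flat)


image = [
    [0, 0, 0, 0, 0],
    [0, 7, 7, 0, 0],
    [0, 0, 7, 7, 0],
    [0, 0, 0, 0, 0],
]
rect, runs = pack(image)
```

Here the 20-pixel image is reduced to a 2 x 3 rectangle and three runs; on the sparse images produced by local rendering, the saving in message size would be correspondingly larger.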
Discretization of image restoration problems often leads to a discrete ill-posed inverse problem: the discretized operator is so badly conditioned that it can effectively be considered underdetermined. In this case one should single out the solution that is nearest to the desired solution. The usual way to do this is to regularize the problem. In this paper we focus on the computational aspects of the Wiener filter within the framework of regularization methods. The emphasis is on its reliability and its efficiency, both of which become more and more important as the size and the complexity of the real problem grow and the demand for advanced real-time processing increases.
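For orientation, the textbook form of the Wiener filter for a shift-invariant blur is given below; the notation is assumed here and is not taken from the paper. With observed image $g = h * f + n$, Fourier transforms $G$, $H$, $F$, and power spectra $S_n$ and $S_f$ of the noise and of the true image, the restored image $\hat{F}$ is:

```latex
\hat{F}(u,v) = \frac{\overline{H(u,v)}}{\lvert H(u,v)\rvert^{2} + \dfrac{S_n(u,v)}{S_f(u,v)}}\, G(u,v)
```

When the ratio $S_n/S_f$ is replaced by a constant $\lambda > 0$, the expression takes the same form as Tikhonov regularization in the Fourier domain, which is one way to see the Wiener filter as a regularization method: the added term keeps the denominator away from zero at frequencies where $\lvert H \rvert$ is small.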
This paper investigates the use of object-oriented techniques for the specification and design of distributed multimedia applications (DMAs). DMAs are a class of software applications with a range of strong, often conflicting, requirements: dynamicity, interactivity, real-time synchronized processing of several media types, network distribution, high performance, fault tolerance, load balancing and security. The development of complex DMAs can benefit from the adoption of object design methods and distributed object implementation technologies. The paper describes the use of two modeling approaches, based on the standard UML modeling language and on the TRIO formal specification language, respectively. The problem of defining steps to move from the UML or TRIO specification to a CORBA IDL implementation is addressed. An experimental distributed video-on-demand system is used throughout the paper as a case study.