This paper describes practical implementation details for a second-order approximation to the parallel model combination (PMC) algorithm with application to large vocabulary distributed speech recognition. The proposed method is capable of simultaneously adapting to noise and channel changes. A more accurate method for computing the derivatives based on numeric integration PMC is introduced. The proposed second-order adaptation algorithm requires only twice the memory and computation of standard Jacobian Adaptation (JA). This represents a 382-fold reduction in memory and a 29-fold reduction in computation. Moreover, the proposed algorithm produces models that are much closer to the PMC-derived models than standard JA.
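The gap between first-order JA and a second-order update can be illustrated with a toy scalar example. This is a minimal sketch, not the paper's implementation. It assumes the common log-spectral "log-add" mismatch function y = log(exp(s) + exp(n)) applied to a single mean value, and it takes derivatives analytically rather than through numeric-integration PMC. The names `log_add` and `adapt` are hypothetical.

```python
import math

def log_add(s, n):
    """Assumed mismatch function: noisy log-energy from speech s and noise n."""
    return math.log(math.exp(s) + math.exp(n))

def adapt(s, n0, delta, order=2):
    """Taylor-expand log_add around the reference noise level n0.

    order=1 mimics Jacobian Adaptation (linear term only);
    order=2 adds the curvature term.
    """
    y0 = log_add(s, n0)
    w = math.exp(n0 - y0)            # dy/dn at n0: noise share of total energy
    y = y0 + w * delta               # first-order (Jacobian) update
    if order == 2:
        y += 0.5 * w * (1.0 - w) * delta ** 2   # d2y/dn2 = w * (1 - w)
    return y

# Compare both approximations with the exact value for a noise shift of 1.0.
exact = log_add(0.0, 1.0)
err_first = abs(adapt(0.0, 0.0, 1.0, order=1) - exact)
err_second = abs(adapt(0.0, 0.0, 1.0, order=2) - exact)
```

In this toy setting the second-order update stores one extra coefficient per parameter (the curvature term) alongside the Jacobian term.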
ISBN: 0769512305 (print)
The ongoing production of petabytes of multimedia data per year creates an urgent need for the organisation, management, and retrieval of multimedia information. The related memory, bandwidth, and computational requirements often surpass the capabilities of traditional database systems and computer architectures. Moreover, improved retrieval techniques allow a manual selection of regions of interest, which are subsequently searched in all media in the database by using dynamically extracted features. This paper presents techniques for parallel multimedia retrieval by considering an image database as an example. The discussed cluster architecture depicts one possible solution for the performance problem. The distribution of the image data over a large number of nodes enables parallel processing of the compute-intensive operations for dynamic image retrieval. Thus, the partitioning of the data and the applied strategies for workload balancing have a decisive impact on the performance, efficiency, and usability of such image databases.
This paper presents techniques for parallel multimedia retrieval by considering an image database as an example. The discussed cluster architecture depicts one possible solution for the performance problem. The distribution of the image data over a large number of nodes enables parallel processing of the compute-intensive operations for dynamic image retrieval. Thus, the partitioning of the data and the applied strategies for workload balancing have a decisive impact on the performance, efficiency, and usability of such image databases.
ISBN: 0769512607 (print)
The proceedings contain 67 papers. The topics discussed include: designing parallel sparse matrix algorithms beyond data dependence analysis; run-time characterization of irregular accesses applied to parallelization of irregular reductions; solution of computational fluid dynamics problems on parallel computers with distributed memory; a data and task parallel image processing environment for distributed memory systems; parallel implementation of wavelet transforms on distributed-memory multicomputers; performance comparison of parallel finite element and Monte Carlo methods in optical tomography; parallel ray tracing using processor farming model; parallel domain decomposition methods for dam problem; an efficient parallel algorithm for solving unsteady nonlinear equations; partial stabilization of large-scale discrete-time linear control systems; and modular construction of model partitioning processes for parallel logic simulation.
ISBN: 0769509908 (print)
The proceedings contain 274 papers. The topics discussed include: influence of array allocation mechanisms on memory system energy; high performance computing in coastal and hydraulic applications; large scale parallel and distributed simulations and visualizations of the Olami-Feder-Christensen earthquake model; benchmark of parallelization methods for unstructured shock capturing code; parallel simulation of radio-base antennas on massively parallel systems; fast and scalable parallel algorithms for matrix chain product and matrix powers on distributed memory systems; mixed parallel implementations of the top level step of Strassen and Winograd matrix multiplication algorithms; a rotate-tiling image composition method for parallel volume rendering on distributed memory multicomputers; and directory based composite routing and scheduling policies for dynamic multimedia environments.
A new technique for transmitting information through multimode fiber-optic cables is presented. This technique sends parallel channels through the fiber-optic cable, thereby greatly improving the data transmission rate compared with that of the current technology, which uses serial data transmission through single-mode fiber. An artificial neural network is employed to decipher the transmitted information from the received speckle pattern. Several different preprocessing algorithms are developed, tested, and evaluated. These algorithms employ average region intensity, distributed individual pixel intensity, and maximum mean-square-difference optimal group selection methods. The effect of modal dispersion on the data rate is analyzed. An increased data transmission rate by a factor of 37 over that of single-mode fibers is realized. When implementing our technique, we can increase the channel capacity of a typical multimode fiber by a factor of 6. © 2001 Optical Society of America. OCIS codes: 060.0060, 060.2330, 060.2350, 060.4230, 200.4260.
ISBN: 0769509851 (print)
This paper describes a parallel implementation developed to improve the time performance of the Iterative Closest Point (ICP) algorithm. Within each iteration, the correspondence calculations are distributed among the processor resources. At the end of each iteration, the results of the correspondence determination are communicated back to a central processor and the current transformation is calculated. A number of additional techniques were developed that served to improve upon this basic scheme. Calculating the partial sums within each distributed resource made it unnecessary to transmit the correspondence values back to the central processor, which reduced the communication overhead and improved time performance. Randomly distributing the points among the processor resources resulted in better load balancing, which further improved time performance. We also found that thinning the image by randomly removing a certain percentage of the points did not improve the performance, when viewed as the progression of MSE with time. The method was implemented and tested on a 22-node Beowulf-class cluster. For a large image, linear performance improvements were obtained for up to 16 processors, while they held for up to 8 processors with a smaller image.
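The partial-sum idea can be sketched as follows. This is an illustrative sketch only: it assumes 2D points and a closed-form rigid fit for brevity (the paper's data and solver are not specified in the abstract), and the function names `partial_sums`, `combine`, and `rigid_transform_2d` are hypothetical. Each worker reduces its matched pairs to seven numbers, so only those numbers need to reach the central processor.

```python
import math

def partial_sums(pairs):
    """Reduce one worker's (source, target) point pairs to seven running sums."""
    n = spx = spy = sqx = sqy = dot = cross = 0.0
    for (px, py), (qx, qy) in pairs:
        n += 1.0
        spx += px
        spy += py
        sqx += qx
        sqy += qy
        dot += px * qx + py * qy      # sum of p . q
        cross += px * qy - py * qx    # sum of p x q (2D scalar cross product)
    return (n, spx, spy, sqx, sqy, dot, cross)

def combine(a, b):
    """Merge the sums of two workers; the sums are simply additive."""
    return tuple(x + y for x, y in zip(a, b))

def rigid_transform_2d(total):
    """Least-squares rotation angle and translation from the combined sums."""
    n, spx, spy, sqx, sqy, dot, cross = total
    # Convert raw sums to sums over centroid-centered points.
    dot_c = dot - (spx * sqx + spy * sqy) / n
    cross_c = cross - (spx * sqy - spy * sqx) / n
    theta = math.atan2(cross_c, dot_c)
    c, s = math.cos(theta), math.sin(theta)
    mpx, mpy, mqx, mqy = spx / n, spy / n, sqx / n, sqy / n
    # Translation maps the rotated source centroid onto the target centroid.
    tx = mqx - (c * mpx - s * mpy)
    ty = mqy - (s * mpx + c * mpy)
    return theta, tx, ty
```

A central process would call `combine` on the tuples received from the workers and then `rigid_transform_2d` once per iteration, so the message size per worker stays constant regardless of how many points that worker holds.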
The Simplex Method, the most popular method for solving Linear Programs (LPs), has two major variants: the revised method and the standard, or full tableau, method. Today, virtually all serious implementations are of the revised method because it is more efficient for sparse LPs, which are the most common. However, the full tableau method has advantages as well. First, the full tableau can be very effective for dense problems. Second, a full tableau method can easily and effectively be extended to a coarse-grained distributed algorithm. While dense problems are uncommon in general, they occur frequently in some important applications such as digital filter design, text categorization, image processing, and relaxations of scheduling problems. We implement two full tableau algorithms. The first, a serial implementation, is effective for small to moderately sized dense problems. The second, a simple extension of the first, is a distributed algorithm, which is effective for large problems of all densities. We developed performance models that predict running times per iteration for the serial version of our method, the parallel version of our method, and the revised method for problems of different sizes, aspect ratios, and densities. We also developed methods for choosing the number of processors to optimize the tradeoff between computation and communication in distributed computations. We tested our algorithms on practical (Netlib) and synthetic problems.
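For readers unfamiliar with the full tableau variant, a minimal serial sketch follows. It is not the authors' implementation: it assumes a problem of the form "maximize c·x subject to A x ≤ b, x ≥ 0" with every b[i] ≥ 0 (so the slack variables give a starting basis), uses Bland's rule to choose the entering column, and relies on exact `Fraction` arithmetic for clarity rather than speed. The function name `simplex_full_tableau` is hypothetical.

```python
from fractions import Fraction

def simplex_full_tableau(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming all b[i] >= 0."""
    m, n = len(A), len(c)
    # Constraint rows: [A | I | b]. Last row: [-c | 0 | 0] (objective row).
    T = [[Fraction(v) for v in A[i]]
         + [Fraction(int(i == j)) for j in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    T.append([Fraction(-v) for v in c] + [Fraction(0)] * (m + 1))
    basis = [n + i for i in range(m)]          # slack variables start basic
    while True:
        # Bland's rule: lowest-index column with a negative objective entry.
        col = next((j for j in range(n + m) if T[m][j] < 0), None)
        if col is None:
            break                               # optimal
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("problem is unbounded")
        # Ratio test; ties broken by smallest basic-variable index.
        row = min(rows, key=lambda i: (T[i][-1] / T[i][col], basis[i]))
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        # Eliminate the pivot column from every other row of the tableau.
        for i in range(m + 1):
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [a - f * p for a, p in zip(T[i], T[row])]
        basis[row] = col
    x = [Fraction(0)] * n
    for i, j in enumerate(basis):
        if j < n:
            x[j] = T[i][-1]
    return x, T[m][-1]

# Example: maximize 3x + 2y with x + y <= 4, x + 3y <= 6, x <= 3.
solution, value = simplex_full_tableau([3, 2], [[1, 1], [1, 3], [1, 0]], [4, 6, 3])
```

The elimination loop updates every row independently given the pivot row, which is the part of the tableau method that lends itself to splitting the tableau across processors.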
ISBN: 0769509908 (print)
The binary-swap and the parallel-pipelined methods are two popular image composition methods for volume rendering on distributed memory multicomputers. However, these methods either restrict the number of processors to a power of two or require many steps to transfer image data, which results in high communication overheads. In this paper, we present an efficient image composition method, rotate-tiling (RT), for parallel volume rendering on distributed memory multicomputers. The RT method can fully utilize all available processors and minimize the communication overheads. In addition, we provide a data compression method, template run-length encoding (TRLE), to further reduce the communication data size. To evaluate the performance of the RT method, we compare it with the binary-swap method and the parallel-pipelined method. Both a theoretical analysis and an experimental test are conducted. In the theoretical analysis, we analyze the best performance bound of the RT method in terms of the startup time, the data transmission time, the number of processors, and the number of initial blocks of a sub-image. In the experimental test, we have implemented these three methods on an SP2 parallel machine. Three volume datasets are used as test samples. The experimental results show that our method outperforms the binary-swap and the parallel-pipelined methods for all test samples and matches the results of the theoretical analysis. For the TRLE method, the experimental results show that it can further reduce the composition time for these three methods.
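The abstract does not describe how TRLE works, so the sketch below shows plain run-length encoding instead, only to illustrate why run-based compression shrinks the sub-images exchanged during composition: rendered sub-images tend to contain long runs of blank pixels. The function names and the sample scanline are hypothetical.

```python
def rle_encode(pixels):
    """Collapse consecutive equal values into [value, run_length] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return runs

def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# A scanline with blank (0) borders around a few rendered pixels.
scanline = [0] * 5 + [7, 7, 9] + [0] * 4
encoded = rle_encode(scanline)
```

Here the 12-pixel scanline becomes 4 runs, and `rle_decode(encoded)` restores it exactly, so the encoding is lossless.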