ISBN: 081942305X (print)
This paper presents the design and implementation of a high-performance vehicle controller for automated vehicles, based on parallel digital signal processing (DSP) systems. The literature shows that one of the main limiting factors of most automated vehicles is the available computing power. Most systems employ camera vision for guidance, in some cases combined with other sensors, and the amount of information to be processed can overwhelm many processors. Solutions so far have involved distributed processing, massively parallel processors, dedicated processors, and minicomputers. In most cases these systems use specially designed processors that lack standard interfacing, so proprietary interface cards have to be built. This paper takes the alternative approach of designing a high-performance controller using parallel DSP systems, namely TMS320C40 processors with 275 MIPS and 50 MFLOPS. The controller processes data from a CCD camera focused on a road segment containing a line that has suitable contrast with the road surface. The DSP-based controller, hosted in a PC environment, carries out the high-level control, while the low-level servo control is assigned to dedicated motion controllers that communicate with the DSP-based controller through the PC bus. Results of image processing and timing requirements for various topologies are detailed.
ISBN: 3540630910 (print)
The proceedings contain 44 papers. The special focus in this conference is on Automatic Data Distribution and Locality Enhancement. The topics include: cross-loop reuse analysis and its application to cache optimizations; locality analysis for distributed shared-memory multiprocessors; data distribution and loop parallelization for shared-memory multiprocessors; data localization using loop aligned decomposition for macro-dataflow processing; exploiting monotone convergence functions in parallel programs; exact versus approximate array region analyses; context-sensitive interprocedural analysis in the presence of dynamic aliasing; initial results for glacial variable analysis; compiler algorithms on if-conversion, speculative predicate assignment and predicated code optimizations; determining asynchronous pipeline execution times; compiler techniques for concurrent multithreading with hardware speculation support; resource-directed loop pipelining; integrating program optimizations and transformations with the scheduling of instruction level parallelism; parametric computation of margins and of minimum cumulative register lifetime dates; global register allocation based on graph fusion; automatic parallelization for non-cache coherent multiprocessors; eliminating lock overhead in automatically parallelized object-based programs; optimal reordering and mapping of a class of nested-loops for parallel execution; communication-minimal tiling of uniform dependence loops; communication-minimal partitioning of parallel loops and data arrays for cache-coherent distributed-memory multiprocessors; and resource-based communication placement analysis.
In many systems, backward recovery is a classical technique for ensuring fault tolerance. It consists of restoring a computation to a consistent global state, saved in a global checkpoint, from which the computation can be resumed. A global checkpoint comprises a set of local checkpoints, one from each process, each corresponding to a local state dumped onto stable storage. In this paper we formally define the domino effect for shared memory systems, whether the shared memory is physical (as in multiprocessor systems) or virtual (as in distributed shared memory systems), and design a domino-free adaptive algorithm. These results rest on a necessary and sufficient condition that shows when a set of local checkpoints can belong to some consistent global checkpoint.
ISBN: 0818678763 (print)
ILSP (Interlaced inner and outer Loop Software Pipelining) is an efficient algorithm for optimizing operations in nested loops. For ILSP to achieve good time and space efficiency, an efficient nested-loop control mechanism must support the algorithm. Our control mechanism is realized in hardware; it avoids adding many extra instructions and minimizes the II (initiation interval) of each loop in the nest. In cooperation with the compiler, the mechanism efficiently supports software pipelining of nested loops and ensures that ILSP achieves a high speedup at a low space cost.
ISBN: 0818682272 (print)
PB-BLAS (Parallel Block Basic Linear Algebra Subprograms) is the first parallel implementation of BLAS whose data are decomposed according to a block cyclic data distribution. Functionally, it is an extended subset of the Level 2 and 3 BLAS for distributed-memory systems. The authors present a new, modified version of PB-BLAS that eliminates the positional restrictions on the data matrices imposed by the old version. It is also more efficient in memory space management and in performance for PB-BLAS routines that handle a triangular, symmetric, or Hermitian matrix. The paper outlines PB-BLAS and ScaLAPACK, a library of high-performance linear algebra routines for distributed-memory computers. It then describes a parallel Cholesky factorization routine and compares its performance with the old and the new versions of PB-BLAS on the Intel Paragon computer.
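The block cyclic data distribution mentioned in this abstract can be illustrated in one dimension. The following is a minimal sketch, not code from PB-BLAS or ScaLAPACK: the function name `block_cyclic_owner` and its parameters are hypothetical, and it assumes that the first block lives on process 0.

```python
def block_cyclic_owner(g, nb, nprocs):
    """Map global index g to (process, local index) in a 1-D block cyclic layout.

    Assumptions (not from the abstract): blocks of nb consecutive elements
    are dealt out to processes 0..nprocs-1 in round-robin order, starting
    at process 0.
    """
    block = g // nb                 # which global block holds element g
    proc = block % nprocs           # blocks are dealt out cyclically
    local_block = block // nprocs   # how many of this process's blocks precede it
    local = local_block * nb + g % nb
    return proc, local


if __name__ == "__main__":
    # Eight elements, block size 2, two processes.
    for g in range(8):
        print(g, block_cyclic_owner(g, nb=2, nprocs=2))
```

A two-dimensional matrix distribution would apply the same mapping independently to row and column indices over a process grid.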
ISBN: 0818678763 (print)
Programming languages that make sophisticated use of pointers, such as C, are hard to analyze. Recent research on pointer analysis focuses on tracking the possible values of pointers when a program point is reached, and great progress has been achieved. However, how to apply the results of pointer analysis to dataflow analysis and to other program optimization and parallelization is not well studied. This paper presents an efficient interprocedural framework based on two insights into real C programs, and its use in deriving a context-sensitive pointer analysis algorithm and an accurate interprocedural modification side effects (MOD) computation. Based on the results of the pointer analysis, the inaccuracy induced by merging aliasing information is also studied.
Knowledge of the complex molecular structures of living cells is being accumulated at a tremendous rate. Key technologies enabling this success have been high-performance computing and powerful molecular graphics applications, but the technology is beginning to lag seriously behind the challenges posed by the size and number of new structures and by the emerging opportunities in drug design and genetic engineering. A visual computing environment is being developed that permits interactive modeling of biopolymers by linking a 3D molecular graphics program with an efficient molecular dynamics simulation program executed on remote high-performance parallel computers. The system will be ideally suited to distributed computing environments, since it uses both local 3D graphics facilities and the peak capacity of high-performance computers for interactive biomolecular modeling. To create an interactive 3D environment, three input methods will be explored: (1) a six-degree-of-freedom "mouse" for controlling the space shared by the model and the user; (2) voice commands monitored through a microphone and recognized by a speech recognition interface; (3) hand gestures, detected through cameras and interpreted using computer vision techniques. Controlling 3D graphics connected to real-time simulations, together with the use of voice with suitable language semantics and of hand gestures, promises great benefits for many types of problem-solving environments. Our focus on structural biology takes advantage of existing sophisticated software, provides concrete objectives, defines a well-posed domain of tasks, and offers a well-developed vocabulary for spoken communication.
The paper presents efficient, scalable algorithms for performing prefix (PC) and general prefix (GPC) computations on a distributed shared memory (DSM) system, with applications. PC and GPC are generic techniques that can be used to design sequential and parallel algorithms for a number of problems from diverse areas (K. Arvind et al., 1995; V. Kamakoti and C. Pandurangan, 1992).
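As a rough illustration of a prefix computation (PC) organized for several processors, the sketch below uses the common two-phase blocked scheme: each block is scanned independently, then the block totals are carried forward. It is not the paper's DSM algorithm, it runs sequentially, and it does not cover GPC; the name `blocked_prefix` and its parameters are hypothetical.

```python
from itertools import accumulate
import operator


def blocked_prefix(values, nblocks, op=operator.add):
    """Inclusive prefix computation over `values` with an associative `op`.

    The input is split into at most `nblocks` contiguous blocks (nblocks >= 1).
    Phase 1 scans each block on its own; on a parallel machine each block
    would belong to one processor. Phase 2 combines the running total of all
    earlier blocks with every element of the current block.
    """
    n = len(values)
    if n == 0:
        return []
    size = -(-n // nblocks)  # ceiling division: elements per block
    blocks = [values[i:i + size] for i in range(0, n, size)]

    # Phase 1: independent local scans.
    local = [list(accumulate(b, op)) for b in blocks]

    # Phase 2: carry the total of all preceding blocks into each block.
    result = list(local[0])
    carry = local[0][-1]
    for scanned in local[1:]:
        result.extend(op(carry, x) for x in scanned)
        carry = op(carry, scanned[-1])
    return result


if __name__ == "__main__":
    print(blocked_prefix([1, 2, 3, 4, 5], nblocks=2))
```

The carry is always applied on the left, so the sketch also works for associative operators that are not commutative, such as string concatenation.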
In the development of reactive systems, real-time constraints often have to be met. Such time-critical applications use heterogeneous multi-processor systems in order to fulfill these constraints. This paper presents a hybrid partitioning method that combines a stochastic algorithm with mixed integer linear programming and supports the development of time-critical systems. We assume that the algorithm to be analyzed is given in the form of a so-called task graph. The goal of the overall method is to fix, for every task, the processor that will execute it and the starting time of this execution. The main issue is a high-level-synthesis-like method for constructing a problem-specific multi-processor board. The presented methods have been fully implemented and tested.
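To make the stated goal concrete (a processor and a starting time for every task of a task graph), here is a minimal sketch that uses a plain greedy earliest-finish heuristic. This is a swapped-in technique for illustration only, not the paper's hybrid stochastic/MILP method. The function `greedy_schedule`, the `durations` and `deps` structures, and the omission of communication costs are all assumptions. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def greedy_schedule(durations, deps):
    """Assign each task a (processor, start time) pair.

    durations[task][q] is the assumed run time of `task` on processor q
    (processors are heterogeneous, so times differ per processor).
    deps[task] lists the predecessors of `task`; every task must be a key.
    Communication costs between processors are ignored in this sketch.
    """
    nprocs = len(next(iter(durations.values())))
    proc_free = [0] * nprocs   # time at which each processor becomes idle
    finish = {}                # finish time of every scheduled task
    schedule = {}

    # Visit tasks so that all predecessors are scheduled first.
    for task in TopologicalSorter(deps).static_order():
        ready = max((finish[p] for p in deps[task]), default=0)
        # Pick the processor on which this task would finish earliest.
        best = min(
            range(nprocs),
            key=lambda q: max(ready, proc_free[q]) + durations[task][q],
        )
        start = max(ready, proc_free[best])
        finish[task] = start + durations[task][best]
        proc_free[best] = finish[task]
        schedule[task] = (best, start)
    return schedule


if __name__ == "__main__":
    durations = {"a": [2, 4], "b": [3, 1], "c": [2, 2]}
    deps = {"a": [], "b": ["a"], "c": ["a"]}
    print(greedy_schedule(durations, deps))
```

A greedy pass like this gives a feasible schedule quickly but offers no optimality guarantee, which is one reason to combine search with exact formulations.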
ISBN: 0818678763 (print)
Although MPEG-1 Video is a promising and the most widely used moving picture compression standard, it requires substantial computational resources to encode moving pictures with a reasonable frame size and quality. In this paper we propose and implement an efficient parallelizing scheme for an MPEG-1 Video encoding algorithm on Ethernet-connected workstations, the most widely available computing environment today. In this scheme, slice-level, frame-level, and GOP (Group of Pictures)-level parallelism are identified as the attractive forms of parallelism that can be exploited on Ethernet-connected workstations. Three efficient parallel implementation schemes that take the communication characteristics of Ethernet-connected workstations into account are also proposed and evaluated. A series of experiments using thirty workstations shows that the MPEG-1 Video encoding time can be reduced in proportion to the number of workstations used in the encoding computations, although there is a saturation point in the speedup graphs.
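GOP-level parallelism can be pictured as handing whole groups of pictures to different workstations. The sketch below only builds such a work plan and does no encoding or network communication. The round-robin policy, the fixed GOP size, and the name `assign_gops` are assumptions for illustration, not the schemes evaluated in the paper.

```python
def assign_gops(num_frames, gop_size, num_workers):
    """Split frame indices into GOPs and deal them out to workers.

    Returns one list per worker; each entry is a (first_frame, end_frame)
    pair with end_frame exclusive. Assumes every GOP can be encoded
    independently of the others, and that the last GOP may be shorter.
    """
    plan = [[] for _ in range(num_workers)]
    for i, start in enumerate(range(0, num_frames, gop_size)):
        end = min(start + gop_size, num_frames)
        plan[i % num_workers].append((start, end))  # round-robin assignment
    return plan


if __name__ == "__main__":
    for worker, gops in enumerate(assign_gops(50, 12, 2)):
        print(worker, gops)
```

Coarse units such as GOPs keep the number of messages low, which matters on a shared Ethernet; finer units such as slices balance load better but exchange data more often.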