A transactional paradigm is suggested for computer-assisted parallelization of programs and register-cache scheduling. It can serve as a building tool for pipelining, data parallelism, or generic parallelism in a variety of architectures, and the cost of execution can be estimated realistically.
Image matching based on image feature pixels involves heavily iterated computation and repeated memory access. In our previous work, the detection of interesting points was reported as an efficient pre-processing step to extract binary images for further matching in terms of a certain distance measurement. This paper presents our extension to a parallel implementation of the matching scheme for object recognition on a low-cost heterogeneous PVM (Parallel Virtual Machine) network. While most of the sequential execution time is spent on image feature extraction, distance transform and matching measurement, our investigation shows that a distributed-memory multicomputer can best meet the high computational and memory access demands in image processing. The performance is evaluated in terms of execution time. We conclude that parallel image processing can be implemented on a general distributed system to achieve speedup without specific hardware requirements.
A hierarchy of unidirectional rings has been used successfully in distributed shared-memory multiprocessors. The fixed cluster size of the hierarchy prevents full exploitation of communication locality. The bidirectio...
Distributed computer systems for real-time control require a global timebase with high precision. A small time skew between local clocks in the system is required to obtain good control performance through well synchronised task execution, but also provides a base for efficient communication. In distributed safety-critical applications, clocks have traditionally been synchronised with fault-tolerant clock synchronisation algorithms. With these methods, a limited number of erroneous clock readings are allowed in each adjustment. On the other hand, readings from all clocks in the system are required before an adjustment can be made. In this paper an alternative approach, the Daisy Chain method, is proposed and compared with present solutions. Daisy Chain synchronisation does not allow erroneous clock readings, but methods of avoiding them are described. Due to its simplicity, the method can be implemented with little hardware. Low-precision frequency sources are sufficient, and recovery after arbitrary failures is fast because no special start-up phase is required. The paper also discusses the effects of quantisation uncertainty and transmission delay, and outlines the implementation of a global time base in an embedded distributed real-time architecture.
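To illustrate the traditional class of algorithms that the abstract contrasts with Daisy Chain, here is a minimal sketch of a fault-tolerant average: collect one reading from every clock, discard the most extreme values, and average the rest. This is a generic textbook technique, not the Daisy Chain method, whose details the abstract does not give. The function name and the parameter `max_faulty` are assumptions made for illustration.

```python
def fault_tolerant_average(readings, max_faulty):
    """Average clock readings after discarding the extremes.

    readings   -- one reading from every clock in the system (all are required)
    max_faulty -- assumed upper bound on the number of erroneous readings
    """
    if len(readings) <= 2 * max_faulty:
        raise ValueError("need more than 2 * max_faulty readings")
    ordered = sorted(readings)
    # Drop the max_faulty lowest and max_faulty highest values, so that up to
    # max_faulty arbitrary readings cannot pull the result outside the range
    # spanned by the correct clocks.
    kept = ordered[max_faulty:len(ordered) - max_faulty]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Three clocks agree closely; one reading is wildly wrong.
    print(fault_tolerant_average([10.0, 12.0, 11.0, 500.0], max_faulty=1))
```

The sketch shows both properties mentioned in the abstract: a bounded number of bad readings is tolerated, but every clock must be read before an adjustment can be computed.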
This paper presents a mapping scheme for parallel pipelined execution of the Backpropagation Learning Algorithm on distributed memory multiprocessors (DMMs). The proposed implementation exhibits training set parallelism that involves batch updating. Simple algorithms are presented that allow the data transfer involved in both the forward and backward execution phases of the backpropagation algorithm to be carried out with a small communication overhead. The effectiveness of our mapping is illustrated by estimating the speedup of a proposed implementation on an array of T-805 transputers.
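The idea of training set parallelism with batch updating can be sketched as follows: the training set is split into shards, each worker accumulates the gradient over its own shard, and the partial gradients are summed before a single weight update. This is a simplified illustration only, assuming a single linear neuron with squared error and workers simulated by a sequential loop; it does not reproduce the paper's pipelined mapping or its transputer communication scheme. All function and parameter names are hypothetical.

```python
def local_gradient(weights, shard):
    """Summed squared-error gradient of one linear neuron over one shard."""
    grad = [0.0] * len(weights)
    for inputs, target in shard:
        output = sum(w * x for w, x in zip(weights, inputs))
        error = output - target
        for i, x in enumerate(inputs):
            grad[i] += error * x
    return grad


def batch_update(weights, training_set, num_workers, learning_rate):
    """Perform one batch update with the training set split across workers."""
    # Each (simulated) worker receives every num_workers-th training pattern.
    shards = [training_set[k::num_workers] for k in range(num_workers)]
    # On a real machine these calls would run concurrently, one per processor.
    partials = [local_gradient(weights, shard) for shard in shards]
    # Reduction step: sum the partial gradients component by component.
    total = [sum(column) for column in zip(*partials)]
    scale = learning_rate / len(training_set)
    return [w - scale * g for w, g in zip(weights, total)]


if __name__ == "__main__":
    data = [([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0), ([2.0, 0.0], 1.0), ([1.0, 1.0], 0.5)]
    print(batch_update([0.1, -0.2], data, num_workers=3, learning_rate=0.1))
```

Because the weights change only after all partial gradients are combined, the result matches a sequential batch update up to floating-point rounding, regardless of how many workers share the training set.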
The application of artificial neural networks (ANN) in real-time embedded systems demands high-performance computers. Miniaturized massively parallel architectures are suitable computation platforms for this task. An important question which arises is how to establish an effective mapping from ANN algorithms to hardware. In this paper, we demonstrate how an effective mapping can be achieved with our programming environment in close combination with an optimized architecture design targeted for neuro-computing.
Accurate and rapid evaluation of radar signature for alternative aircraft/store configurations would be of substantial benefit in the evolution of integrated designs that meet radar cross section requirements across the threat spectrum. Finite-volume time-domain methods offer the possibility of modeling the whole aircraft, including penetrable regions and stores, at longer wavelengths on today's supercomputers and at typical airborne radar wavelengths on the massively parallel teraflop computers of tomorrow. To realize this potential, practical means are being developed for the rapid generation of grids on and around the aircraft, and numerical algorithms that maintain high-order accuracy on such grids are being constructed. A finite-volume, time-domain Maxwell's equations solver based on structured and unstructured grids has been developed, incorporating modeling techniques for general radar absorbing materials. Using this work as a base, the goal of the computational electromagnetics effort is to define, implement, and evaluate rapid prototype signature prediction, addressing many issues related to (1) physics of electromagnetics, (2) efficient and higher-order accurate algorithms, (3) boundary condition procedures, (4) geometry and gridding (structured and unstructured), (5) computer architecture, and (6) validation.
A scalable algorithm for the reduction to tridiagonal form of symmetric matrices is developed. It uses one-sided rotations instead of similarity transforms. This allows a data-distribution-independent implementation with low communication volume. Timings on the Fujitsu AP 1000 and VPP 500 show good performance.
This paper introduces the doubly-linked list (DLL) protocol for distributed shared memory (DSM) multiprocessor systems. The protocol makes use of two linked lists to keep track of valid copies of pages in the system, thus eliminating the use of copysets. Simulation studies show that the DLL protocol achieves considerable speed-up for common mathematical problems, including a linear equations solver and a matrix multiplier. A performance improvement of up to 51.9% over the dynamic distributed manager algorithm is obtained. Further improvements and possible modifications of the protocol are also discussed.
The problem considered in this paper is the implementation of motion detection on distributed-memory MIMD machines. The proposed solution is based on pyramidal algorithms that, by iteratively discarding uninteresting details, make it possible to focus on the moving parts of an image stream. Different parallelisation methodologies have been evaluated, and the most promising ones have been implemented on a Transputer-based parallel machine. Experimental results are presented and compared with the theoretical ones.
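The pyramidal idea of discarding uninteresting regions at a coarse level before looking at fine detail can be sketched with a two-level pyramid. This is a minimal sequential illustration under stated assumptions: frames are lists of rows of grey values with even dimensions, the coarse level is a 2x2 block average, and the two thresholds are hypothetical parameters. It is not the paper's algorithm and says nothing about its parallelisation on Transputers.

```python
def downsample(image):
    """Halve the resolution by averaging each 2x2 block (even sides assumed)."""
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def moving_pixels(prev, curr, coarse_threshold, fine_threshold):
    """Return the set of (row, col) full-resolution pixels flagged as moving."""
    coarse_prev, coarse_curr = downsample(prev), downsample(curr)
    moving = set()
    for r in range(len(coarse_curr)):
        for c in range(len(coarse_curr[0])):
            # Coarse test: a block whose average barely changed is discarded,
            # so its four fine pixels are never examined.
            if abs(coarse_curr[r][c] - coarse_prev[r][c]) < coarse_threshold:
                continue
            # Fine test: only inside blocks that survived the coarse test.
            for fr in (2 * r, 2 * r + 1):
                for fc in (2 * c, 2 * c + 1):
                    if abs(curr[fr][fc] - prev[fr][fc]) >= fine_threshold:
                        moving.add((fr, fc))
    return moving


if __name__ == "__main__":
    before = [[0] * 4 for _ in range(4)]
    after = [row[:] for row in before]
    after[2][3] = 100  # a single pixel changes between frames
    print(moving_pixels(before, after, coarse_threshold=10, fine_threshold=50))
```

A full pyramid would repeat the coarse test over several levels; the two-level version is enough to show how work concentrates on the changing parts of the image.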