ISBN: 0818606398 (print)
The authors consider a novel parallel SIMD architecture: a mesh-connected computer consisting of $N = N^{1/2} \times N^{1/2}$ processors with broadcasting features in each row and each column. This multiple broadcast allows parallel data transfers within rows and columns of processors. The authors show efficient implementations of parallel algorithms for several numerical and nonnumerical problems. They discuss the architecture's applicability to problems such as LU decomposition, linear programming, and finding the minimum spanning tree of a given graph. Significant speed improvements are achievable with this architecture, which would compare favorably with other multiprocessors with $\log N$-diameter interconnection networks for $N$ up to $10^6$ processors.
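As a rough illustration of why row and column broadcasting shortens communication paths, the following sketch simulates a square mesh in which one processor per row (or column) can broadcast to the rest of that row (or column) in a single step. It is not the authors' algorithm; the function names (`make_mesh`, `row_broadcast`, `column_broadcast`, `send`) and the choice to let a broadcast overwrite every register on the bus are assumptions made only for this example.

```python
import math


def make_mesh(n):
    """Create an N^(1/2) x N^(1/2) mesh; each cell models one processor register."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("n must be a perfect square")
    return [[None] * side for _ in range(side)]


def row_broadcast(mesh, row, src_col):
    """One step: processor (row, src_col) broadcasts its value along its row."""
    value = mesh[row][src_col]
    for col in range(len(mesh)):
        mesh[row][col] = value


def column_broadcast(mesh, col, src_row):
    """One step: processor (src_row, col) broadcasts its value along its column."""
    value = mesh[src_row][col]
    for row in range(len(mesh)):
        mesh[row][col] = value


def send(mesh, src, dst):
    """Move a value from src to dst using one row and one column broadcast."""
    (r1, c1), (r2, c2) = src, dst
    row_broadcast(mesh, r1, c1)      # value now held by (r1, c2)
    column_broadcast(mesh, c2, r1)   # value now held by (r2, c2)
    return 2                         # broadcast steps used


mesh = make_mesh(16)
mesh[0][0] = 42
steps = send(mesh, (0, 0), (3, 2))
```

In this simplified model any processor can reach any other in two broadcast steps, whereas a plain mesh without broadcast buses would need a number of nearest-neighbor hops that grows with the side length.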
ISBN: 0818606371 (print)
A mathematical formalism for modeling reconfigurable mixed systolic array architectures is presented, together with a methodology for reconfiguring the array by mapping the communication structure of an algorithm onto the interconnection structure of the array. A step-by-step procedure maps a given algorithm onto the mixed systolic array architecture and then generates the control code required to implement the corresponding interconnection structure. The mapping procedure is based on time and space transformations of the algorithm's data dependence vectors. Two sample algorithms, the finite impulse response (FIR) filtering algorithm and the priority queue algorithm, are mapped onto a linear reconfigurable systolic array.
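The idea of time and space transformations of dependence vectors can be sketched for the FIR example. The code below is a minimal illustration under stated assumptions, not the procedure from the paper: the index space is (i, k) for y[i] += w[k] * x[i-k], and the dependence vectors, the time transformation `TIME = (1, 1)`, and the space transformation `SPACE = (0, 1)` are one textbook-style choice picked for this example.

```python
def dot(a, b):
    """Inner product of two 2-D integer vectors."""
    return a[0] * b[0] + a[1] * b[1]


# Assumed dependence vectors in the (i, k) index space.
DEPENDENCES = {
    "y": (0, 1),  # partial sum y[i] accumulates along k
    "x": (1, 1),  # sample x[i-k] is reused at (i+1, k+1)
    "w": (1, 0),  # weight w[k] is reused at (i+1, k)
}
TIME = (1, 1)   # assumed schedule: t = i + k
SPACE = (0, 1)  # assumed allocation: processor p = k


def schedule_is_valid():
    """A linear schedule is valid if every dependence advances in time."""
    return all(dot(TIME, d) > 0 for d in DEPENDENCES.values())


def fir_direct(x, w):
    """Reference FIR filter: y[i] = sum_k w[k] * x[i-k]."""
    return [sum(w[k] * x[i - k] for k in range(len(w)) if i - k >= 0)
            for i in range(len(x))]


def fir_mapped(x, w):
    """Execute the same computations in (time, processor) order."""
    points = [(i, k) for i in range(len(x)) for k in range(len(w)) if i - k >= 0]
    slots = {(dot(TIME, p), dot(SPACE, p)) for p in points}
    assert len(slots) == len(points)  # no two computations share a time/processor slot
    y = [0] * len(x)
    for i, k in sorted(points, key=lambda p: (dot(TIME, p), dot(SPACE, p))):
        y[i] += w[k] * x[i - k]
    return y
```

Under these assumed transformations, SPACE applied to the weight dependence gives 0, so each weight stays in one cell of the linear array, while partial sums and input samples move to the neighboring cell.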
ISBN: 0818606371 (print)
DIRMU multiprocessor configurations are presented, and experiments with parallel algorithms on this machine are described. DIRMU (Distributed Reconfigurable Multiprocessor kit) allows system designers to configure a broad spectrum of special-purpose memory-coupled multiprocessor structures (rings, arrays, trees, pyramids) based on a single general-purpose microcomputer building block. The hardware architecture and programming environment of an experimental DIRMU building block with 8086/8087 microprocessors are presented. Three parallel application programs for solving the Laplace partial differential equation have been implemented on DIRMU ring configurations. Speedup measurements for these programs are presented and the influence of different synchronization strategies is discussed.
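A common way to parallelize a Laplace solver is to split the grid into strips and let each processor relax its own strip, reading only the boundary rows of its neighbors. The sketch below simulates one Jacobi sweep in that style. It is an assumption-laden illustration, not one of the three DIRMU programs: the Jacobi method, the row-strip partitioning, and the names `jacobi_sweep` and `partitioned_sweep` are all chosen for this example.

```python
def relax(grid, r, c):
    """Average of the four neighbors of interior point (r, c)."""
    return 0.25 * (grid[r - 1][c] + grid[r + 1][c] + grid[r][c - 1] + grid[r][c + 1])


def jacobi_sweep(grid):
    """One sequential Jacobi sweep; boundary values are kept fixed."""
    new = [row[:] for row in grid]
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            new[r][c] = relax(grid, r, c)
    return new


def partitioned_sweep(grid, workers):
    """Same sweep, with interior rows split into strips, one per simulated worker.

    Each strip reads only the old grid, so the strips could run concurrently;
    a barrier between sweeps is assumed.
    """
    interior = list(range(1, len(grid) - 1))
    chunk = -(-len(interior) // workers)  # ceiling division
    new = [row[:] for row in grid]
    for w in range(workers):
        for r in interior[w * chunk:(w + 1) * chunk]:
            for c in range(1, len(grid[0]) - 1):
                new[r][c] = relax(grid, r, c)
    return new


# Example: 6x6 grid with the top edge held at 100 and all other points at 0.
grid = [[100.0] * 6] + [[0.0] * 6 for _ in range(5)]
after_one_sweep = partitioned_sweep(grid, workers=3)
```

Because every worker reads the previous iterate, how often neighbors synchronize between sweeps is the main design choice, which is the kind of trade-off the abstract's synchronization comparison refers to.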
ISBN: 0818606371 (print)
The architecture of a massively parallel machine (MAN-YO) designed specifically for logic design automation is presented. By making use of special-purpose hardware engines and high-performance microprocessors, it is possible to achieve significant speed improvements in such time-consuming tasks as mixed-level logic simulation, logic synthesis, and logic verification. The system uses both parallelism and function separation techniques to exploit concurrency in algorithmic and heuristic computations. It comprises a large array (on the order of a thousand) of processor modules, where each processor module contains a functionally separated logic simulation engine and processors. The logic simulation engines implement logic simulation algorithms in hardware. The microprocessors handle many heuristic problems and high-level functional or symbolic simulations. Cooperation between logic simulation engines and microprocessors allows parallel design rule checking or the implementation of a parallel production system. A loop-structured interconnection network, which is well suited to divide-and-conquer problems, provides an O(N) connection cost and a shorter communication path. The MAN-YO design concepts, the basic machine architecture, and brief examples of parallel algorithms are described.
ISBN: 0892526106 (print)
Images taken by airborne and laboratory sensors have been used for many purposes, including autonomous guidance, robot vision, medical diagnosis, automated inspection, and automated measurement. Scene recognition operations are relatively costly in computing time because of the large amount of data that must be analyzed. To obtain real-time operation, image processing and scene recognition can be done with several microprocessors operating in parallel. Since many of the operations are repetitive in nature, the use of multiprocessors can greatly enhance the speed of operations. Architectures of several multiprocessor systems capable of performing image processing and scene matching are described. The design of a new architecture is also described. This system is designed to take advantage of the computing capability of a single stream/multi-data stream structure and the architectural simplicity of a pipeline computer.
The use of digital computers to process various types of sensor data is becoming increasingly common in both civilian and military applications. One example is the enhancement of photographs to increase their clarity or to emphasize a particular detail. Previously, this processing was done in specialized circuits, mainframes, or minicomputers. More recently, extremely powerful microprocessors have become available that show potential for application in this area. This thesis explores a particular class of image processing, known as image segmentation, implemented on a particular microprocessor. The microprocessor is the Fairchild F9450, the first civilian version of the 1750A military specification microprocessor. This microprocessor, along with its associated chip set, appears well suited to image processing, having high-speed capability, direct floating-point arithmetic instructions, multiprocessing capacity, and the ability to address up to sixteen megabytes of memory. Additionally, a sophisticated software development tool set, known as Microprocessor Pascal, is available to develop and test software for the 1750A/F9450 microprocessor. This tool set allows software to be developed on the VAX-11/780 minicomputer and targeted for final use on the 1750A/F9450. This work used the Microprocessor Pascal tool set to test and compare representative image segmentation algorithms. The execution speeds and code sizes of the programs were determined for the F9450/1750A microprocessor and the VAX-11/780 minicomputer, and were compared to determine the feasibility of using the F9450/1750A microprocessor for image segmentation work. Several images resulting from the image segmentation processing are included, as well as the Pascal programs used to perform the processing.
The authors are engaged in developing a computer that is optimized for use on two-dimensionally structured data. This machine is a cellular array computer with one processor for each pixel or matrix element of the input data. Cellular array computers have been built in the past and are fairly well known. What is radically different about the Hughes 3-D computer is the degree of integration employed in its construction. This level of integration is made possible by the development of technologies that permit massively parallel communication channels between silicon wafers and through silicon wafers. These communication channels make it possible to stack silicon wafers containing arrays of circuitry, one on top of another, to form a three-dimensionally integrated computer. Additional benefits, beyond high data throughput, are derived from these massively parallel communication channels. These additional benefits include modular construction, fault tolerance, compact size, low power consumption, and flexible architecture.
Author: Smith, Alan Jay (Univ of California, Berkeley, Computer Science Div, Berkeley, CA, USA)
The effective and efficient use of the memory hierarchy is one of the most important aspects, if not the single most important aspect, of computer system design and use. Cache memory performance is often the limiting factor in CPU performance, and cache memories also serve to cut memory traffic in multiprocessor systems. Multiprocessor systems also require advances in cache architecture with respect to cache consistency. Similarly, the study of the best means to share main memory is an important research topic. Disk cache is becoming important for performance in high-end computer systems and is now widely available commercially; there are many related research problems. The development of mass storage, especially optical disk, will promote research in effective algorithms for file management and migration. In this paper, the author looks at each component of the memory hierarchy and addresses two issues: what the likely directions for development are, and what the interesting research problems are.
Real-time image processing in an application environment needs a set of low-cost implementations of various algorithms. This paper presents a one-chip VLSI median filter based on a systolic processor and working at video rate. It includes its own memory and can be used without any image memory for on-line processing. The architectural choices have made it possible to design a small chip with a high level of performance.
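For readers unfamiliar with the operation the chip performs, here is a minimal software reference for a two-dimensional median filter. It says nothing about the systolic design itself; the 3x3 window, the untouched border pixels, and the name `median_filter_3x3` are assumptions made for this sketch.

```python
def median_filter_3x3(image):
    """Replace each interior pixel by the median of its 3x3 neighborhood.

    Border pixels are copied unchanged (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = sorted(image[r + dr][c + dc]
                            for dr in (-1, 0, 1)
                            for dc in (-1, 0, 1))
            out[r][c] = window[4]  # middle of the 9 sorted values
    return out


# A single bright outlier in a flat region is removed by the filter.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
cleaned = median_filter_3x3(noisy)
```

The sort inside the loop is the part a hardware implementation would replace with a fixed comparison network or pipeline to sustain video rate.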
In this paper, we deal with the problem of detecting and segmenting objects in textured darkfield digital imagery for automated visual inspection applications. The technique is based on a sequential application of local operators that cluster the object and background gray levels. This procedure can be considered an extension of average-thresholding techniques. The algorithm has fast implementations on general-purpose image processing pipeline architectures and is therefore appealing for real-time computer vision applications. Computational examples showing the effectiveness of the segmentation technique are discussed.
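The general shape of such a method, repeated local operators followed by a threshold, can be sketched as follows. This is only a guess at the flavor of the approach, not the paper's algorithm: the 3x3 local mean operator, the number of passes, the use of the global mean as the threshold, and the names `local_mean` and `segment` are all assumptions for illustration.

```python
def local_mean(image):
    """3x3 local averaging; windows are clipped at the image border."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [image[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out


def segment(image, passes=2):
    """Apply the local operator `passes` times, then threshold at the mean level.

    Returns a binary mask: 1 for pixels above the threshold (bright objects
    on a dark background are assumed), 0 otherwise.
    """
    smoothed = image
    for _ in range(passes):
        smoothed = local_mean(smoothed)
    flat = [v for row in smoothed for v in row]
    threshold = sum(flat) / len(flat)
    return [[1 if v > threshold else 0 for v in row] for row in smoothed]


# Dark background on the left, bright object on the right.
image = [[0, 0, 0, 100, 100, 100] for _ in range(4)]
mask = segment(image)
```

Each pass is a purely local operation, which is why methods of this shape map naturally onto pipelined image processing hardware.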