This paper describes a practical Fourier Optical Signal Processor (FOSP) as a pre-processor for digital signal processing (DSP). The input illumination is coherent light from a HeNe laser, and the Fourier transform (FT) is computed by an FT lens. An image is placed in the front focal plane of the FT lens, and its Fourier transform appears in the back focal plane. Using the method reported by Xu [1] for measuring the phase information, the real and imaginary components are retrieved and fed electronically into a Fast Fourier Transform (FFT) module, built from 24-bit floating-point units capable of efficient multiplication, addition and subtraction, to reconstruct the original image. Many such floating-point units are used in parallel in a pipelined architecture to perform the FFT or other complex functions for DSP computations.
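The reconstruction step can be illustrated in software. The following is a minimal sketch, assuming a one-dimensional signal whose length is a power of two and ordinary double-precision arithmetic rather than the 24-bit floating-point units the abstract mentions. The function names `fft` and `reconstruct` are hypothetical and are not taken from the paper.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (length must be a power of two).

    With inverse=True the exponent sign is flipped; the caller
    is responsible for the final division by the length.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def reconstruct(real_part, imag_part):
    """Rebuild a real signal from the measured real and imaginary spectrum."""
    spectrum = [complex(r, i) for r, i in zip(real_part, imag_part)]
    n = len(spectrum)
    return [(v / n).real for v in fft(spectrum, inverse=True)]

# Stand-in for the optical stage: the forward transform is computed digitally here.
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
spectrum = fft(signal)
rebuilt = reconstruct([c.real for c in spectrum], [c.imag for c in spectrum])
```

In the system described above the forward transform is produced by the lens, so only the inverse step would run in the electronic module; the digital forward transform here exists only to generate test data.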
Simulated annealing is an effective method for solving large combinatorial optimization problems. It has been successfully applied to the cell placement and routing problem of VLSI circuit design [SE88], and to other classical optimization problems such as the traveling salesman problem and the graph partitioning problem [LA88]. One major drawback of simulated annealing is that it requires a substantial amount of computation time. In this paper, we present a new parallel implementation of simulated annealing based on the concurrency control theory of database systems. The parallelized computation is serializable, hence the result obtained is equivalent to a serial execution of the original sequential annealing algorithm. In our implementation, we assume a shared-memory MIMD multiprocessing environment with 16 to 32 processors. Preliminary analysis shows that our method achieves an efficiency (ratio of speedup to the number of processors) of about 40 to over 90 percent, depending on the acceptance rate.
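Two quantities in this abstract are easy to make concrete: the acceptance decision that determines the acceptance rate, and the efficiency figure. The sketch below assumes the standard Metropolis acceptance criterion, which the abstract does not spell out, and does not attempt to reproduce the paper's concurrency-control scheme. The names `accept` and `efficiency` are illustrative.

```python
import math
import random

def accept(delta_cost, temperature, rng=random):
    """Metropolis criterion (assumed): always accept improvements,
    accept a worsening move with probability exp(-delta_cost / temperature)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)

def efficiency(speedup, num_processors):
    """Efficiency as defined in the abstract: speedup divided by processor count."""
    return speedup / num_processors

# A speedup of 8 on 16 processors corresponds to 50 percent efficiency.
print(efficiency(8.0, 16))  # 0.5
```

At high temperature most moves are accepted and therefore modify shared state, which is one plausible reason the reported efficiency depends on the acceptance rate.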
SapePar-i860 is an execution-driven 'Simulator and Performance Evaluator for i860 based parallel Architecture'. The simulator takes inputs such as the primary cache size, primary cache type, secondary cache size, secondary cache type, secondary cache replacement policy, number of processors, interconnection between the processors, communication path from each processor to the other processors, type of communication, and bandwidth of the communication channels. The simulator configures itself automatically according to these inputs and generates log files. The performance analyzer generates statistics of the run from these log files. The statistics include the cache hit ratio of the primary cache, the cache hit ratio of the secondary cache, the Compute Communication Ratio (CCRi) of each processor, the Compute Communication Ratio (CCRs) of the overall system, the efficiency of each processor, and the efficiency of the overall system. The analyzer also computes the speedup and the execution time in clock ticks.
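The abstract names the statistics but not their formulas. The sketch below uses the conventional definitions as an assumption: hit ratio as hits over total accesses, compute communication ratio as compute ticks over communication ticks, and speedup as serial over parallel execution time. The function names and the tick counts are hypothetical and do not come from SapePar-i860 or its log format.

```python
def hit_ratio(hits, misses):
    """Fraction of cache accesses that hit (assumed definition)."""
    total = hits + misses
    return hits / total if total else 0.0

def compute_communication_ratio(compute_ticks, comm_ticks):
    """Compute ticks per communication tick (assumed definition of CCR)."""
    return compute_ticks / comm_ticks if comm_ticks else float("inf")

def speedup(serial_ticks, parallel_ticks):
    """Serial execution time divided by parallel execution time."""
    return serial_ticks / parallel_ticks

# Hypothetical per-processor counters, made up for illustration.
print(hit_ratio(90, 10))                      # 0.9
print(compute_communication_ratio(300, 100))  # 3.0
print(speedup(1000, 250))                     # 4.0
```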
In the past couple of years, significant progress has been made in the development of message-passing libraries for parallel and distributed computing, and in the area of high-speed networking. These advances in computing technology have also led to a tremendous increase in the amount of data being manipulated and produced by scientific and commercial application programs. Despite their popularity, message-passing libraries provide only part of the support necessary for most high-performance distributed computing applications: support for high-speed parallel I/O is still lacking. In this paper, we provide an overview of the conceptual design of a parallel and distributed I/O file system, the Virtual Parallel File System (VIP-FS), and describe its implementation. VIP-FS uses message-passing libraries to provide a parallel and distributed file system that can execute over multiprocessor machines or heterogeneous network environments.
First-generation knowledge-based systems (KBS) are characterized by the separation of domain-specific knowledge and general problem-solving strategies. Such systems lack the following important abilities: knowledge acquisition according to different paradigms of representation and processing, definition of deep knowledge caused by physiological processes of reasoning, and management of different abstraction levels. Next-generation KBS provide a basis for managing these problems. Essential characteristics of these new systems are modularization of knowledge, distribution of knowledge across different hardware and software resources, and use of object-oriented technology for integrating symbolic and subsymbolic knowledge on different levels of abstraction.
Numerous applications in science and engineering have a problem space that can be represented as a 2-dimensional grid. While some of these problems exhibit uniform computational requirements over all regions of the grid, others are non-uniform: that is, some regions of the grid have more data points than others. We introduce a new block decomposition method, Fair Binary Recursive Decomposition (FBRD), which is suitable for a collection of heterogeneous processors, and extend it to accommodate non-uniform problems (NUFBRD). Mathematical comparisons of the NUFBRD method and other common partitioning schemes are presented to show the expected performance level of this new decomposition technique.
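The abstract does not give the FBRD algorithm itself. As a rough illustration of the general idea of recursive binary block decomposition for heterogeneous processors, the sketch below splits a uniform grid along its longer side in proportion to the combined speed of each half of the processor list. This is an assumed scheme for illustration only, not the authors' method; `decompose` and the relative speeds are hypothetical.

```python
def decompose(x0, y0, width, height, speeds):
    """Recursively bisect a rectangle so each processor's block area
    is roughly proportional to its relative speed.

    Returns a list of (x0, y0, width, height, speed) tuples.
    """
    if len(speeds) == 1:
        return [(x0, y0, width, height, speeds[0])]
    mid = len(speeds) // 2
    left, right = speeds[:mid], speeds[mid:]
    share = sum(left) / sum(speeds)  # fraction of work for the first group
    if width >= height:
        cut = round(width * share)   # cut the longer (horizontal) side
        return (decompose(x0, y0, cut, height, left)
                + decompose(x0 + cut, y0, width - cut, height, right))
    cut = round(height * share)      # otherwise cut vertically
    return (decompose(x0, y0, width, cut, left)
            + decompose(x0, y0 + cut, width, height - cut, right))

# Three processors, the third twice as fast as the others.
blocks = decompose(0, 0, 100, 100, [1, 1, 2])
for block in blocks:
    print(block)
```

A non-uniform variant would weight each cut by the number of data points on either side rather than by area; that extension is omitted here.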
The objective of this research is to propose a low-complexity static scheduling and allocation algorithm for message-passing architectures that considers factors such as communication delays, link contention, message routing and network topology. As opposed to the conventional list-scheduling approach, our technique works by first serializing the task graph and "injecting" all the tasks into one processor. The parallel tasks are then "bubbled up" to other processors and inserted at appropriate time slots. The edges among the tasks are also scheduled by treating the communication links between the processors as resources. The proposed approach takes into account link contention and the underlying communication routing strategy, and can self-adjust on regular as well as arbitrary network topologies. To reduce the complexity, our scheduling algorithm is itself parallelized. To our knowledge, this is the first attempt at designing a parallel algorithm for scheduling. The proposed approach, implemented on an iPSC/860 hypercube, yields a high speedup in its own execution and performs considerably better under a wide range of parameters, including the task graph size, the communication-to-computation ratio, and the target system topology. Comparisons are made with two other approaches.
Very often, many parameterized trials of a simulation are required. Modifying the parameters of a simulation will often modify only a fraction of the work. This paper proposes time scale combining (TSC) to combine multiple independent simulation trials by sharing common events and inter-processor communication messages across distinct simulations. As a result, some of the overhead experienced in parallel computation can be amortized across these multiple simulations, reducing the total time required to execute all of them. The results of an experimental evaluation on both asynchronous and synchronous parallel architectures are presented.
Recent advances in the power of parallel computers have made them attractive for solving large computational problems. Scalable parallel programs are particularly well suited to Massively Parallel Processing (MPP) machines, since the number of computations can be increased to match the available number of processors. Performance tuning can be particularly difficult for these applications, since it must often be performed with a smaller problem size than that targeted for eventual execution. This research develops a performance prediction methodology that addresses this problem through symbolic analysis of program source code. Algebraic manipulations can then be performed on the resulting analytical model to determine performance for scaled-up applications on different hardware architectures.
We describe two parallel analog VLSI architectures that integrate optical flow data obtained from arrays of elementary velocity sensors to estimate heading direction and time-to-contact. For heading direction computation, we performed simulations to evaluate the most important qualitative properties of the optical flow field and to determine the best functional operators for implementing the architecture. For time-to-contact, we exploited the divergence theorem to integrate data from all velocity sensors present in the architecture and average out possible errors.
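The divergence-theorem idea can be sketched numerically. Assume, as a simplification not stated in the abstract, a pure approach toward a frontoparallel surface, so that the image flow at position p is v = p/τ and its divergence is 2/τ, where τ is the time-to-contact. The divergence theorem then equates the outward flux of v through a closed contour with (2/τ) times the enclosed area, giving τ = 2 · area / flux. The function below assumes sensors evenly spaced on a circle, each reporting the outward normal velocity component; `time_to_contact` and this sensor layout are hypothetical.

```python
import math

def time_to_contact(normal_velocities, radius):
    """Estimate time-to-contact from outward normal flow measured by
    sensors evenly spaced on a circle of the given radius.

    Uses flux = sum(v_n * arc_length) and tau = 2 * area / flux.
    """
    n = len(normal_velocities)
    arc = 2 * math.pi * radius / n               # contour length per sensor
    flux = sum(v * arc for v in normal_velocities)
    area = math.pi * radius ** 2
    return 2 * area / flux

# Four sensors with noisy readings around 0.5 on a circle of radius 2.
tau = time_to_contact([0.4, 0.6, 0.5, 0.5], 2.0)
```

Because the flux sums over every sensor, zero-mean errors in individual readings tend to cancel, which matches the averaging effect the abstract describes.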