ISBN: 0769500439 (print)
The fast simulation of large networks of spiking neurons is a major task in the examination of biologically inspired vision systems. Networks of this type label features by synchronization of spikes, and there is strong demand to simulate those effects in a real-world environment. Because the calculations for a single model neuron are quite complex, simulating thousands or millions of these neurons is not efficient on existing hardware platforms. To bring simulation closer to the real-time requirement, it is necessary to implement dedicated hardware. Our aim is a hardware system consisting mainly of standard components that is as flexible as possible concerning the model neuron but as specialized as necessary to meet our performance requirements. Thus we decided to implement a parallel system with Digital Signal Processors (DSPs) offering a large on-chip memory. One main task of this work is the optimization of the simulation algorithm for the neurons distributed to each DSP, that is, the sequential part of the simulation. This optimization benefits from the fact that only a very low percentage of neurons in vision networks are simultaneously active. For communication between the nodes, only spikes are distributed via a spike switching network. Processing of the network topology is realized by two different concepts. One idea is to compute the synapses autonomously on the processing node by representing a regular connection scheme with one connection mask for many neurons. Additional connections requiring adaptability and irregular connection schemes are stored in a shared memory. To avoid a bottleneck, synapse caching is used within each processing node. This paper describes the architecture of a DSP accelerator and shows its advantages with simulation results from a typical large vision network.
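The two ideas in this abstract, exploiting sparse activity and sharing one connection mask among many neurons, can be illustrated with a minimal sketch. It is not the authors' DSP algorithm: the leaky integrate-and-fire update, the 1-D ring topology, and the names `step`, `mask`, `leak`, and `threshold` are all assumptions chosen for brevity.

```python
def step(potentials, spikes, mask, threshold=1.0, leak=0.9):
    """Advance a 1-D ring of neurons by one time step (hypothetical model).

    potentials: list of membrane potentials, one per neuron
    spikes:     indices of neurons that fired in the previous step
    mask:       dict {offset: weight}, one connection mask shared by all neurons
    Returns (new_potentials, indices_that_fired).
    """
    n = len(potentials)
    # Passive leak for every neuron (assumed model, not from the abstract).
    new = [v * leak for v in potentials]
    # Event-driven part: only neurons that spiked propagate input, so the
    # synaptic work is proportional to (number of spikes) * (mask size).
    for src in spikes:
        for offset, weight in mask.items():
            new[(src + offset) % n] += weight
    # Threshold test and reset.
    fired = [i for i, v in enumerate(new) if v >= threshold]
    for i in fired:
        new[i] = 0.0
    return new, fired


new_potentials, fired = step([0.0] * 5, spikes=[2], mask={-1: 0.6, 1: 1.2})
```

Because the mask is stored once rather than per neuron, the synapse table for a regular connection scheme stays small enough to live in fast local memory, which is the motivation the abstract gives for the large on-chip memory.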
The scheduling problem considered consists of determining a feasible schedule when each job is defined by a ready date, a processing time that differs on each machine, and a deadline. Setup times are sequence dependent. The parallel machines are of two types: the first consists of "cheap" machines and the second of "expensive" machines. The objective is first to find a feasible schedule and then to minimize the cost due to assignment and setup times. The method presented in this paper is a three-phase heuristic: the first phase is based on an iterative heuristic, the second on a genetic algorithm, and the third on branch and bound for post-optimization.
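To make the problem statement concrete, the following sketch evaluates one candidate schedule for feasibility and cost. It does not implement the three-phase heuristic. The cost model (a per-time-unit machine rate plus a per-time-unit setup rate), the rule that a setup may overlap the wait for a ready date, and every name in the code are assumptions, since the abstract does not specify them.

```python
def evaluate(sequences, ready, deadline, proc, setup, machine_cost, setup_cost):
    """Check one candidate schedule (hypothetical cost model).

    sequences:    dict {machine: ordered list of jobs}
    ready:        dict {job: ready date}
    deadline:     dict {job: deadline}
    proc:         dict {job: {machine: processing time}}
    setup:        dict {previous_job: {next_job: setup time}}
    machine_cost: dict {machine: cost per unit of processing time}
    setup_cost:   cost per unit of setup time
    Returns (feasible, total_cost).
    """
    feasible = True
    cost = 0.0
    for machine, jobs in sequences.items():
        t = 0.0          # time at which the machine becomes free
        prev = None      # previously processed job on this machine
        for job in jobs:
            s = setup[prev][job] if prev is not None else 0.0
            # Assumption: setup may be done while waiting for the ready date.
            start = max(t + s, ready[job])
            finish = start + proc[job][machine]
            if finish > deadline[job]:
                feasible = False
            cost += machine_cost[machine] * proc[job][machine] + setup_cost * s
            t = finish
            prev = job
    return feasible, cost


result = evaluate(
    sequences={"cheap": ["a", "b"]},
    ready={"a": 0, "b": 2},
    deadline={"a": 5, "b": 10},
    proc={"a": {"cheap": 3}, "b": {"cheap": 4}},
    setup={"a": {"b": 1}},
    machine_cost={"cheap": 1.0},
    setup_cost=2.0,
)
```

An evaluator of this kind is what a search method would call repeatedly, for example as the fitness function of a genetic algorithm.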
Describes the design and construction of an autonomous vehicle having the important characteristics of versatility, portability and low cost. Additionally, the vehicle's mechanical and electronic features permit rapid prototyping and validation of new control and navigation algorithms. In this context, all information processing is based on a personal computer, using the parallel communication interface to acquire and export real time data. In general, such an interface enables different kinds of actuators and transducers to be managed without additional electronic hardware.
ISBN: 0201485621 (print)
Parallel servers for I/O- and compute-intensive continuous media applications are difficult to develop. A server application comprises many threads located in different address spaces as well as files striped over multiple disks located on different computers. The present contribution describes the construction of a continuous media server, the 4D beating heart slice server, based on a computer-aided parallelization tool (CAP) and on a library of parallel file system components enabling the combination of pipelined parallel disk access and processing operations. Thanks to CAP, the presented architecture is concisely described as a set of threads, operations located within the threads, and the flow of data and parameters (tokens) between operations. Continuous media applications are supported by allowing tokens to be suspended during a period of time specified by a user-defined function. Our target application, the 4D beating heart server, supports the extraction of freely oriented slices from a 4D beating heart volume (one 3D volume per time sample). This server application requires both a high I/O throughput for reading from disk the set of 4D sub-volumes (extents) intersecting the desired slices and a large amount of processing power to extract these slices and to resample them into the display grid. With a server configuration of 3 PCs and 24 disks, up to 7.3 slices can be delivered per second, i.e. 43 MB/s are continuously read from disks and 4.1 MB/s of slice parts are extracted, transferred to the client, merged, buffered and displayed. This performance is close to the maximal performance deliverable by the underlying hardware. The observed single-stream server delay jitter varies between 0.6 s (52% of the maximal display rate) and 1.4 s (92% of the maximal display rate). For the same resource utilization, the jitter is proportional to the number of streams that are accessed synchronously. The presented 4D beating heart application suggests that powerful continuous medi
In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to choose between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.
In this paper, a boundary postprocessing technique is proposed to compute the discrete wavelet transform (DWT) near block boundaries. The basic idea is to take advantage of available lifting filterbank factorizations to model the DWT as a Finite State Machine (FSM). The proposed technique can reduce the size of auxiliary buffers in block-based DWT implementations and reduce the communication overhead between adjacent blocks. Two new DWT system architectures, Overlap-State sequential and Split-and-Merge parallel, are presented using this technique. Experimental results show that, for the popular (9, 7) filters, the size of auxiliary buffers can be reduced by 42% and that the parallel algorithm is 30% faster than existing approaches.
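A lifting factorization computes the DWT as a sequence of in-place predict and update steps, each depending only on immediate neighbours. The sketch below shows this with the simpler reversible 5/3 integer lifting filter rather than the (9, 7) filters of the abstract, and it does not implement the proposed boundary postprocessing. The function names and the symmetric boundary extension are assumptions made for illustration.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting DWT on an even-length integer list.

    Returns (s, d): low-pass (approximation) and high-pass (detail) coefficients.
    """
    assert len(x) % 2 == 0 and len(x) >= 2
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # symmetric extension
        d.append(odd[i] - (even[i] + right) // 2)
    # Update step: each even sample plus a rounded quarter of neighbouring details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[i]                # symmetric extension
        s.append(even[i] + (left + d[i] + 2) // 4)
    return s, d


def inverse_53(s, d):
    """Undo forward_53 by running the lifting steps in reverse order."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[i]
        even.append(s[i] - (left + d[i] + 2) // 4)
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.extend([even[i], d[i] + (even[i] + right) // 2])
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
s, d = forward_53(signal)
restored = inverse_53(s, d)  # equals signal: the integer lifting steps are exactly invertible
```

Each output near the end of a block needs a neighbour that belongs to the next block. Viewing the partially updated samples as state that can be completed later is the kind of modelling the abstract describes with its FSM formulation.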
The efficiency of HPF with respect to irregular applications is still largely unproven. While recent work has shown that a highly irregular hierarchical n-body force calculation method can be implemented in HPF, we have found that the implementation contains inefficiencies that cause it to run up to three times slower than our hand-coded, explicitly parallel implementation. Our work examines these inefficiencies, determines that most of the extra overhead is due to a single aspect of the communication strategy, and demonstrates that fixing the communication strategy can bring the overheads of the HPF application to within 25% of those of the hand-coded version.
This paper presents two methods for solving a second-order partial differential equation, with application to the well-known Poisson equation. These methods are aimed at building a high-speed hardware solver. The solutions presented will be part of a hardware device simulator called "Virtual Device". We present simulation results to compare the two methods for solving this equation. We start with an iterative method (the Gauss-Seidel method) and end with a direct method (the LU method).
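The contrast between the two method families can be sketched in software on a small model problem. The example below assumes the 1-D Poisson equation $-u'' = f$ on $(0, 1)$ with zero boundary values, discretized by central differences; the abstract does not state the dimension, grid, or boundary conditions, and the function names are hypothetical. The direct solver is an LU factorization specialized to the tridiagonal matrix (the Thomas algorithm).

```python
def solve_gauss_seidel(f, h, sweeps=500):
    """Iterative solve of -u'' = f at interior grid points, u = 0 on both ends."""
    n = len(f)
    u = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            # Gauss-Seidel uses the newest available neighbour values.
            u[i] = 0.5 * (left + right + h * h * f[i])
    return u


def solve_lu_tridiagonal(f, h):
    """Direct solve of the same system: LU without pivoting on tridiag(-1, 2, -1)."""
    n = len(f)
    rhs = [h * h * fi for fi in f]
    diag = [2.0] * n
    # Forward elimination (builds U and applies L^-1 to the right-hand side).
    for i in range(1, n):
        m = -1.0 / diag[i - 1]      # multiplier: sub-diagonal / previous pivot
        diag[i] = 2.0 + m           # 2 - m * (-1)
        rhs[i] -= m * rhs[i - 1]
    # Back substitution with the super-diagonal entry -1.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return u


h = 0.1
f = [2.0] * 9                       # -u'' = 2 has the exact solution u = x(1 - x)
u_iterative = solve_gauss_seidel(f, h)
u_direct = solve_lu_tridiagonal(f, h)
```

The iterative solver needs many cheap sweeps with a fixed, local access pattern, while the direct solver finishes in one pass with a sequential dependency chain; trade-offs of this kind are what a hardware comparison of the two methods would weigh.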
Image processing is often considered a good candidate for the application of parallel processing because of the large volumes of data and the complex algorithms commonly encountered. This paper presents a tutorial introduction to the field of parallel image processing. After introducing the classes of parallel processing, a brief review of architectures for parallel image processing is presented. Software design for low-level image processing and parallelism in high-level image processing are discussed, and an application of parallel processing to handwritten postcode recognition is described. The paper concludes with a look at future technology and market trends.
ISBN: 0818690143 (print)
In this paper we present a novel ATM switch called Parallel-Tree Banyan Switch Fabric (PTBSF) that consists of parallel Banyans arranged in a tree topology. Packets enter at the topmost Banyan. Internal conflicts are eliminated by using a conflict-free 3 x 4 switching element which distributes conflicting cells over different Banyans. Thus, cell loss may occur only at the lowest Banyan. Increasing the number of Banyans leads to a noticeable decrease in the cell loss rate. The switch can be engineered to provide arbitrarily high throughput and low cell loss rate without input buffering or cell pre-processing. The performance of the switch is evaluated analytically under uniform traffic load and by simulation under a variety of ATM traffic loads. Compared to other proposed architectures, the switch exhibited stable and excellent performance with respect to cell loss and switching delay for all studied conditions, as required by ATM traffic sources. The advantages of PTBSF are modularity, regularity, self-routing, low processing overhead, high throughput and robustness under a variety of ATM traffic conditions.
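The internal conflicts that the PTBSF design removes can be seen in a plain single-plane network. The sketch below models destination-tag self-routing in an omega network, one member of the Banyan class, and reports links requested by more than one cell. It does not model the 3 x 4 switching element or the tree of Banyans, and the function names are hypothetical.

```python
def route(src, dst, stages):
    """Return the link index a cell occupies after each stage of an omega network.

    The network has 2**stages inputs; each stage is a perfect shuffle followed
    by a 2 x 2 element that picks its output from one destination bit.
    """
    mask = (1 << stages) - 1
    pos = src
    path = []
    for k in range(stages):
        bit = (dst >> (stages - 1 - k)) & 1   # destination bit, most significant first
        pos = ((pos << 1) & mask) | bit       # shuffle, then take upper (0) or lower (1) output
        path.append(pos)
    return path


def find_conflicts(cells, stages):
    """cells: list of (src, dst). Return the set of (stage, link) wanted by 2+ cells."""
    seen = set()
    conflicts = set()
    for src, dst in cells:
        for stage, link in enumerate(route(src, dst, stages)):
            if (stage, link) in seen:
                conflicts.add((stage, link))
            seen.add((stage, link))
    return conflicts


clash = find_conflicts([(0, 0), (4, 1)], stages=3)   # inputs 0 and 4 share the first element
clean = find_conflicts([(i, i) for i in range(8)], stages=3)
```

In a single Banyan one of the two clashing cells must be dropped or buffered; the abstract's approach is instead to hand the losing cell to another Banyan in the tree.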