This paper describes the MM32k, a massively parallel SIMD computer that is easy to program, high in performance, low in cost, and effective for implementing highly parallel neural network architectures. The MM32k has 32,768 bit-serial processing elements, each with 512 bits of memory, all interconnected by a switching network. The entire system resides on a single PC-AT compatible card. It is programmed from the host computer using a C++ class library that abstracts the parallel processor in terms of fast arithmetic operators for vectors of variable-precision integers.
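To illustrate what arithmetic on bit-serial processing elements involves, here is a minimal Python sketch of vector addition performed one bit plane at a time. It is not the MM32k class library (which is C++ and whose interface the abstract does not give); the function name `bit_serial_add` and its parameters are hypothetical, and the inner loop only simulates what the processing elements would do in lockstep.

```python
def bit_serial_add(a, b, precision):
    """Add two equal-length vectors of non-negative integers bit by bit.

    Hypothetical illustration: each list index stands for one processing
    element (PE). Every PE handles one bit of its operands per step and
    keeps a 1-bit carry, so the cost grows with `precision`, not with
    the vector length.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    carry = [0] * n
    result = [0] * n
    for bit in range(precision):          # one step per bit plane
        for pe in range(n):               # conceptually all PEs at once
            x = (a[pe] >> bit) & 1
            y = (b[pe] >> bit) & 1
            s = x ^ y ^ carry[pe]         # full-adder sum bit
            carry[pe] = (x & y) | (carry[pe] & (x ^ y))
            result[pe] |= s << bit
    return result                         # wraps modulo 2**precision


if __name__ == "__main__":
    print(bit_serial_add([1, 2, 3], [4, 5, 6], precision=8))
```

Because the loop over bit planes is the only sequential part, lowering the precision shortens the operation proportionally, which is one reason variable-precision integers suit this kind of machine.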
In this paper, we present a new scalar architecture for high-speed vector processing. Without using cache memory, the proposed architecture tolerates main memory access latency by introducing slide-windowed floating-p...
A number of existing multiprocessors are based on the hypercube interconnection network. The popularity of the hypercube is due to its small communication diameter, which grows logarithmically with the cube size, its fault-tolerant properties, and its modularity, which makes it possible to build a larger cube from smaller subcubes. The star graph has been studied as a network topology for fault-tolerant parallel computing. Unfortunately, the size of the network grows too sharply with n to be affordable for values of n larger than 7 or 8. We introduce a novel interconnection network known as the incomplete star graph, which overcomes this problem while retaining most of the advantages of the star graph. We present the architecture of the incomplete star graph and compare its performance with the full star graph as well as competing architectures such as the incomplete hypercube and arrangement graphs. We provide routing algorithms for both non-faulty and faulty incomplete star graphs, and study their performance.
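The growth problem mentioned above can be made concrete with a short Python sketch. It assumes the standard definition of the full star graph (nodes are the permutations of n symbols, so there are n! of them, and each edge swaps the first symbol with the symbol at another position). The routine shown is the well-known greedy routing for the full star graph, not the incomplete-star algorithm from the paper, which the abstract does not describe; the function names are hypothetical.

```python
from math import factorial


def star_graph_size(n):
    """Number of nodes in the full star graph on n symbols (n!)."""
    return factorial(n)


def route_to_identity(perm):
    """Greedy route from `perm` to the identity (1, 2, ..., n).

    Each hop swaps the first symbol with the symbol at one other
    position, which is exactly one edge of the full star graph.
    Returns the list of nodes visited, including both endpoints.
    """
    p = list(perm)
    n = len(p)
    path = [tuple(p)]
    while True:
        first = p[0]
        if first != 1:
            j = first - 1                 # send the first symbol home
        else:
            # Symbol 1 is in front: swap with any misplaced symbol.
            j = next((i for i in range(1, n) if p[i] != i + 1), None)
            if j is None:                 # everything is in place
                break
        p[0], p[j] = p[j], p[0]
        path.append(tuple(p))
    return path


if __name__ == "__main__":
    for n in (7, 8, 9):
        print(n, star_graph_size(n))
    print(route_to_identity((3, 1, 2)))
```

Going from n = 7 to n = 8 multiplies the node count by 8 (5,040 to 40,320), and there is no full star graph of any size in between, which is the gap an incomplete variant is meant to fill.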
High-performance distributed computing systems require high-performance communication systems. Distributed modeling and implementation of these communication systems is important. Toward this goal, the authors refine the process-to-channel_agent-to-process (PCP) model of asynchronous distributed communication. While the PCP model provides a versatile and succinct mechanism for specifying and comparing different types of channels, it is inherently centralized. The refined model presented here, the process-to-channel_agent-to-channel_agent-to-process (PCCP) communication model, is amenable to distributed modeling and implementation of channels. The usefulness of the PCCP model is demonstrated by presenting a distributed implementation of hierarchical F-channels.
We present a neural network simulation which we implemented on the massively parallel Connection Machine 2. In contrast to previous work, this simulator is based on biologically realistic neurons with nontrivial single-cell dynamics, high connectivity with a structure modelled in agreement with biological data, and preservation of the temporal dynamics of spike interactions. We simulate neural networks of 16,384 neurons coupled by about 1000 synapses per neuron, and estimate the performance for much larger systems. Communication between neurons is identified as the most computationally demanding task, and we present a novel method to overcome this bottleneck. The simulator has already been used to study the primary visual system of the cat.
Recent physiological research has shown that synchronization of oscillatory responses in striate cortex may code for relationships between visual features of objects. A VLSI circuit has been designed to provide rapid phase-locking synchronization of multiple oscillators to allow for further exploration of this neural mechanism. By exploiting the intrinsic random transistor mismatch of devices operated in subthreshold, large groups of phase-locked oscillators can be readily partitioned into smaller phase-locked groups. A multiple-target tracker for binary images that uses this phase-locking architecture is described. A VLSI chip has been fabricated and tested to verify the architecture. The chip employs Pulse Amplitude Modulation (PAM) to encode the output at the periphery of the system.
Progress has been made in computational implementation of speech production based on physiological data. An inverse dynamics model of the speech articulator's musculo-skeletal system, which is the mapping from articulator trajectories to electromyographic (EMG) signals, was modeled using the acquired forward dynamics model together with temporal (smoothness of EMG activation) and range constraints. This inverse dynamics model allows the use of a faster speech motor control scheme, which can be applied to phoneme-to-speech synthesis via musculo-skeletal system dynamics or, in future, to speech recognition. The forward acoustic model, which is the mapping from articulator trajectories to the acoustic parameters, was improved by adding velocity and voicing information inputs to distinguish acoustic parameter differences caused by changes in source characteristics.
Window-based parallel architectures are here considered as target structures for the computation of low- and medium-level image processing algorithms. Their definition stems from a general reformulation of algorithms, ...
This paper proposes a novel approach to program development for highly parallel architectures, primarily as far as debugging is concerned. The visual nature of the debugging stage, when dealing with image-processing a...
ISBN: (print) 0818626720
The Symposium materials contain 118 papers on new developments in parallel processing. Algorithms, architectures, mapping/scheduling, applications, special-purpose architectures, interconnection networks, software, and distributed systems are among the main topics covered.