Memory has always been a major factor in determining the cost of a computer system. Many schemes have been proposed for reducing memory cost without degrading system performance or significantly increasing system cost or complexity. This paper presents a particular data type that may have been used occasionally by programmers who have had to simulate floating-point hardware in software. This new data type is useful in large scientific problems and may be able to serve as a replacement for the floating-point data type on special-purpose processors. Its hardware implementation on orthogonal and pipeline processors is discussed in detail, and the implications of these implementations for a programming language, APL (Iverson's language), are examined.
In the case of massive data, matrix operations are very computationally intensive, and the memory limitation in standalone mode leads to the system *** the same time, it is difficult for matrix operations to achieve flexible switching between different requirements when implemented in *** address this problem, this paper proposes a matrix operation accelerator based on reconfigurable arrays in the context of recommender systems (RS). Based on the reconfigurable array processor (APR-16) with reconfiguration, a parallelized design of matrix operations on the processing element (PE) array is realized with *** experimental results show that, compared with a central processing unit (CPU) and graphics processing unit (GPU) hybrid matrix multiplication framework, the energy efficiency ratio of the accelerator proposed in this paper is improved by about 35×. Compared with blocked alternating least squares (BALS), its energy efficiency ratio is improved by about 1×, and switching between matrix factorization (MF) schemes suited to different sparsity levels can be realized.
Authors:
KOOTSEY, JM
DUKE UNIV, MED CTR, NATL BIOMED SIMULAT RESOURCE, DURHAM, NC 27710 USA
Developments in computer hardware and software are making significant improvements in the availability of simulation for biomedical researchers. This paper reviews past and present techniques for digital computer simulation and looks at improvements likely in the near future. In the area of hardware, personal computers are making computing and simulation more widely available and at the same time, supercomputers and special-purpose numerical processors are making it possible to solve larger problems. Software developments for simulation are reducing the time, effort and special skills required to produce a simulation program. A new hierarchical linker is proposed to make it easy to synthesize a global model by combining existing submodels. In the more distant future, computer models may be constructed graphically and with the assistance of intelligent programs capable of analysis and information retrieval.
The CORDIC iteration is applied to several Fourier transform algorithms. The number of operations is found as a function of transform method and radix representation. Using these representations, several hardware configurations are examined for cost, speed, and complexity tradeoffs. A new, especially attractive FFT computer architecture is presented as an example of the utility of this technique. Compensated and modified CORDIC algorithms are also developed.
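For readers unfamiliar with the CORDIC iteration mentioned above, the following is a minimal sketch of the standard rotation-mode iteration, used here to generate an FFT twiddle factor. It is not the architecture or the compensated/modified variants from the paper; the function names `cordic_rotate` and `twiddle`, the iteration count, and the use of floating point in place of fixed-point shifts are all illustrative assumptions.

```python
import math

def cordic_rotate(x, y, angle, iterations=32):
    """Rotate the vector (x, y) by `angle` radians with CORDIC micro-rotations.

    Assumes |angle| is within the convergence range (about 1.74 rad).
    Multiplying by 2**-i stands in for the hardware right shift by i bits.
    """
    # Elementary angles atan(2^-i) and the constant gain of the iteration.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated through
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    # Undo the accumulated scale factor of the micro-rotations.
    return x / gain, y / gain

def twiddle(k, n):
    """Return (cos, sin) of -2*pi*k/n for 0 <= k <= n/4 (hypothetical helper)."""
    return cordic_rotate(1.0, 0.0, -2.0 * math.pi * k / n)

if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, math.pi / 6))
    print(twiddle(1, 8))
```

Each step needs only shifts, additions, and a table lookup, which is why the operation count depends on the transform method and radix rather than on a multiplier.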
Matrix multiplication algorithms have been proposed for VLSI array processors. Random defects in the silicon wafer and fabrication errors render processors and data paths in the array faulty, and may cause the algorithm to fail despite a significant number of nonfaulty processors. This correspondence presents a robust VLSI array processor for matrix multiplication. The array is driven by a host computer as a peripheral and the I/O bandwidth required to drive the array is a constant, independent of the problem size. Multiplication of two n × n matrices requires O(n) processors and has a time complexity of O(n^2) cycles.
In this paper we describe a new observing system which is currently nearing completion at the Mount Wilson Observatory. This system has been designed to obtain daily measurements of solar photospheric and subphotospheric rotational velocities from the frequency splitting of non-radial solar p-mode oscillations of moderate to high degree (i.e. l > 150). The completed system will combine a 244 × 248 pixel CID camera with a high-speed floating point array processor, a 32-bit minicomputer, and a large-capacity disc storage system. We are integrating these components into the spectrograph of the 60-foot solar tower telescope at Mount Wilson in order to provide a facility which will be dedicated to the acquisition of oscillation data.
This paper discusses the evolution of chip architecture and then analyzes MPP SoC architectures according to three kinds of computing paradigms. Based on these discussions and analyses, an array processor architecture for unified change is presented, which could make the programming of both data-level and non-data-level parallel algorithms simpler, more effective, and more versatile.
The design and construction of a new image processing system, CLIP7, is described, together with the design and operation of the custom integrated circuit on which it is based.
A 167-processor computational platform consists of an array of simple programmable processors capable of per-processor dynamic supply voltage and clock frequency scaling, three algorithm-specific processors, and three 16 KB shared memories, and is implemented in 65 nm CMOS. All processors and shared memories are clocked by local, fully independent, dynamically haltable, digitally programmable oscillators and are interconnected by a configurable circuit-switched network which supports long-distance communication. Programmable processors occupy 0.17 mm² and operate at a maximum clock frequency of 1.2 GHz at 1.3 V. At 1.2 V, they operate at 1.07 GHz and consume 47.5 mW when 100% active, resulting in an energy dissipation of 44 pJ per operation. At 0.675 V, they operate at 66 MHz and consume 608 μW when 100% active, resulting in a total energy dissipation of 9.2 pJ per ALU or MAC operation.
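The energy-per-operation figures quoted above follow from dividing power by clock frequency. The short sketch below reproduces that arithmetic, assuming one operation completes per clock cycle when a processor is 100% active (an assumption made here to match the quoted numbers; `energy_per_op_pj` is a hypothetical helper name).

```python
def energy_per_op_pj(power_watts, clock_hz, ops_per_cycle=1.0):
    """Energy per operation in picojoules: power / (operations per second)."""
    # ops_per_cycle=1.0 is an assumption, not a figure stated in the abstract.
    return power_watts / (clock_hz * ops_per_cycle) * 1e12

# 1.2 V operating point: 47.5 mW at 1.07 GHz
high_voltage_pj = energy_per_op_pj(47.5e-3, 1.07e9)
# 0.675 V operating point: 608 uW at 66 MHz
low_voltage_pj = energy_per_op_pj(608e-6, 66e6)

if __name__ == "__main__":
    print(f"{high_voltage_pj:.1f} pJ/op at 1.2 V")
    print(f"{low_voltage_pj:.1f} pJ/op at 0.675 V")
```

Under that assumption the two operating points work out to roughly 44.4 pJ and 9.2 pJ per operation, in line with the quoted 44 pJ and 9.2 pJ.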
The use of the Wiener filter is proposed for temporal filtering of nuclear medicine dynamic studies. This filter adapts to the signal and noise levels of each pixel activity curve in a dynamic study to produce an “optimal” suppression of noise, while maintaining the signal content of the curve. The filter is derived to be a simple function of the power spectrum of the time-activity curve. Examples of its use for temporally filtering gated blood-pool studies for cine viewing and functional image formation are shown.
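The abstract does not give the filter's exact form. As an illustration only, the sketch below applies the textbook Wiener gain, max(P − N, 0) / P, to each frequency bin of a single pixel's time-activity curve, where P is the measured power in that bin and N is an assumed noise power per bin. The function names, the flat (white) noise model, and the plain DFT are assumptions made for this sketch, not details from the paper.

```python
import cmath

def dft(values, inverse=False):
    """Plain O(n^2) discrete Fourier transform (adequate for short curves)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def wiener_filter_curve(curve, noise_power):
    """Temporally filter one time-activity curve with a per-bin Wiener gain.

    noise_power is the assumed noise power per frequency bin, in the same
    units as |X_k|^2 (for white noise of variance s2, about len(curve) * s2).
    """
    spectrum = dft(curve)
    filtered = []
    for coeff in spectrum:
        power = abs(coeff) ** 2
        # Bins dominated by noise are suppressed; strong bins pass almost unchanged.
        gain = max(power - noise_power, 0.0) / power if power > 0 else 0.0
        filtered.append(gain * coeff)
    return [v.real for v in dft(filtered, inverse=True)]

if __name__ == "__main__":
    noisy = [10.0, 12.5, 15.2, 13.1, 9.8, 7.4, 5.1, 7.9]
    print(wiener_filter_curve(noisy, noise_power=4.0))
```

Because the gain is computed from each curve's own spectrum, the amount of smoothing adapts pixel by pixel, which is the behavior the abstract describes.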