ISBN: 9781424438952 (print)
In this paper, a new Cellular Neural Network (CNN) based visual algorithm for welding processes is proposed. The idea described in [1] can be used in processes whose welding direction has a constant orientation that is well known a priori. The algorithm proposed in the following is omnidirectional in the sense that it does not depend on the welding direction. This enables closed-loop control systems for welding processes with curved seams. On Eye-RIS systems [2], processing times of about 110 µs are achievable for both acquisition and evaluation of full-frame images.
ISBN: 1581137427 (print)
This paper presents a software implementation of a very fast parallel Reed-Solomon decoder on the second generation of the MorphoSys reconfigurable computation platform, which targets streamed applications such as multimedia and DSP. Numerous modifications to the first-generation architecture have produced a scalable, computation- and communication-intensive architecture capable of extracting fine-grained parallelism at the instruction level. Many algorithms, as well as the whole Digital Video Broadcasting base-band receiver, have been mapped onto the second-generation architecture with impressive performance. The mapping of a Reed-Solomon decoder proposed in this paper highly parallelizes all of its sub-algorithms, including Syndrome Computation, the Berlekamp Algorithm, Chien Search, and Error Value Computation, in a SIMD fashion. The mapping is tested on a cycle-accurate simulator, "Mulate", and the performance is encouragingly better than that of other architectures. The decoding speed of the RS (255,239,16) decoder using two different methods of GF multiplication reaches 1.319 Gbps and 2.534 Gbps, respectively. Furthermore, since no functionality is specifically tailored to the Reed-Solomon decoder, the result demonstrates the capability of the MorphoSys architecture to extract Instruction Level Parallelism from streamed applications.
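As a rough illustration of the first decoder stage named above, the following minimal sketch computes the 16 syndromes of a 255-symbol received word over GF(2^8). It is not the MorphoSys mapping: the field polynomial 0x11D, the first root alpha^0, and the names `gf_mul`, `gf_pow`, and `syndromes` are assumptions chosen for illustration, and the shift-and-add multiplier is just one common way to do GF multiplication.

```python
PRIM_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add, reducing modulo PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY       # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def syndromes(received, n_parity=16, first_root=0):
    """Return S_j = r(alpha^(first_root + j)) for j = 0..n_parity-1.

    received[0] is the highest-degree coefficient; alpha = 2 is assumed
    to be the primitive element. Each S_j is evaluated with Horner's rule.
    """
    out = []
    for j in range(n_parity):
        x = gf_pow(2, first_root + j)
        s = 0
        for symbol in received:
            s = gf_mul(s, x) ^ symbol
        out.append(s)
    return out
```

Each syndrome is evaluated independently of the others, so the outer loop over `j` is the kind of loop that can be spread across SIMD lanes. An error-free codeword yields all-zero syndromes, and any nonzero syndrome signals that the later stages have errors to locate and correct.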
ISBN: 9781424437320 (print)
This paper presents a unified design flow that aims at accelerating parallelizable data-intensive applications in the context of ubiquitous computing. This contribution relies on the JubiTool, a set of integrated tools (JubiSplitter, JubiCompiler) that respectively extract and compile the parallelizable parts of applications described in Jubi, an extended Java language. By appending hardware directives to a software agent description, the inherent flexibility of software is combined with the runtime performance of hardware execution. In the case of typical Perplexus applications such as a biologically plausible neural network simulator, this contribution exploits the intrinsic parallelism of the Perplexus Ubichip, resulting in an expected speedup of one order of magnitude. Finally, we show that this original flow for HW acceleration can be modified to support other types of distributed platforms.
ISBN: 9781479943050 (print)
In this paper, we discuss and evaluate the grain size of the processing element (PE) of RapidMatriX, a matrix-operation-specific architecture with fused multiply-add (FMA) units, on FPGAs. Recent FPGAs have many DSP blocks, which are high-performance arithmetic units. By implementing functional units for matrix operations in the array structure of RapidMatriX, we propose to use the DSP blocks efficiently by increasing the grain size of the FMA unit. We implement RapidMatriX using the refined PEs on an FPGA. In addition, we evaluate the clock frequencies and the clock cycles of calculation. As a result, for 8 x 8 matrix multiplication, the throughput of the PE with 4 x 4 matrix FMA is 3.14 times that of the original PEs with scalar FMA.
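The idea of a coarser-grained FMA can be sketched in software as follows. This is only an arithmetic illustration, not the RapidMatriX hardware design: the helper names `block_fma`, `get_block`, and `blocked_matmul` are hypothetical, and the sketch simply shows that an 8 x 8 product decomposes into 4 x 4 block operations of the form C + A·B.

```python
def block_fma(c, a, b, n=4):
    """One matrix FMA on n x n blocks: return c + a @ b (lists of lists)."""
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def get_block(m, bi, bj, n=4):
    """Extract the n x n block at block-row bi, block-column bj."""
    return [row[bj * n:(bj + 1) * n] for row in m[bi * n:(bi + 1) * n]]


def blocked_matmul(a, b, size=8, n=4):
    """Multiply two size x size matrices using only n x n block FMAs.

    Returns the product and the number of block FMAs issued.
    """
    blocks = size // n
    c = [[0] * size for _ in range(size)]
    fma_count = 0
    for bi in range(blocks):
        for bj in range(blocks):
            acc = [[0] * n for _ in range(n)]
            for bk in range(blocks):
                acc = block_fma(acc, get_block(a, bi, bk, n), get_block(b, bk, bj, n), n)
                fma_count += 1
            # write the accumulated block back into the result matrix
            for i in range(n):
                c[bi * n + i][bj * n:(bj + 1) * n] = acc[i]
    return c, fma_count
```

With `size=8` and `n=4` the function issues 8 block FMAs, each covering 64 scalar multiply-adds, in place of 512 separate scalar FMAs. How this maps to DSP blocks and clock cycles is a hardware question that the sketch does not model.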
ISBN: 9781628411867 (print)
This paper presents a high-throughput and reconfigurable processor for fast Fourier transform (FFT) processing based on SDR methodology. It adopts an application-specific instruction-set processor (ASIP) and single instruction multiple data (SIMD) architecture to exploit the parallelism of butterfly operations in the FFT algorithm. Moreover, a novel 3-dimension multi-bank memory is proposed for parallel conflict-free accesses. The overall throughput and power efficiency are greatly enhanced by parallel and streamline processing. A test chip supporting 64- to 2048-point FFT is set up for experiments. Logic synthesis reveals a maximum clock frequency of 500 MHz and an area of 0.49 mm² for the processor's logic using a low-power 45-nm technology, and the estimated dynamic power is about 96.6 mW. Compared with previous works, our FFT ASIP achieves higher energy efficiency at a relatively low area cost.
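The butterfly parallelism mentioned above can be seen in a textbook radix-2 decimation-in-time FFT. The sketch below is a generic software version, not the processor's instruction set or memory scheme; the function names `butterfly` and `fft` are chosen for illustration, and the input length is assumed to be a power of two.

```python
import cmath


def butterfly(a, b, w):
    """Radix-2 butterfly: combine two inputs with twiddle factor w."""
    t = w * b
    return a + t, a - t


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    # The n/2 butterflies of this stage are independent of one another.
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out
```

Within one stage, every butterfly reads and writes a distinct pair of elements, so all of them could run in parallel lanes, provided the memory can deliver the operand pairs without bank conflicts.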
ISBN: 9798350396249 (print)
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1600] permutation, which operates on an internal state of 1600 bits, mostly represented as a 5 x 5 x 64-bit matrix. While existing implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1600] permutation can benefit considerably from parallelization. This paper is the first to explore the full potential of parallelizing Keccak-f[1600] in RISC-V based processors through custom vector extensions on 32-bit and 64-bit architectures. We analyze the Keccak-f[1600] permutation, composed of five different step mappings, and propose ten custom vector instructions to speed up the computation. We realize these extensions in a SIMD processor described in SystemVerilog. We compare the performance of our designs to existing architectures based on vectorized application-specific instruction set processors (ASIP). We show that our designs outperform all related work in throughput due to our carefully selected custom vector instructions.
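To make the 5 x 5 x 64-bit state and its parallelism concrete, here is a minimal sketch of theta, the first of the five step mappings (theta, rho, pi, chi, iota) as defined in the SHA-3 standard. It says nothing about the ten custom vector instructions proposed in the paper; the lane layout `state[x][y]` and the function names are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1


def rotl64(value, shift):
    """Rotate a 64-bit lane left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def theta(state):
    """Apply the theta step to a 5 x 5 array of 64-bit lanes, state[x][y]."""
    # Column parities: XOR of the five lanes in each column x.
    parity = [
        state[x][0] ^ state[x][1] ^ state[x][2] ^ state[x][3] ^ state[x][4]
        for x in range(5)
    ]
    # Each column is mixed with the parities of its two neighbouring columns.
    mix = [
        parity[(x - 1) % 5] ^ rotl64(parity[(x + 1) % 5], 1)
        for x in range(5)
    ]
    return [[state[x][y] ^ mix[x] for y in range(5)] for x in range(5)]
```

The five column parities, the five mixing values, and the 25 final XORs are each independent within their own loop, which is why a sequential 32-bit or 64-bit datapath leaves a lot of parallelism unused.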
A novel complementary-metal-oxide-semiconductor (CMOS) processor, labelled the quantum-circuit processor (QCP), for the high-performance emulation of quantum computing is presented. The QCP emulates the calculation performed in the quantum circuit by simple matrix calculations based on single-instruction-stream-multiple-data-stream (SIMD) parallel processing. Using the parallel operation of an enormous number of devices in LSI, it executes quantum algorithms at a speed comparable to that of the quantum computer. A 5-qubit processor was implemented using a programmable logic device (PLD), and the quantum Fourier transformation was demonstrated by this processor.
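The "simple matrix calculations" behind circuit emulation can be sketched with a plain state-vector simulator. This is a generic software illustration under stated assumptions, not the QCP's hardware datapath: the function name `apply_single_qubit_gate`, the bit ordering (qubit 0 is the least significant index bit), and the choice of the Hadamard gate as the example are all illustrative.

```python
import math


def apply_single_qubit_gate(state, gate, target, n_qubits):
    """Apply a 2 x 2 gate to one qubit of a state vector with 2**n_qubits amplitudes."""
    new_state = list(state)
    stride = 1 << target
    for base in range(1 << n_qubits):
        if (base & stride) == 0:
            # Amplitudes that differ only in the target bit form one pair.
            a0, a1 = state[base], state[base | stride]
            new_state[base] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[base | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

# 5-qubit register initialised to |00000>, then a Hadamard on every qubit.
n = 5
state = [1.0] + [0.0] * ((1 << n) - 1)
for q in range(n):
    state = apply_single_qubit_gate(state, H, q, n)
```

Every amplitude pair is updated with the same 2 x 2 multiplication and no pair depends on another, so one gate amounts to a single instruction applied across many data elements. For 5 qubits the vector holds 32 amplitudes, and the size doubles with each added qubit.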
Real-time monitoring of laser beam welding (LBW) has increasingly gained importance in several manufacturing processes ranging from automobile production to precision mechanics. For the latter, a novel algorithm for the real-time detection of spatters was implemented in a camera based on cellular neural networks. The camera can be connected to the optics of commercially available laser machines, leading to real-time monitoring of LBW processes at rates up to 15 kHz. Such high monitoring rates allow the integration of other image evaluation tasks, such as the detection of the full penetration hole, for real-time control of process parameters.