The Belle II experiment and the SuperKEKB collider are designed to operate at a higher luminosity than Belle, to improve studies of rare $B$ meson decays and searches for new physics. To overcome the bandwidth bottleneck and improve the operational stability of the Belle II data acquisition (DAQ) system, a new PCI-Express-based readout system has been developed. The new system comprises a PCI-Express-based high-speed readout board (PCIe40), originally developed for the upgrades of the LHCb and ALICE experiments; the PCIe40 firmware; and the slow-control and readout software running on a readout PC. The new readout system has been commissioned with most of the Belle II subdetectors, and the readout upgrade is complete for the particle-identification detectors and the neutral kaon and muon detector, which have been operating stably with the new system during beam-collision "physics runs." The results of the commissioning and the performance of global DAQ operation will be reported.
Fractional Motion Estimation (FME) is an important part of the H.264/AVC video encoding standard. The algorithm can significantly increase the compression ratio of video encoders while improving video quality. However, it is computationally expensive and can account for over 45% of the total motion estimation runtime. To maximize the performance and utilization of FME implementations on field-programmable gate arrays (FPGAs), the inherent parallelism of the algorithm must be exploited effectively. In this work, we explore two approaches to parallelizing the FME algorithm in order to increase the processing power of the computing hardware, which we call vertical scaling and horizontal scaling. We implemented six scaled FME designs on a Xilinx XC5VLX85T (Virtex-5) FPGA and found that scaling vertically within a 4 × 4 sub-block is more efficient than scaling horizontally across several sub-blocks. As a result, we were able to support higher video resolutions at lower hardware resource cost. In particular, the best vertically scaled design achieves 30 fps on QSXGA video with 4 reference frames using only 25.5 K LUTs and 28.7 K registers. (C) 2011 Elsevier B.V. All rights reserved.
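FME searches sub-pixel positions that must first be interpolated from the reference frame. As a rough illustration of the per-pixel work that the scaled designs parallelize, here is a minimal Python sketch of H.264 luma half-sample interpolation with the standard six-tap filter (1, -5, 20, 20, -5, 1)/32. The function names are illustrative, and the sketch deliberately omits frame-edge padding, quarter-sample averaging, and the cost (SAD/SATD) evaluation that a complete FME unit also performs.

```python
# Sketch of H.264/AVC luma half-sample interpolation (the standard 6-tap
# filter). Illustrative only; not the hardware design from the paper.

TAPS = (1, -5, 20, 20, -5, 1)  # weights sum to 32


def clip255(v: int) -> int:
    """Clip to the 8-bit sample range [0, 255]."""
    return max(0, min(255, v))


def half_pel(row: list[int], x: int) -> int:
    """Half-sample value between row[x] and row[x + 1].

    Reads row[x - 2] .. row[x + 3]; the caller keeps these in range
    (real encoders pad the reference frame at its edges).
    """
    acc = sum(t * row[x - 2 + i] for i, t in enumerate(TAPS))
    return clip255((acc + 16) >> 5)  # divide by 32 with rounding


def half_pel_row(row: list[int]) -> list[int]:
    """Every horizontal half-sample position that needs no edge padding."""
    return [half_pel(row, x) for x in range(2, len(row) - 3)]
```

For a linear ramp, `half_pel_row([10, 20, 30, 40, 50, 60, 70])` returns `[35, 45]`. In H.264, quarter-sample luma values are then obtained by rounding the average of the two nearest integer or half-sample values.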
The development of assistive devices for automated sound recognition is an important field of research and has been receiving increased attention. However, very few methods have been developed specifically for identifying environmental sounds. Most existing approaches adapt speech recognition techniques to the task, usually at high computational cost. This paper proposes a sound recognition method dedicated to environmental sounds and designed primarily for embedded applications. The pre-processing stage is loosely based on the human auditory system, while a robust set of binary features allows a simple k-NN classifier to be used. This gives the system the capability of in-field learning: new sounds can simply be added to the reference set in real time, greatly improving its usability. The system was implemented on an FPGA-based platform developed in-house specifically for this application. The design took into account several restrictions imposed by the hardware, such as limited computing power and memory, and supports up to 12 reference sounds of around 5.3 s each. Experiments were performed on a database of 29 sounds, and sensitivity and specificity were evaluated over several random subsets of these signals. Without additional noise, sensitivity and specificity were 0.957 and 0.918, respectively. With the addition of +6 dB of pink noise, they were 0.822 and 0.942, respectively. Under noisy conditions, the in-field learning strategy showed no significant change in sensitivity and a total decrease of 5.4% in specificity as the number of reference sounds was progressively increased from 1 to 9. The minimal signal-to-noise ratio required by the prototype to correctly recognize sounds was between -8 dB and 3 dB. These results show that the proposed method and implementation have great potential for several real-life applications.
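The abstract does not name the distance metric, but one natural reading of "binary features with a simple k-NN classifier" is a Hamming-distance nearest-neighbour search, in which in-field learning amounts to appending a labelled reference. The Python sketch below assumes exactly that; the class name, the int-as-bit-vector feature encoding, and the labels are hypothetical. It also spells out the standard sensitivity and specificity definitions used to report the results.

```python
from collections import Counter

# Assumption: features are binary vectors packed into Python ints and
# compared with Hamming distance. Hypothetical sketch, not the paper's code.


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary feature vectors."""
    return bin(a ^ b).count("1")


class BinaryKNN:
    """k-NN classifier over int-encoded binary feature vectors."""

    def __init__(self, k: int = 1):
        self.k = k
        self.refs: list[tuple[int, str]] = []  # (feature bits, label)

    def learn(self, features: int, label: str) -> None:
        """In-field learning: adding a sound only appends a reference."""
        self.refs.append((features, label))

    def classify(self, features: int) -> str:
        """Majority label among the k references closest in Hamming distance."""
        nearest = sorted(self.refs, key=lambda r: hamming(r[0], features))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)
```

With references `0b1111_0000` ("doorbell") and `0b0000_1111` ("alarm"), a 1-NN query for `0b1110_0000` is one bit away from the first reference and seven from the second, so it is classified as "doorbell". Because learning never retrains anything, this style of classifier suits a memory-limited platform that must accept new sounds on the fly.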
Medical imaging using different modalities faces many problems, the main ones being low informativeness, various distortion noises, and large data volumes. In practice, fusion, denoising, and visual data compression are used to address them. The discrete wavelet transform is one way to implement various fusion, denoising, and compression methods for 2D and 3D medical image processing. With the development of scanning technology and digital devices, medical imaging systems produce increasingly accurate images, with both higher spatial resolution and greater color bit depth. Processing such large volumes of medical imaging data requires considerable resources and processing time, and modern wavelet-based devices for medical image processing do not meet current performance demands. Hardware accelerators are being designed to solve this problem. This paper proposes new field-programmable gate array (FPGA) accelerators that use wavelet processing (WP) with scaled filter coefficients (SFC) and parallel computing in the residue number system (RNS) to improve the performance of high-quality 3D medical image WP systems. Computational complexity is reduced by the developed WP method with SFC and the proposed wavelet filter coefficient scaling algorithm. Parallel computing is organized in RNS using moduli sets of a particular type. Compared to state-of-the-art solutions, hardware implementation of 3D medical image WP using the proposed FPGA accelerators increases device performance by 2.89-3.59 times, at the cost of a 1.18-3.29 times increase in hardware resources. This performance improvement is achieved while maintaining high-quality 3D medical image processing in terms of peak signal-to-noise ratio.
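As a rough illustration of the two ideas combined in these accelerators, the Python sketch below scales real-valued wavelet filter coefficients by a power of two so that filtering needs only integer arithmetic, evaluates one filter output in a residue number system with one independent channel per modulus, and recovers the result with the Chinese Remainder Theorem. The abstract does not specify the filters, the scaling rule, or the moduli, so the {2^n - 1, 2^n, 2^n + 1} moduli set (a common special set in RNS hardware), the rounding-based scaling, and the LeGall 5/3 low-pass filter in the example are all assumptions. The product M of the moduli must exceed twice the largest output magnitude for the signed reconstruction to be correct.

```python
from math import prod

# Hypothetical sketch: integer-scaled FIR filtering in a residue number system.


def moduli(n: int) -> tuple[int, int, int]:
    """Assumed special moduli set {2^n - 1, 2^n, 2^n + 1} (pairwise coprime)."""
    return (2**n - 1, 2**n, 2**n + 1)


def to_rns(x: int, ms) -> tuple[int, ...]:
    """Residues of x; Python's % already returns values in [0, m)."""
    return tuple(x % m for m in ms)


def from_rns(residues, ms) -> int:
    """Chinese Remainder Theorem, then map to the signed range [-M/2, M/2)."""
    big_m = prod(ms)
    x = 0
    for r, m in zip(residues, ms):
        mi = big_m // m
        x += r * mi * pow(mi, -1, m)  # pow(..., -1, m): modular inverse (Python 3.8+)
    x %= big_m
    return x - big_m if x >= big_m // 2 else x


def scale_coeffs(coeffs, s: int) -> list[int]:
    """Scale real filter coefficients by 2^s and round them to integers."""
    return [round(c * 2**s) for c in coeffs]


def rns_fir(samples, int_coeffs, ms) -> tuple[int, ...]:
    """One FIR output computed independently in each residue channel.

    No carries cross channels, so in hardware each modulus can get its own
    small multiply-accumulate unit running in parallel with the others.
    """
    out = []
    for m in ms:
        acc = 0
        for x, h in zip(samples, int_coeffs):
            acc = (acc + (x % m) * (h % m)) % m
        out.append(acc)
    return tuple(out)
```

For example, with `moduli(5)` = (31, 32, 33), so M = 32736, the LeGall 5/3 low-pass filter (-1/8, 1/4, 3/4, 1/4, -1/8) scaled by 2^3 becomes [-1, 2, 6, 2, -1]; the samples [10, 20, 30, 40, 50] then reconstruct to 240, i.e. 30 once the 2^3 scale is removed.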
Content-addressable memory (CAM) is a massively parallel search device that returns the address of a given search input in one clock cycle. Field-programmable gate array (FPGA)-based CAMs are becoming popular due to their applications in the latest networking systems, e.g., software-defined networks (SDNs) leading to upcoming 5G networks. In a network router, ternary CAM (TCAM) implements the routing table used to classify and forward data packets, where don't-care bits (X-bits) allow a single entry to match multiple addresses. FPGAs do not have a hard-core CAM, although it is a prime element in networking applications. This paper serves as a comprehensive survey of FPGA-based CAMs/TCAMs implemented using block random-access memory (BRAM), lookup table RAM (LUTRAM), and flip-flops (FFs). BRAM-based TCAM suffers from the pre-processing needed to map data, in some cases requires the data to be in a specific order, and has a large SRAM/TCAM bit ratio. LUTRAM-based CAM/TCAM suffers from wide bit-wise ANDing and high routing complexity, but has a smaller SRAM/TCAM bit ratio of 14, compared to 16 for BRAM-based TCAM. Shallow and wide RAM blocks are required to implement large RAM-based TCAMs (BRAM-based and LUTRAM-based TCAMs). FF-based TCAMs use FFs as their memory elements and have a reduced hardware cost per TCAM bit; however, due to routing complexity, they suffer from poor scalability and high power consumption. The update latency of BRAM-based and LUTRAM-based TCAMs is proportional to the depth of the BRAM and LUTRAM, respectively, whereas FF-based CAM updates in 1 or 2 clock cycles, depending on the availability of input/output pins on the target FPGA. (C) 2021 Published by Elsevier B.V.
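To make the RAM-based trade-offs concrete, here is a minimal Python model of the generic BRAM/LUTRAM TCAM scheme: the search key is split into w-bit sub-words, each sub-word addresses its own RAM of depth 2^w whose words hold one match bit per rule, the looked-up vectors are ANDed together, and a priority encoder returns the lowest matching address. The class and parameter names are hypothetical, and the model does not reproduce any specific design covered by the survey; it only shows why an update must rewrite every RAM word (latency proportional to depth) and why a search needs a wide bit-wise AND.

```python
# Hypothetical software model of a generic RAM-based TCAM; not a specific
# design from the survey.


class RamTcam:
    """Key split into w-bit sub-words; each sub-word indexes a RAM of depth
    2**w whose words are bit vectors with one match bit per TCAM rule."""

    def __init__(self, key_bits: int, w: int, depth: int):
        if key_bits % w != 0:
            raise ValueError("key_bits must be a multiple of w")
        self.w, self.depth = w, depth
        self.n_sub = key_bits // w
        # rams[j][v]: rules whose j-th sub-word matches the value v
        self.rams = [[0] * (2**w) for _ in range(self.n_sub)]

    def write(self, addr: int, value: int, care: int) -> None:
        """Store a ternary rule; a care bit of 0 means "don't care" (X).

        Every one of the 2**w words in each RAM is visited, which is why
        update latency grows with RAM depth.
        """
        if not 0 <= addr < self.depth:
            raise ValueError("address out of range")
        sub_mask = 2**self.w - 1
        for j in range(self.n_sub):
            val = (value >> (j * self.w)) & sub_mask
            cm = (care >> (j * self.w)) & sub_mask
            for v in range(2**self.w):
                if (v & cm) == (val & cm):
                    self.rams[j][v] |= 1 << addr
                else:
                    self.rams[j][v] &= ~(1 << addr)

    def search(self, key: int):
        """Return the lowest matching address, or None if no rule matches."""
        sub_mask = 2**self.w - 1
        match = (1 << self.depth) - 1
        for j in range(self.n_sub):
            match &= self.rams[j][(key >> (j * self.w)) & sub_mask]  # wide AND
        if match == 0:
            return None
        return (match & -match).bit_length() - 1  # priority encoder: lowest set bit
```

For instance, with rule 0 = `1010XXXX` and rule 1 = `10100101` in an 8-bit, 4-entry model built from two 4-bit sub-words, the key `10100101` matches both rules and the priority encoder returns address 0, while `10110101` matches neither.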
Contemporary quantum computers face many critical challenges that limit their usefulness for practical applications. A primary limiting factor is classical-to-quantum (C2Q) data encoding, which requires specific circuits for quantum state initialization. The required state initialization circuits are often complex and violate decoherence constraints, particularly for I/O-intensive applications. Existing Noisy Intermediate-Scale Quantum (NISQ) devices are noise-sensitive and have low quantum bit (qubit) counts, which limits the applicability of C2Q circuits for encoding large and realistic datasets. This has made the study of complete and realistic circuits that include data encoding challenging and has also led to a heavy dependence on costly and resource-intensive simulations on classical platforms. In this work, we propose a cost-effective, classical-hardware-accelerated framework for realistic and complete emulation of quantum algorithms. The emulation framework incorporates components for the critical C2Q data encoding process, as well as architectures for quantum algorithms such as the quantum Haar transform (QHT). The framework is used to investigate optimizations for C2Q and QHT algorithms, and the corresponding optimized quantum circuits are presented. The framework is implemented on a High-Performance Reconfigurable Computer (HPRC) that emulates the proposed QHT circuits combined with the proposed C2Q data encoding methods. For performance benchmarks, CPU-based emulations and simulations on a state-of-the-art quantum computing simulator are also carried out. Results show that the proposed hardware-accelerated emulation framework is more efficient in terms of speed and scalability than CPU-based emulation and simulation.
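An emulator of this kind keeps the full state vector in classical memory, so C2Q amplitude encoding and a Haar step reduce to array operations. The NumPy sketch below is an illustration under stated assumptions, not the paper's circuits: it amplitude-encodes a data vector by zero-padding it to a power-of-two length and normalizing it, then applies one Haar level, which on the state vector amounts to a Hadamard gate on the least significant qubit followed by a qubit permutation (a cyclic shift of qubit order, realizable with SWAP gates) that moves the approximation coefficients to the front. The function names and the little-endian qubit ordering are assumptions.

```python
import numpy as np

# Illustrative state-vector emulation; not the paper's C2Q or QHT circuits.


def amplitude_encode(data) -> np.ndarray:
    """C2Q amplitude encoding: zero-pad to a power of two and normalize.

    N classical values become the amplitudes of a ceil(log2 N)-qubit state.
    """
    x = np.asarray(data, dtype=float)
    n_qubits = max(1, int(np.ceil(np.log2(len(x)))))
    psi = np.zeros(2**n_qubits)
    psi[: len(x)] = x
    return psi / np.linalg.norm(psi)


def haar_level(psi: np.ndarray) -> np.ndarray:
    """One Haar level on the amplitude vector.

    Adjacent amplitude pairs (a, b) map to ((a + b)/sqrt(2), (a - b)/sqrt(2)),
    i.e. a Hadamard on the least significant qubit; concatenating the
    approximation and detail halves is the qubit permutation described above.
    """
    pairs = psi.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return np.concatenate([approx, detail])
```

Because both steps are unitary apart from the initial normalization, the transformed state keeps unit norm; a constant input such as [1, 1, 1, 1] yields zero detail coefficients.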
In this paper, two real-time architectures of medium access techniques for future generations of wireline and wireless communication systems are presented. One architecture is based on the discrete cosine transform (DCT), while the second implements a filter-bank multi-carrier (FBMC) system. A comparative analysis in terms of resource consumption, performance, and precision is shown. The comparison considers a floating-point model, a fixed-point model, and experimental tests. These models make it possible to evaluate the effect of fixed-point precision on the implementation and, in turn, to verify the correctness of the developed architectures. The simulation models and the experimental tests have been carried out in different practical environments to enable further analysis. Both proposed architectures have been implemented on a field-programmable gate array (FPGA) device. Furthermore, the architectures have been included as advanced peripherals in a system-on-chip that also integrates a soft microprocessor to monitor the whole system and manage data transfers. As a communication scenario, the proposed architectures have been tailored to operate in real time while meeting all timing requirements defined by a broadband power line communications standard. In that case, the system achieves the desired transmission rate of 62.5 MS/s at the converters, with mean squared errors at the output for an ideal channel below $3 \times 10^{-5}$ for both the DCT and FBMC approaches, while each transmitter/receiver requires around 50% of the DSP cells available in the Xilinx XC6VLX240T FPGA, the most heavily used resource in the device.
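The floating-point versus fixed-point comparison can be illustrated with a small NumPy sketch of an ideal-channel DCT-based multicarrier link: the transmitter applies the inverse of an orthonormal DCT-II to BPSK symbols and the receiver applies the forward DCT. Everything here is an assumption for illustration (the DCT variant, the BPSK mapping, the number of subcarriers, and which signals are quantized). It is not the paper's architecture, it omits the FBMC branch, the channel, and equalization, and its MSE values are not meant to reproduce the reported figures.

```python
import numpy as np

# Illustrative model only; DCT variant, BPSK mapping and quantization points
# are assumptions, not the paper's design.


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that C @ C.T equals the identity."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def quantize(x: np.ndarray, frac_bits: int) -> np.ndarray:
    """Round to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 2.0**frac_bits
    return np.round(x * scale) / scale


def dct_mc_link(symbols: np.ndarray, frac_bits=None) -> np.ndarray:
    """Ideal-channel link: inverse DCT at the Tx, forward DCT at the Rx."""
    c = dct_matrix(len(symbols))
    if frac_bits is None:  # floating-point reference model
        return c @ (c.T @ symbols)
    cq = quantize(c, frac_bits)  # fixed-point transform coefficients
    tx = quantize(cq.T @ symbols, frac_bits)  # fixed-point modulator output
    return cq @ tx  # receiver accumulates at full precision (simplification)


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between transmitted and recovered symbols."""
    return float(np.mean((a - b) ** 2))
```

A usage pattern is to draw BPSK symbols, e.g. `rng.choice([-1.0, 1.0], size=64)` from `np.random.default_rng(0)`, and compare `mse(syms, dct_mc_link(syms))` with `mse(syms, dct_mc_link(syms, frac_bits=12))`. The floating-point link is exact to rounding error, while the fixed-point error shrinks as `frac_bits` grows, which is the kind of precision trade-off the fixed-point model is used to evaluate.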
The increasing pervasiveness of control systems in robotic and automotive applications requires the installation of a growing number of sensors and actuators. In parallel with the downsizing of all components, new techniques for tracing versatile printed circuit boards (PCBs) are emerging: a 3-D molded interconnect device, for example, makes it possible to reduce weight by up to 75% by combining a single-layer PCB with mechanical parts. Getting rid of unnecessary wires hence becomes indispensable, and new on-board interfaces with fewer pins must be designed. This article proposes a novel encoding scheme and a corresponding interface that reduces the number of wires between an automotive Ethernet (100BASE-T1) MAC and PHY down to 2 and corrects up to 37.8% of single-bit errors. Because the interface can be clocked at 33.33 MHz, it does not require differential transmitters, receivers, or any other special blocks, and can therefore be easily implemented on a small field-programmable gate array.
Biological organisms are among the most robust systems known. Their robustness is based on a set of processes that cannot be adapted directly to the world of silicon but can inspire the design of robust circuits. This paper introduces a multiplexer-based field-programmable gate array (FPGA) that we made capable of self-test and self-repair using an approach loosely based on biological mechanisms at the cellular level. The system is designed to provide on-line self-test and self-repair using a completely distributed system and a minimal amount of additional logic.