Convolutional Neural Networks (CNNs) are widely used in image classification tasks and have achieved significant performance. They have different applications with great success, especially in the medical field. The c...
This work explores avenues and target areas for optimizing FPGA-based control hardware for experiments conducted on superconducting quantum computing systems and serves as an introduction to some of the current research at the intersection of classical and quantum computing hardware. With the promise of building larger-scale error-corrected quantum computers based on superconducting qubit architecture, innovations to room-temperature control electronics are needed to bring these quantum realizations to fruition. The QICK (Quantum Instrumentation Control Kit) is one of the leading experimental FPGA-based implementations. However, its integration into other experimental quantum computing architectures, especially those using superconducting radiofrequency (SRF) cavities, is largely unexplored. We identify some key target areas for optimizing control electronics for superconducting qubit architectures and provide some preliminary results on the resolution of a control pulse waveform. With optimizations targeted at 3D superconducting qubit setups, we hope to bring to light some of the requirements that classical computational methodologies must meet to realize the full potential of this quantum computing architecture, and to convey the excitement of progress in this research.
ISBN (digital): 9798331541378
ISBN (print): 9798331541385
Advancements in quantum computing underscore the critical need for sophisticated qubit readout techniques to accurately discern quantum states. This abstract presents our research on optimizing readout pulse fidelity for 2D and 3D Quantum Processing Units (QPUs), the latter coupled with Superconducting Radio Frequency (SRF) cavities. Focusing specifically on the application of the Least Mean Squares (LMS) adaptive filtering algorithm, we explore its integration into FPGA-based control systems to enhance the accuracy and efficiency of qubit state detection by improving the Signal-to-Noise Ratio (SNR). Implementing the LMS algorithm on Zynq UltraScale+ RFSoC Gen 3 devices (RFSoC 4x2 FPGA and ZCU216 FPGA) using the Quantum Instrumentation Control Kit (QICK) open-source platform, we aim to dynamically test and adjust the filtering parameters in real time to characterize and adapt to the noise profile present in quantum computing readout signals. Our preliminary results demonstrate the LMS filter's capability to maintain high readout accuracy while efficiently managing FPGA resources. These findings are expected to contribute to developing more reliable and scalable quantum computing architectures, highlighting the pivotal role of adaptive signal processing in quantum technology advancements.
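For readers unfamiliar with the LMS algorithm named in the abstract above, the following is a minimal software sketch of the textbook LMS weight update, used here to identify an unknown FIR response from noisy observations. It is not the authors' FPGA or QICK implementation: the function names `lms_identify` and `demo`, the three-tap test system, the step size, and the noise level are all assumptions chosen for illustration.

```python
import random


def lms_identify(reference, desired, num_taps, step_size):
    """Textbook LMS: adapt FIR weights so the filtered reference tracks `desired`."""
    weights = [0.0] * num_taps
    window = [0.0] * num_taps  # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, desired):
        window = [x] + window[:-1]
        y = sum(w * u for w, u in zip(weights, window))  # current filter output
        e = d - y                                        # instantaneous error
        # Stochastic-gradient step on the squared error.
        weights = [w + step_size * e * u for w, u in zip(weights, window)]
        errors.append(e)
    return weights, errors


def demo(seed=0, num_samples=5000):
    """Hypothetical test case: recover a 3-tap response observed through additive noise."""
    rng = random.Random(seed)
    true_taps = [0.5, -0.3, 0.1]  # assumed "unknown" system, for illustration only
    reference = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    desired = []
    window = [0.0] * len(true_taps)
    for x in reference:
        window = [x] + window[:-1]
        clean = sum(h * u for h, u in zip(true_taps, window))
        desired.append(clean + rng.gauss(0.0, 0.01))  # small measurement noise
    weights, errors = lms_identify(reference, desired, len(true_taps), 0.05)
    return true_taps, weights, errors


if __name__ == "__main__":
    taps, weights, errors = demo()
    print("true taps:     ", taps)
    print("estimated taps:", [round(w, 3) for w in weights])
```

The step size trades convergence speed against steady-state error, which is presumably one of the filtering parameters a hardware implementation would tune; a fixed-point FPGA version would also have to account for word length and pipeline latency, which this sketch ignores.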
Emerging three-dimensional integrated circuits (3D ICs) offer a promising solution to mitigate the barriers of interconnect scaling in modern systems. To exploit the intrinsic capability of 3D ICs to reduce wire length, the 3D NoC-Bus Hybrid mesh architecture was proposed. Besides its various advantages in terms of area, power consumption, and performance, this architecture offers a unique and hitherto unexplored way to implement an efficient system-wide monitoring network. In this paper, an integrated low-cost monitoring platform for 3D stacked mesh architectures is proposed which can be efficiently used for various system management purposes. The proposed generic monitoring platform, called ARB-NET, utilizes bus arbiters to exchange monitoring information directly with each other without using the data network. As a test case, based on the proposed monitoring platform, a fully congestion-aware adaptive routing algorithm named AdaptiveXYZ is presented, taking advantage of viable information generated within bus arbiters. Our extensive simulations with synthetic and real benchmarks reveal that our architecture using AdaptiveXYZ routing can help achieve significant power and performance improvements compared to recently proposed stacked mesh 3D NoCs.
Three-dimensional (3D) integration is a viable design paradigm to overcome the existing interconnect bottleneck in integrated systems and enhance system power/performance characteristics. To exploit the intrinsic capability of 3D ICs to reduce wire length, the stacked mesh 3D NoC architecture was proposed. However, this architecture suffers from naive and straightforward hybridization between NoC and bus media. In this paper, an efficient hybridization scheme is presented to enhance system performance, power consumption, and area of stacked mesh 3D NoC architectures. By utilizing a routing rule called LastZ, the proposed hybridization scheme offers many advantages, which are investigated in detail. Our extensive simulations with synthetic and real benchmarks, including an integrated videoconference application, show that compared to a typical 3D NoC-Bus Hybrid Mesh architecture, our hybridization scheme achieves significant power, performance, and area improvements.
Many researchers and vendors are exploiting the increasing number of transistors to build chip multiprocessors (CMPs) by partitioning a chip into multiple simple ILP cores. As in traditional multiprocessors, CMPs extract thread-level parallelism (TLP) from programs by running multiple independent program segments, i.e., threads, in parallel. Currently, CMPs are used widely in high-performance servers, and even in embedded systems. In this paper, we present an extension of the OpenMP shared directive for performance optimization on Blackfin 561 (ADSP-BF561) dual-core processors. Many architectures have been proposed to support memory consistency between multiple cores. On a dual-core processor such as the ADSP-BF561, each core has its own private L1 cache, and the cores share an L2 cache. To execute multithreaded parallel programs, we need to consider carefully where to allocate shared variables on the targeted memory architecture. In our measured benchmarks, we could improve the speedup by up to 107% and reduce the energy consumption by up to 108% compared with not using our extension.
Managing the energy consumption of embedded systems has become a major problem with the increasing demand for portable electronic devices. This paper proposes a multi-bank memory architecture as a solution to decrease the static energy cost in memory. We set up the equations governing the optimization problem for decreasing the memory static energy cost, analyze the impact of different parameters on the energy cost, and finally present some case study results.
Validation of programmable architectures, consisting of processor cores, coprocessors, and memory subsystems, is one of the major bottlenecks in current system-on-chip design methodology. A critical challenge in validation of such systems is the lack of a golden reference model. Traditional validation techniques employ different reference models depending on the abstraction level and verification task (e.g., functional simulation or property checking), resulting in potential inconsistencies between multiple reference models. This paper presents a validation methodology that uses an architecture description language (ADL) based specification as a golden reference model for validation of programmable architectures, and generation of executable models such as simulators and hardware prototypes. We present a validation framework that uses the generated hardware as a reference model to verify the hand-written implementation using a combination of symbolic simulation and equivalence checking. We also present functional coverage based test generation techniques for validation of pipelined processor architectures. Finally, the generated simulator and hardware models are also used for early exploration of programmable architectures.
Recent advances in language-based software toolkit generation enable performance-driven exploration of embedded systems by exploiting the application behavior. There is a need for automatic generation of hardware to determine the required silicon area, clock frequency, and power consumption of the candidate architectures. In this paper, we present a language-based exploration framework that automatically generates synthesizable RTL models for pipelined processors. Our framework allows varied micro-architectural modifications, such as the addition of pipeline stages, pipeline paths, opcodes, and new functional units. The generated RTL is synthesized to determine the area, power, and clock frequency of the modified architectures. Our exploration results demonstrate the power of reuse in composing heterogeneous architectures using functional abstraction primitives, allowing for a reduction in the time for specification and exploration by at least an order of magnitude.