Cryptographic applications face a significant challenge in guaranteeing both high security and high throughput. An artificial neural network (ANN)-based chaotic true random number generator (TRNG) structure has not previously been presented in the literature. This paper presents a novel high-speed TRNG based on chaos and an ANN, implemented on a Xilinx field-programmable gate array (FPGA) chip. The paper consists of two main parts. In the first part, chaos analyses of the Pehlivan-Uyaroglu_2010 chaotic system (PUCS) are carried out to show that the PUCS operates in a chaotic regime, so the PUCS can serve as an efficient alternative entropy source for classical TRNGs. In the second part, the hardware design of the proposed TRNG is created using VHDL on the Xilinx platform. The implemented TRNG offers a throughput of up to 115.794 Mbps. The generated random numbers have been tested with the FIPS 140-1 and NIST 800-22 test suites, and their high quality has been confirmed by passing all randomness tests. The results show that the proposed system can provide not only high throughput but also high-quality random bit sequences for a wide variety of embedded cryptographic applications.
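To make the randomness testing mentioned above more concrete, the following is a minimal sketch of one test from the NIST 800-22 suite, the frequency (monobit) test. It is not the authors' test harness; the function names and the 0.01 significance level are assumptions chosen for illustration.

```python
import math


def monobit_p_value(bits):
    """P-value of the frequency (monobit) test for a sequence of 0/1 bits."""
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    # Map each bit 0 -> -1 and 1 -> +1, then sum the sequence.
    s_n = sum(2 * b - 1 for b in bits)
    # Normalised absolute sum; large values indicate a biased source.
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def passes_monobit(bits, alpha=0.01):
    """True if the sequence is not rejected at significance level alpha (assumed 0.01)."""
    return monobit_p_value(bits) >= alpha
```

A perfectly balanced sequence yields a p-value of 1.0, while a constant sequence is rejected. The full suite applies many further tests, so passing this one alone says little about a generator.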
In this paper, a scalable scheme, configurable via register-transfer level parameters, for full register bypassing in a modern embedded processor architecture, termed ByoRISC, is presented. The register bypassing specification is parameterized with respect to the number of homogeneous register file read and write ports and the number of pipeline stages of the processor. The performance characteristics (cycle time, chip area) of the proposed technique have been evaluated for FPGA target implementations of the synthesizable ByoRISC model. It is shown that a full bypassing network is a viable solution for eliminating data hazards when servicing instructions with multiple read and write operands. While the maximum clock frequency is reduced by 17.9% on average when using partial versus full forwarding, the positive effect of custom computation offsets this reduction by providing cycle speedups of 3.9x to 5.5x and corresponding execution time speedups of 3.6x for a ByoRISC testbed processor. Individual application speedups of up to 9.4x have also been obtained. (C) 2009 Elsevier B.V. All rights reserved.
Field-programmable gate arrays are susceptible to radiation-induced single event upsets. These are commonly dealt with using triple modular redundancy (TMR) and module-based configuration memory error recovery (MER). By triplicating components and voting on their outputs, TMR helps localise configuration memory errors, and by reconfiguring faulty components, MER swiftly corrects them. However, the order in which TMR voters are checked inevitably impacts the overall system reliability. In this study, the authors outline an approach for computing the reliability of TMR-MER systems that consist of finitely many components. They demonstrate that system reliability is improved when the more vulnerable components are checked more frequently than when they are checked in round-robin order. They propose a genetic algorithm for finding a voter checking schedule that maximises the reliability of TMR-MER systems. Results indicate that the mean time to failure (MTTF) of these systems can be increased by up to 400% when variable-rate voter checking (VRVC) is used instead of round robin. They show that VRVC achieves a 15-23% increase in MTTF with a 10x reduction in checking frequency to reduce system power. They also found that VRVC detects errors 44% faster on average than round robin.
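The voting step that TMR relies on can be sketched in a few lines. This is a generic software model of a 2-of-3 majority voter, not the authors' hardware or their checking schedule; the function names are hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise 2-of-3 majority of three replica outputs given as integers."""
    return (a & b) | (a & c) | (b & c)


def faulty_replicas(a, b, c):
    """Indices (0, 1, 2) of replicas whose output disagrees with the voted value."""
    voted = tmr_vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]
```

For example, `tmr_vote(0b1010, 0b1010, 0b0110)` returns `0b1010`, and `faulty_replicas` reports replica 2 as the one that disagrees. In a TMR-MER system, a disagreeing replica found when its voter is checked is the candidate for reconfiguration.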
Field-programmable gate arrays (FPGAs) are now widely used for the implementation of digital systems, and many commercial architectures are available. Although the literature and data books contain detailed descriptions of these architectures, there is very little information on how the high-level architecture was chosen, and no information on the circuit-level or physical design of the devices. This paper describes the high-level architectural design of a static-random-access memory programmable FPGA. A forthcoming Part II will address the circuit design issues through to the physical layout. The logic block and routing architecture of the FPGA were determined through experimentation with benchmark circuits and custom-built computer-aided design tools. The resulting logic block is an asymmetric tree of four-input lookup tables that are hard-wired together, and the routing architecture is segmented with a carefully chosen segment length distribution.
The paper deals with logic synthesis of lookup-table (LUT) based field-programmable gate arrays (FPGAs). Because each LUT can implement any k-input Boolean function with the same area cost, the optimisation criterion of literal count, generally used in other multi-level logic synthesis methods, is not suitable for LUT-based technologies. Therefore a new logic optimisation criterion is proposed, which trades off literals against support. Based on this criterion, five logic operations in logic optimisation are analysed and adapted to evaluate the circuit cost in accordance with the target technology. Using these techniques of logic optimisation, a good starting point for technology mapping of LUT-based FPGAs has been obtained. In the technology mapping phase, LUT-directed decomposition is applied. Experimental results indicate that synthesised circuits are much smaller and more routable than the circuits synthesised by other tools.
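The point that every k-input function costs the same in a LUT can be illustrated with a small software model: a LUT is a 2^k-bit truth table indexed by its inputs. The sketch below is an illustration only, with hypothetical helper names, and is not part of the synthesis method described above.

```python
def make_lut(func, k):
    """Pack a k-input Boolean function into a 2**k-bit truth table (an int)."""
    table = 0
    for index in range(2 ** k):
        # Input i is bit i of the table index.
        inputs = [(index >> i) & 1 for i in range(k)]
        if func(*inputs):
            table |= 1 << index
    return table


def lut_eval(table, inputs):
    """Look up the output bit for a list of 0/1 inputs (input 0 is the LSB)."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (table >> index) & 1


# A 4-input AND and a 4-input XOR both occupy one 16-bit table,
# although their literal counts in a two-level form differ greatly.
and4 = make_lut(lambda a, b, c, d: a & b & c & d, 4)
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
```

Both tables fit in the same 16 bits, which is why literal count is a poor cost measure for this target and why the number of inputs a function depends on (its support) matters instead.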
Modern embedded systems provide a variety of functionality as operational modes, each corresponding to a mutually exclusive phase of operation. This paper provides a system-level design methodology tailored for such multi-mode systems. By incorporating knowledge about the temporal behavior, it is possible to share hardware by means of partial reconfiguration on sophisticated field-programmable gate arrays (FPGAs), and thus reduce costs and improve performance. The presented methodology is based on an exploration model, which specifies the temporal behavior of the system functionality as well as the architectural characteristics of current reconfigurable technology. We develop a symbolic encoding of this system specification, which enables unified system synthesis by applying sophisticated optimization techniques to perform allocation, binding, placement of partially reconfigurable modules, and routing of the on-chip communication. The presented system-level design methodology complies with state-of-the-art synthesis tools and communication technologies for partially reconfigurable systems. We demonstrate this by experiments on test cases from the image processing domain applying state-of-the-art technology. The results give evidence of the efficiency of the methodology and show its superiority in terms of runtime and quality of the found solutions compared to existing system-level synthesis approaches.
The demand for energy-efficient, high-performance microcontroller units (MCUs) for use in power-supply-critical Internet-of-Things (IoT) sensor-node applications has increased substantially. In response, research on the development of low-power MCUs has been actively pursued. The performance level of such MCUs, however, has not been sufficient, rendering them infeasible for IoT sensor-node applications that process a large number of received signals and then immediately extract valuable information from them to limit the data transferred to a data center. To realize next-generation IoT systems based on intelligent sensor-node applications, ultra-low-power high-performance MCUs need to be developed. This paper presents an ultra-low-power, high-performance MCU configuration based on spintronics device technology, in which all modules are made non-volatile and wasteful power consumption is eliminated by controlling the power supplied independently to each module. By incorporating a reconfigurable accelerator module, for performing various signal-processing procedures in sensor-node applications, and a memory controller, which can speed up the entire system by relaxing the data-transfer bottleneck between logic and memory, the proposed MCU configuration achieves ultra-low power consumption and high-speed operation. As confirmed by measurements performed on a fabricated chip, the proposed MCU design consumed, on average, 47.14 µW at an operating frequency of 200 MHz. This corresponds to the world's highest signal-processing performance and energy efficiency of highly functional IoT sensor nodes powered by harvested energy.
Reconfigurable architectures that tightly integrate a standard CPU core with a field-programmable hardware structure have recently been receiving increased attention. The design of such a hybrid reconfigurable processor involves a multitude of design decisions regarding the field-programmable structure as well as its system integration with the CPU core. Determining the impact of these design decisions on the overall system performance is a challenging task. In this paper, we first present a framework for the cycle-accurate performance evaluation of hybrid reconfigurable processors on the system level. Then, we discuss a reconfigurable processor for data-streaming applications, which attaches a coarse-grained reconfigurable unit to the coprocessor interface of a standard embedded CPU core. By means of a case study we evaluate the system-level impact of certain design features for the reconfigurable unit, such as multiple contexts, register replication, and hardware context scheduling. The results illustrate that a system-level evaluation framework is of paramount importance for studying the architectural trade-offs and optimizing design parameters for reconfigurable processors. (C) 2004 Elsevier B.V. All rights reserved.
Montgomery modular multiplication is one of the fundamental operations used in cryptographic algorithms, such as RSA and Elliptic Curve Cryptosystems. At CHES 1999, Tenca and Koc proposed the Multiple-Word Radix-2 Montgomery Multiplication (MWR2MM) algorithm and introduced a now-classic architecture for implementing Montgomery multiplication in hardware. With parameters optimized for minimum latency, this architecture performs a single Montgomery multiplication in approximately 2n clock cycles, where n is the size of operands in bits. In this paper, we propose two new hardware architectures that are able to perform the same operation in approximately n clock cycles with almost the same clock period. These two architectures are based on precomputing partial results using two possible assumptions regarding the most significant bit of the previous word. These two architectures outperform the original architecture of Tenca and Koc, in terms of the product latency times area by 23 and 50 percent, respectively, for several most common operand sizes used in cryptography. The architecture in radix-2 can be extended to the case of radix-4, while preserving a factor of two speedup over the corresponding radix-4 design by Tenca, Todorov, and Koc from CHES 2001. Our optimization has been verified by modeling it using Verilog-HDL, implementing it on Xilinx Virtex-II 6000 FPGA, and experimentally testing it using SRC-6 reconfigurable computer.
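For orientation, the operation being accelerated can be sketched as the textbook bit-serial radix-2 Montgomery multiplication. This is neither the word-based MWR2MM scan nor either of the proposed architectures; it only shows the arithmetic they compute, and the function name and argument order are assumptions.

```python
def montgomery_multiply(x, y, m, n):
    """Return x * y * 2**(-n) mod m for odd m, with 0 <= x, y < m < 2**n."""
    if m % 2 == 0:
        raise ValueError("modulus must be odd")
    s = 0
    for i in range(n):
        # Add y when bit i of x is set.
        if (x >> i) & 1:
            s += y
        # Make the running sum even by adding the odd modulus if needed,
        # so the division by two below is exact.
        if s & 1:
            s += m
        s >>= 1
    # The loop keeps s below 2*m, so one conditional subtraction suffices.
    if s >= m:
        s -= m
    return s
```

The loop runs once per bit of `x`, which is where the n in the cycle counts quoted above comes from. The result carries an extra factor of 2^(-n), so multiplying it by 2^n modulo `m` recovers the ordinary product `x * y mod m`.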
Real-time simulation of induction machines plays a crucial role in hardware-in-the-loop (HIL) scenarios. Because magnetic equivalent circuits (MEC) offer key advantages for modeling induction machines compared with finite element analysis and electric equivalent circuits in terms of computational expense and achieved accuracy, this paper proposes a real-time nonlinear MEC of the induction machine. The model is emulated in real time on a field-programmable gate array (FPGA) by exploiting the parallel hardware architecture and fully pipelined arithmetic processing. The performance of the FPGA-based real-time emulated induction machine model is investigated and compared with the behavior of an experimental induction machine setup and with finite element results to demonstrate the effectiveness and accuracy of the proposed approach for HIL applications.