We introduce a new approach to take into account the memory architecture and the memory mapping in behavioral synthesis. We formalize the memory mapping as a set of constraints for the synthesis, and defined a Memory ...
详细信息
ISBN:
(纸本)1581139373
We introduce a new approach to take into account the memory architecture and the memory mapping in behavioral synthesis. We formalize the memory mapping as a set of constraints for the synthesis, and defined a Memory Constraint Graph and an accessibility criterion to be used in the scheduling step. We present a new strategy for implementing signals (ageing vectors). We formalize the maturing process and explain how it may generate memory conflicts over several iterations of the algorithm. The final Compatibility Graph indicates the set of valid mappings for every signal. Several experiments are performed with our HLS tool GAUT. Our scheduling algorithm exhibits a relatively low complexity that permits to tackle complex designs in a reasonable time.
In ten years the cellular telephone has evolved from a tool for the professional to an indispensable consumer product with a very high market penetration. At the same time, the handset cost, weight, and standby time h...
详细信息
ISBN:
(纸本)1581139373
In ten years the cellular telephone has evolved from a tool for the professional to an indispensable consumer product with a very high market penetration. At the same time, the handset cost, weight, and standby time have been reduced by more than a factor of ten. These factors have been critical for the success story of the mobile phone. The technical aspects behind the rapid handset evolution are discussed. In particular, what advances in the radio architecture, for example the zero-IF GSM receiver, the baseband (CMOS) technology, and the radio system design areas have meant for the reduction of size, weight, cost, and power consumption is discussed. Future challenges, like SW-DSP-digital-RF partitioning, linear multi-mode modulation with high linearity requirements, digital leakage issues, and power consumption limitations in multimedia handsets are discussed with future generation handsets in mind.
This paper presents AMVP - a systemC based simulator for ARM multicore platforms designed as a tool for early SW development during platform bring-up. It uses a parallel systemC simulation approach to counter the perf...
详细信息
ISBN:
(纸本)9781538655627
This paper presents AMVP - a systemC based simulator for ARM multicore platforms designed as a tool for early SW development during platform bring-up. It uses a parallel systemC simulation approach to counter the performance slowdown incurred when simulating multi-processor designs. AMVP is sufficiently detailed to support running and debugging unmodified SW images of popular benchmark applications and modern OS kernels like Linux.
The design of specialized accelerators is essential to the success of many modern systems-on-Chip. Electronic system-level design methodologies and high-level synthesis tools are critical for the efficient design and ...
详细信息
ISBN:
(纸本)9781450330510
The design of specialized accelerators is essential to the success of many modern systems-on-Chip. Electronic system-level design methodologies and high-level synthesis tools are critical for the efficient design and optimization of an accelerator. Still, these methodologies and tools offer only limited support for the optimization of the memory structures, which are often responsible for most of the area occupied by an accelerator. To address these limitations, we present a novel methodology to automatically derive the memory subsystems of SoC accelerators. Our approach enables compositional design-space exploration and promotes design reuse of the accelerator specifications. We illustrate its effective-ness by presenting experimental results on the design of two accelerators for a high-performance embedded application. Copyright 2014 ACM.
This paper presents a secure hardware architecture of an image sensor to accelerate feature extraction using region-level parallelism. For each logical region, the design includes a region processing unit (RPU) with a...
详细信息
ISBN:
(纸本)9781728191980
This paper presents a secure hardware architecture of an image sensor to accelerate feature extraction using region-level parallelism. For each logical region, the design includes a region processing unit (RPU) with an attention module (AM). The AM activates the processing in the RPU if there are no spatiotemporal redundancies. It reduces power consumption and data volume by utilizing the concepts of predictive coding. Also, every RPU has a crypto-core driven by the AM to withstand against adversaries. Simulation results show we can save 89.70% power with a significant speedup.
Process variability is an emerging problem that is becoming worse with each new technology node. Its impact on the performance and energy of memory organizations is severe and degrades the system-level parametric yiel...
详细信息
ISBN:
(纸本)1595931619
Process variability is an emerging problem that is becoming worse with each new technology node. Its impact on the performance and energy of memory organizations is severe and degrades the system-level parametric yield. In this paper we propose a broadly applicable system-level technique that can guarantee parametric yield on the memory organization and which minimizes the energy overhead associated to variability in the conventional design process. It is based on offering configuration capabilities at the memory-level and exploiting them at the system-level. This technique can decrease by up to a factor of 5 the energy overhead that is introduced by state-of-the-art process variability compensation techniques, including statistical timing analysis. In this way we obtain results close to the ideal nominal design again.
In this paper, we present our cache configuration prediction methodology offloaded to an FPGA for improved performance and hardware overhead reduction, while maintaining cache configuration predictions within 5% of th...
详细信息
ISBN:
(纸本)9781450369237
In this paper, we present our cache configuration prediction methodology offloaded to an FPGA for improved performance and hardware overhead reduction, while maintaining cache configuration predictions within 5% of the optimal energy cache configuration for application phases for the instruction and data caches.
With ever-increasing power densities, temperature management using software and hardware techniques has become a necessity in the design of modern electronic systems. Such techniques have to be validated and optimized...
详细信息
ISBN:
(纸本)9781450307154
With ever-increasing power densities, temperature management using software and hardware techniques has become a necessity in the design of modern electronic systems. Such techniques have to be validated and optimized with respect to the thermal guarantee they provide, i.e., a safe upperbound on the peak temperature of the system under all operating conditions. The computation of such a guarantee depends on the power and timing characteristics of the system. In this paper, we present formalisms to capture such characteristics at the system-level and provide an analytical technique to compute a provably safe upper-bound on the peak temperature. The proposed characterization and analysis is general in that it considers an impulse-responsebased thermal model, task-dependent power consumption, tasks with dynamic arrival patterns and variable resource demand, and a scheduling policy expressed as a hierarchical composition of several commonly used policies. Copyright 2011 ACM.
Conventional High Level synthesis (HLS) tools mainly target compute intensive kernels typical of digital signal processing applications. We are developing techniques and architectural templates to enable HLS of data a...
详细信息
ISBN:
(纸本)9781450344838
Conventional High Level synthesis (HLS) tools mainly target compute intensive kernels typical of digital signal processing applications. We are developing techniques and architectural templates to enable HLS of data analytics applications. These applications are memory intensive, present fine-grained, unpredictable data accesses, and irregular, dynamic task parallelism. We discuss an architectural template based around a distributed controller to efficiently exploit thread level parallelism. We present a memory interface that supports parallel memory subsystems and enables implementing atomic memory operations. We introduce a dynamic task scheduling approach to efficiently execute heavily unbalanced workload. The templates are validated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well know SPARQL benchmark.
Erroneous systems allow timing errors to occur during execution, but use measures to ensure continued operation through changes in operating parameters (voltage and frequency), error correction at various levels of th...
详细信息
ISBN:
(纸本)9781450330510
Erroneous systems allow timing errors to occur during execution, but use measures to ensure continued operation through changes in operating parameters (voltage and frequency), error correction at various levels of the system, or ensuring controlled occurrence of errors to perform approximate computing. In this paper, we are interested in characterization of error behavior at the level of instructions and programs. We propose Inter- and Intra-Program Variation as measures of error rate variability in different programs and among instructions of a program, respectively. We also characterize the error rate variation caused by the program input data and show that it is comparable to other sources of variability such as process variation. Finally, we present an analysis of the physical location of errors in hardware, identify regions in which most of the errors occur, and how different programs change the distribution of errors among these regions. In order to enable reliable timing analysis of large programs, we propose Clustered Timing Model (CTM), a high level timing model based on clustering functionally similar timing paths of the processor, and develop a CTM for LEON3, a representative in-order RISC processor. The accuracy of the model is verified using our variation-aware timing analysis framework with an average error of 3.9% (max. 6.7%) across a wide range of voltage-temperature corners.
暂无评论