ISBN (print): 9781467322973; 9781467322966
This paper introduces a novel methodology for adapting the microarchitecture of a processor at run-time. The goal is to tailor the internal architecture to the requirements of an application and of the data to be processed. The latter is normally not known at design time, which leads to the development of more general-purpose processors capable of handling whatever data they are given. With the novel approach, which keeps the microarchitecture of a processor flexible, the processor can start as a general-purpose device and end up with a specific parameterization comparable to that of application-specific processor architectures. The paper also describes the increased degree of freedom that the approach enables for a new class of processors.
ISBN (print): 1424401550
Energetic-particle-induced soft errors in on-chip cache memories have become a major challenge in designing new-generation reliable microprocessors. Uniformly applying conventional protection schemes such as error-correcting codes (ECC) to SRAM caches may not be practical where performance, power, and die area are highly constrained, especially in embedded systems. In this paper, we propose analyzing the lifetime behavior of the data cache to identify its temporal vulnerability. For this vulnerability analysis, we develop a new lifetime model. Based on this model, we evaluate the effectiveness of several existing schemes in reducing the vulnerability of the data cache. Furthermore, we propose periodically invalidating clean cache lines to reduce the probability that the CPU reads in erroneous data. Combined with previously proposed early writeback strategies [1], our schemes achieve substantially low vulnerability in the data cache, which indicates the need for different protection schemes for data items during the various phases of their lifetime.
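The effect of periodically invalidating clean lines can be illustrated with a toy calculation. The sketch below is not the lifetime model from the paper; it assumes a single clean cache line, counts an interval between two accesses as vulnerable only if it ends in a read hit, and assumes that an invalidation tick inside the interval forces a refetch so that the interval no longer counts. The function name `vulnerable_cycles` and all numbers are hypothetical.

```python
def vulnerable_cycles(fill_time, read_times, invalidate_period=None):
    """Count cycles in which a soft error in a clean line would reach the CPU.

    Toy model (an assumption, not the paper's lifetime model): the interval
    between two consecutive accesses is vulnerable if it ends in a read hit.
    If a periodic invalidation tick falls inside the interval, the line is
    dropped, the read misses and refetches, and the interval is not counted.
    """
    total = 0
    prev = fill_time
    for t in sorted(read_times):
        if invalidate_period is None:
            invalidated = False
        else:
            # First invalidation tick strictly after the previous access.
            next_tick = (prev // invalidate_period + 1) * invalidate_period
            invalidated = next_tick < t
        if not invalidated:
            total += t - prev
        prev = t  # a hit or a refetch both leave fresh data at time t
    return total


# Hypothetical access trace: two early reads, a long idle gap, two late reads.
reads = [10, 20, 500, 510]
baseline = vulnerable_cycles(0, reads)
with_invalidation = vulnerable_cycles(0, reads, invalidate_period=100)
print(baseline, with_invalidation)
```

In this trace the long idle gap dominates the baseline; with invalidation it is replaced by one extra miss. The sketch ignores the performance and energy cost of that miss, which a real evaluation would have to weigh.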
Software synthesis from an initial specification model becomes a critical issue in the ESL design methodology as hardware platforms are often reused and more processors are involved in the target platform. Since embed...
ISBN (print): 9781479901036
Multiprocessor systems-on-chip (MPSoC) are now considered first-class citizens in both the embedded systems and the high-performance computing arenas, in the form of specialized or general-purpose accelerators. Programming models for such systems are currently a hot research topic and, as a general rule, require deep programmer knowledge of the underlying hardware architecture. In this paper we present the implementation of OpenMP, one of the most intuitive and productive programming models, on the STHORM accelerator. This particular platform provides the shared-memory substrate that OpenMP requires. An innovative feature of our design is the seamless deployment of the OpenMP model on both the host and the fabric sides, which gives the programmer a simple but effective interface for offloading and executing OpenMP kernels on the MPSoC. The optimized runtime environment provides full OpenMP support despite its small footprint (less than 10 KB for a 16-core cluster) and can sustain close-to-ideal speedups in computationally intensive applications. We detail the design issues we faced, along with their solutions, given the limited available resources.
ISBN (print): 9783031045806; 9783031045790
Major challenges for system-level Design Space Exploration (DSE) include (a) tremendous search-space sizes for modern many-core architectures and networked systems and (b) the preponderance of infeasible solutions in the search space, from which no actual implementations can be derived. Since current DSE approaches are not equipped to handle these developments, we propose integrating deep generative models into DSE to automatically compress large-scale search spaces, thus (I) reducing the problem complexity faced by the optimizer while (II) learning a model of feasible solutions on which to focus the optimization. The proposed approach integrates seamlessly into state-of-the-art DSE flows and is complementary to existing search-space pruning techniques. Its potential to improve optimization quality by up to approximately 66% is demonstrated for a variety of DSE problems.
ISBN (print): 9781479901036
Data-Driven Multithreading (DDM) is a threaded data-flow model that schedules threads for execution based on data availability. DDM uses a Thread Scheduling Unit (TSU) to manage the threads on sequential processors. In this work we present a hardware implementation of the TSU as synthesizable Verilog HDL code and evaluate it using the ISim simulator. The evaluation results show that the TSU can run at a maximum frequency of 180 MHz and consumes only 5% of the Xilinx Virtex-6 FPGA resources. These initial results will enable us to design an FPGA-based DDM multicore chip consisting of several MicroBlaze cores driven by the TSU. We will thus be able to evaluate the performance of the novel threaded data-flow model and compare it directly with the sequential model on the same hardware.
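Scheduling by data availability can be sketched in software as follows. This is a minimal illustration of the general idea only, not the Verilog TSU described above: it assumes each thread carries a count of producers it still waits for, and a thread becomes ready when that count reaches zero. The function `run_data_driven` and the thread names are hypothetical.

```python
from collections import deque


def run_data_driven(producers_needed, consumers):
    """Return the order in which threads fire under data-driven scheduling.

    producers_needed: thread name -> number of producers it waits for.
    consumers: thread name -> list of threads that consume its result.
    """
    ready_count = dict(producers_needed)
    # Threads with no outstanding inputs can run immediately.
    ready = deque(name for name, count in ready_count.items() if count == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)  # stands in for executing the thread
        for consumer in consumers.get(current, []):
            ready_count[consumer] -= 1  # one more input is now available
            if ready_count[consumer] == 0:
                ready.append(consumer)
    return order


# Hypothetical four-thread program: two loads feed an add, which feeds a store.
producers_needed = {"load_a": 0, "load_b": 0, "add": 2, "store": 1}
consumers = {"load_a": ["add"], "load_b": ["add"], "add": ["store"]}
order = run_data_driven(producers_needed, consumers)
print(order)
```

The sketch fires threads one at a time; on a multicore system, every thread in the ready queue could be dispatched to a free core in parallel.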
We consider software transactional memory (STM) concurrency control in multicore embedded real-time software. We design an Earliest-Deadline-First (EDF) contention manager (CM) to augment STM's obstruction-free pr...
ISBN (print): 9783540736226
This paper presents a modular coprocessor architecture for embedded real-time image and video signal processing. Applications are separated into high-level and low-level algorithms and mapped onto a RISC processor and a coprocessor, respectively. The coprocessor comprises an optimized system bus, various application-specific processing elements, and I/O interfaces. For low-volume production or prototyping, the architecture can be mapped onto FPGAs, which allows flexible extension or adaptation of the architecture. Depending on the complexity of the coprocessor data paths, frequencies of up to 150 MHz have been achieved on a Virtex-II Pro FPGA. Compared to a RISC processor, the performance gain for an SSD algorithm is more than a factor of 70.
With the advent of diverse enabling technologies, brain-related research has, in recent years, been seriously amplified and has already started yielding impressive findings across various fronts. With respect to compu...
ISBN (print): 9781467322973
In recent years, research and industry have introduced several parallel programming models to simplify the development of parallel applications. A popular class among these models is that of task-based programming models, which promise ease of use, portability, and high performance. A novel model in this class, OpenMP Superscalar (OmpSs), combines advanced features such as automated runtime dependency resolution with simple pragma-based programming for C/C++. OmpSs has proven effective in leveraging parallelism in HPC workloads. Embedded and consumer applications, however, are currently still mainly parallelized using traditional thread-based programming models. In this work, we investigate how effective OmpSs is for embedded and consumer applications in terms of usability and performance. To determine the usability of OmpSs, we show in detail how to implement complex parallelization strategies such as those used in parallel H.264 decoding. To evaluate performance, we created a collection of ten embedded and consumer benchmarks parallelized in both OmpSs and Pthreads.
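Automated runtime dependency resolution can be illustrated with a small sketch. It is not the OmpSs runtime; it assumes each task declares the data items it reads and writes, and it derives the earlier tasks each task must wait for from read-after-write, write-after-write, and write-after-read conflicts. The function `build_dependencies`, the task names, and the data names are hypothetical.

```python
def build_dependencies(tasks):
    """Derive task dependencies from declared inputs and outputs.

    tasks: list of (name, inputs, outputs) tuples in program order.
    Returns a dict mapping each task name to the set of earlier tasks
    it must wait for.
    """
    last_writer = {}          # data item -> task that last wrote it
    readers_since_write = {}  # data item -> tasks that read it since then
    deps = {}
    for name, inputs, outputs in tasks:
        wait_for = set()
        for item in inputs:
            if item in last_writer:
                wait_for.add(last_writer[item])  # read-after-write
        for item in outputs:
            if item in last_writer:
                wait_for.add(last_writer[item])  # write-after-write
            wait_for.update(readers_since_write.get(item, ()))  # write-after-read
        # Record this task's accesses only after computing its dependencies.
        for item in inputs:
            readers_since_write.setdefault(item, set()).add(name)
        for item in outputs:
            last_writer[item] = name
            readers_since_write[item] = set()
        deps[name] = wait_for
    return deps


# Hypothetical decoding pipeline; "filter" updates the frame in place.
tasks = [
    ("parse", [], ["header"]),
    ("decode", ["header"], ["frame"]),
    ("filter", ["frame"], ["frame"]),
    ("display", ["frame"], []),
]
deps = build_dependencies(tasks)
print(deps)
```

Tasks whose dependency sets are empty, or whose dependencies have all completed, can run concurrently. In a pragma-based model the programmer supplies only the input and output annotations, and the runtime does the equivalent of this bookkeeping.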