ISBN: 9781424419852 (print)
Multi-Processor System on-Chip (MPSoC) architectures are currently designed by using a platform-based approach. In this approach, a wide range of platform parameters must be tuned to find the best trade-offs in terms of the selected figures of merit (such as energy, delay and area). This optimization phase is called Design Space Exploration (DSE) and it generally consists of a Multi-Objective Optimization (MOO) problem. The design space for an MPSoC architecture is too large to be evaluated comprehensively. So far, several heuristic techniques have been proposed to address the MOO problem for MPSoC, but they are characterized by low efficiency in identifying the Pareto front. In this paper, an efficient DSE methodology is proposed leveraging traditional Design of Experiments (DoE) and Response Surface Modeling (RSM) techniques. In particular, the DoE phase generates an initial plan of experiments used to create a coarse view of the target design space; a set of RSM techniques is then used to refine the exploration. This process is iteratively repeated until the target criterion (e.g. number of simulations) is satisfied. A set of experimental results is reported to show the trade-off between accuracy and efficiency of the proposed techniques with actual workloads.
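To make the notion of a Pareto front concrete, the following is a minimal sketch of a non-dominated filter over evaluated design points. It is not the paper's DoE/RSM methodology; the function names and the (energy, delay, area) values are hypothetical, and all objectives are assumed to be minimized.

```python
from typing import List, Tuple

# One design point's objective values; every objective is assumed to be minimized.
Point = Tuple[float, ...]

def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (energy, delay, area) values for four simulated configurations.
configs = [(1.0, 5.0, 2.0), (2.0, 3.0, 2.0), (2.5, 3.5, 2.5), (3.0, 1.0, 4.0)]
front = pareto_front(configs)
print(front)
```

In this sketch the third configuration is dropped because the second one is at least as good in every objective. The filter is quadratic in the number of points, which is acceptable for the small sets of simulated configurations that an exploration loop typically compares.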
ISBN: 9781509030767 (print)
Quantum computing has been attracting increasing attention in recent years because of the rapid advancements that have been made in quantum algorithms and quantum system design. Quantum algorithms are implemented with the help of quantum circuits. These circuits are inherently reversible in nature and often contain a sizeable Boolean part that needs to be synthesized. The logic design of such quantum circuits constitutes a non-trivial task and, hence, has been heavily investigated by researchers in the recent past. This paper provides a brief overview of this research. We review the major steps to be conducted in the logic design of quantum circuits and provide a sketch for each single step. These descriptions are enriched with discussions as well as references to the respective related work.
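As an illustration of how a Boolean function can be embedded in a reversible circuit, the sketch below models the Toffoli (CCNOT) gate on classical bits. It is a simplified example chosen for this note, not a synthesis flow from the paper, and the variable names are hypothetical.

```python
from itertools import product

def toffoli(a: int, b: int, c: int):
    """Toffoli (CCNOT) gate: flips the target c when both controls a and b are 1."""
    return a, b, c ^ (a & b)

# Reversibility check: the gate permutes the 8 three-bit states and is its own inverse.
states = list(product((0, 1), repeat=3))
outputs = [toffoli(*s) for s in states]
is_bijection = sorted(outputs) == states
is_self_inverse = all(toffoli(*toffoli(*s)) == s for s in states)

# Embedding AND: with the target initialised to 0, the third output equals a AND b.
and_table = {(a, b): toffoli(a, b, 0)[2] for a, b in product((0, 1), repeat=2)}
print(is_bijection, is_self_inverse, and_table)
```

The irreversible AND function becomes reversible only by keeping both inputs and adding a constant target line, which is why the Boolean part of a circuit usually needs extra lines after synthesis.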
We consider software transactional memory (STM) concurrency control in multicore embedded real-time software. We design an Earliest-Deadline-First (EDF) contention manager (CM) to augment STM's obstruction-free pr...
ISBN: 9781479937707 (print)
Recent design methodologies and tools aim at enhancing design productivity by providing a software development platform before the final MPSoC architecture details are defined. However, the simulation can only be performed efficiently when using a modeling and simulation engine that supports the system behavior description at a high abstraction level. The lack of MPSoC virtual platform prototyping that integrates both scalable hardware and software in order to create and evaluate new methodologies and tools motivated us to develop MPSoCBench. This toolset is a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, or 64 cores, cross-compilers, IPs, interconnections, and 17 parallel versions of software from well-known benchmarks. This tool also provides power consumption estimation for MIPS and SPARC processors. MPSoCBench totals 864 different configurations, automated through scripts.
We present a novel approach that assists the task of porting code to an embedded platform. Our tool automatically identifies code segments in the input program that can be replaced with optimized kernels from a platfo...
ISBN: 9783031045806; 9783031045790 (print)
Major challenges for system-level Design Space Exploration (DSE) include (a) tremendous search-space sizes for modern many-core architectures and networked systems and (b) the preponderance of infeasible solutions in the search space from which no actual implementations can be derived. Since current DSE approaches are not equipped to handle these developments, we propose the integration of deep generative models into DSE to automatically compress large-scale search spaces, thus (I) reducing the problem complexity faced by the optimizer while (II) learning a model of feasible solutions on which to focus the optimization. The proposed approach is seamlessly integrated into state-of-the-art DSE flows, is complementary to existing search-space pruning techniques, and its potential to improve optimization quality by up to approximately 66% is demonstrated for a variety of DSE problems.
ISBN: 9781479901036 (print)
Data-Driven Multithreading (DDM) is a threaded data-flow model that schedules threads for execution based on data availability. DDM utilizes a Thread Scheduling Unit (TSU) for the management of the threads on sequential processors. In this work we present the hardware implementation of the TSU with synthesizable code using the Verilog HDL and its evaluation using the ISim simulator. The evaluation results show that the TSU is able to run at a maximum frequency of 180 MHz and consumes only 5% of the Xilinx Virtex-6 FPGA resources. The initial results obtained in this work will enable us to design an FPGA-based DDM multicore chip consisting of several MicroBlaze cores driven by the TSU. Thus, we will be able to evaluate the performance of the novel threaded data-flow model and make a direct comparison with the sequential model on the same hardware.
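Scheduling by data availability can be sketched in software as follows. This is only an illustrative model, not the TSU hardware design described above: it assumes each thread carries a counter of producers it still waits on and becomes runnable when that counter reaches zero. The graph, thread names, and function name are hypothetical.

```python
from collections import deque
from typing import Dict, List

def data_driven_order(consumers: Dict[str, List[str]]) -> List[str]:
    """Return an execution order in which a thread runs only after all its producers.

    consumers maps each thread name to the threads that consume its output.
    """
    # Ready count: number of producers a thread is still waiting on (assumed model).
    ready_count: Dict[str, int] = {}
    for thread, outs in consumers.items():
        ready_count.setdefault(thread, 0)
        for consumer in outs:
            ready_count[consumer] = ready_count.get(consumer, 0) + 1

    # Threads with no pending inputs can be scheduled immediately.
    ready = deque(t for t, n in ready_count.items() if n == 0)
    order: List[str] = []
    while ready:
        thread = ready.popleft()
        order.append(thread)  # stands in for executing the thread
        for consumer in consumers.get(thread, []):
            ready_count[consumer] -= 1  # one more input has become available
            if ready_count[consumer] == 0:
                ready.append(consumer)
    return order

# Hypothetical dependency graph: "merge" needs the results of both "fft" and "filter".
graph = {"load": ["fft", "filter"], "fft": ["merge"], "filter": ["merge"], "merge": []}
schedule = data_driven_order(graph)
print(schedule)
```

In this model "fft" and "filter" become ready at the same moment, so a multicore system could run them in parallel; the sketch serializes them only because it executes on a single queue.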
With the advent of diverse enabling technologies, brain-related research has, in recent years, been seriously amplified and has already started yielding impressive findings across various fronts. With respect to compu...
ISBN: 9783540736226 (print)
This paper presents a modular coprocessor architecture for embedded real-time image and video signal processing. Applications are separated into high-level and low-level algorithms and mapped onto a RISC and a coprocessor, respectively. The coprocessor comprises an optimized system bus, different application-specific processing elements and I/O interfaces. For low-volume production or prototyping, the architecture can be mapped onto FPGAs, which allows flexible extension or adaptation of the architecture. Depending on the complexity of the coprocessor data paths, frequencies up to 150 MHz have been achieved on a Virtex II-Pro FPGA. Compared to a RISC processor, the performance gain for an SSD algorithm is more than a factor of 70.
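The abstract does not expand "SSD". Assuming it refers to the sum of squared differences, a common low-level block-matching metric in image processing, the kernel can be sketched as below. The function name and sample blocks are hypothetical, and this is not the coprocessor implementation.

```python
from typing import Sequence

def ssd(block_a: Sequence[Sequence[int]], block_b: Sequence[Sequence[int]]) -> int:
    """Sum of squared differences between two equally sized pixel blocks."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )

# Two hypothetical 2x2 pixel blocks; a lower score means a closer match.
reference = [[1, 2], [3, 4]]
candidate = [[1, 3], [5, 4]]
score = ssd(reference, candidate)
print(score)
```

Each pixel pair contributes independently to the sum, so the metric is regular and data-parallel, the kind of low-level workload that the abstract maps onto the coprocessor rather than the RISC.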
ISBN: 9783540736226 (print)
Effective memory utilization is critical to reap the benefits of the multi-core processors emerging on embedded systems. In this paper we explore the use of a stream model to effectively utilize memory hierarchies. We target image processing algorithms running on the Analog Devices Blackfin BF561 fixed-point, dual-core DSP. Using optimized assembly to make effective use of the cores reduces runtime, but also underscores the need to mitigate the memory bottleneck. Like other embedded processors, the Blackfin BF561 has L2 SRAM available. Applying the stream model allows us to make full use of both cores and the L2 SRAM. We achieve almost a 10X speedup in execution time compared to non-optimized C code.