ISBN: 9781605586359 (print)
The parallelization of Simulink applications is currently a responsibility of the system designer and the superscalar execution of the processors. State-of-the-art Simulink compilers excel at producing reliable and production-quality embedded code, but fail to exploit the natural concurrency available in the programs and to effectively use modern multi-core architectures. The reason may be that many Simulink applications are replete with loop-carried dependencies that inhibit most parallel computing techniques and compiler transformations. In this paper, we introduce the concept of strands, which allow the data dependencies to be broken while preserving the original semantics of the Simulink program. Our fully automatic compiler transformations create a concurrent representation of the program and then plan and orchestrate thread-level parallelism for multi-core systems. To improve single-processor performance, we also exploit fine-grain (equation-level) parallelism by level-order scheduling inside each thread. Our strand transformation has been implemented as an automatic transformation in a proprietary compiler. On a realistic aeronautic model executed on two processors, it achieves a speedup of up to 1.98 times over uniprocessor execution, whereas the existing manual parallelization method achieves a 1.75 times speedup.
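The abstract does not describe how the level-order scheduling is implemented. As a rough illustration of the general idea only, the following Python sketch groups equations of a dependency graph into levels, so that every equation in a level depends only on equations in earlier levels and the members of one level can be evaluated independently. The function name `level_order_schedule`, the `deps` input format, and the equation names are assumptions made for this example; they are not taken from the paper or its compiler.

```python
from collections import defaultdict


def level_order_schedule(deps):
    """Group equations into dependency levels (hypothetical helper).

    deps maps an equation name to the names of the equations it reads.
    Returns a list of levels; each level is a sorted list of names.
    """
    deps = {name: set(srcs) for name, srcs in deps.items()}

    # Collect every equation, including pure inputs that only appear as sources.
    nodes = set(deps)
    for srcs in deps.values():
        nodes |= srcs

    # Number of not-yet-scheduled inputs for each equation.
    pending = {n: len(deps.get(n, ())) for n in nodes}

    # Reverse edges: which equations consume the result of each equation.
    consumers = defaultdict(list)
    for name, srcs in deps.items():
        for src in srcs:
            consumers[src].append(name)

    level = sorted(n for n in nodes if pending[n] == 0)
    levels = []
    scheduled = 0
    while level:
        levels.append(level)
        scheduled += len(level)
        ready = []
        for n in level:
            for c in consumers[n]:
                pending[c] -= 1
                if pending[c] == 0:
                    ready.append(c)
        level = sorted(ready)

    # A cycle (e.g. an unbroken loop-carried dependency) leaves nodes unscheduled.
    if scheduled != len(nodes):
        raise ValueError("dependency cycle: graph cannot be level-ordered")
    return levels


if __name__ == "__main__":
    # Illustrative equation names only.
    example = {"y1": {"u"}, "y2": {"u"}, "y3": {"y1", "y2"}, "out": {"y3"}}
    print(level_order_schedule(example))
```

In this sketch a cyclic graph is rejected outright; it is the role of a transformation such as the strands described above to break such dependencies before any scheduling of this kind can apply.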
The isolation capabilities provided by conventional enterprise data center technology are inadequate for many clients of multi-tenant storage or compute clouds. To address this deficiency we propose a cloud architectu...
ISBN: 9781424475018 (print)
Manufacture-time process variation and life-time failure projections have become a major industry concern. Consequently, fault tolerance, historically of interest only for mission-critical systems, is now gaining attention in the mainstream computing space. Traditionally, reliability issues have been addressed at a coarse granularity, e.g., by disabling faulty cores in chip multiprocessors. However, this is not scalable to higher failure rates. In this paper, we propose StageWeb, a fine-grained wearout and variation tolerance solution that employs a reconfigurable web of replicated processor pipeline stages to construct dependable many-core chips. The interconnection flexibility of StageWeb simultaneously tackles wearout failures (by isolating broken stages) and process variation (by selectively disabling slower stages). Our experiments show that through its wearout tolerance, a StageWeb chip performs up to 70% more cumulative work than a comparable chip multiprocessor. Further, variation mitigation in StageWeb enables it to scale supply voltage more aggressively, resulting in up to 16% energy savings.
The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problem...
ISBN: 9783642121326 (print)
High-Performance Reconfigurable Computers (HPRCs) are parallel machines consisting of FPGAs and microprocessors, with the FPGAs used as co-processors. The execution of parallel applications on such systems has mainly followed the Single-Program Multiple-Data (SPMD) model; however, overall system resources are often underutilized because of the asymmetric distribution of the reconfigurable (co-)processors relative to the (main) processors. Furthermore, with the introduction of HPRCs containing multi/many-core technologies, underutilization of system resources becomes more obvious, especially for multi-tasking and multi-user usage. To address the asymmetry problem, we propose a resource virtualization solution based on Partial Run-Time Reconfiguration (PRTR). The proposed technique allows space, time, and/or space-time sharing of the reconfigurable (co-)processors among the (main) processors, thus increasing the overall system utilization. We show the effectiveness of the proposed concepts through a stochastic execution model verified with experimental implementations on the Cray XD1 platform. The results demonstrate favorable performance as well as scalability characteristics.
GPUs (Graphics Processing Units) have become one of the main co-processors that contributed to desktops towards high performance computing. Together with multicore CPUs and other co-processors, a powerful heterogeneou...
The scattering of acoustic waves in non-homogeneous media has been of practical interest for the petroleum industry, mainly in the determination of new oil deposits. A family of computational models that represent thi...
Driven by the market demand for high-definition 3D graphics, commodity graphics processing units (GPUs) have evolved into highly parallel, multi-threaded, many-core processors, which are ideal for data parallel comput...
ISBN: 9781424477302 (print)
With the emergence of many-core processors, accelerators, and alternative/heterogeneous architectures, the HPC community faces a new challenge: a scaling in number of processing elements that supersedes the historical trend of scaling in processor frequencies. The attendant increase in system complexity has first-order implications for fault tolerance. Mounting evidence invalidates traditional assumptions of HPC fault tolerance: faults are increasingly multiple-point instead of single-point and interdependent instead of independent; silent failures and silent data corruption are no longer rare enough to discount; stabilization time consumes a larger fraction of useful system lifetime, with failure rates projected to exceed one per hour on the largest systems; and application interrupt rates are apparently diverging from system failure rates.
Many advanced hardware accelerations for virtualization, such as Pause Loop Exit (PLE), Extended Page Table (EPT), and Single Root I/O Virtualization (SR-IOV), have been introduced recently to improve the virtualizati...