ISBN: 9781538677698 (print)
The increasing performance needs of critical real-time embedded systems (CRTES), for instance in the automotive domain, push for the adoption of high-performance hardware from the consumer electronics domain. However, the time-predictability features of such hardware are largely unexplored. The ARM *** architecture is a good candidate for adoption in the CRTES market (in the automotive market, it has already started being used). In this paper we study ARM ***'s capabilities to meet CRTES requirements. In particular, we perform a qualitative and quantitative assessment of its timing characteristics, focusing on shared multicore resources, and of how this architecture can be reliably used in CRTES.
ISBN: 0897913949 (print)
The results of a study for the design of a 250-MHz GaAs microprocessor that uses multichip module (MCM) technology to improve performance are presented. The design study for the resulting two-level split cache starts with a baseline cache architecture and then examines primary cache size and degree of associativity; primary data-cache write policy; secondary cache size and organization; primary cache fetch size; and concurrency between instruction and data accesses. A trace-driven simulator is used to analyze each design's performance. Memory access time and page-size constraints effectively limit the size of the primary data and instruction caches to 4 kW (16 kB). For such cache sizes, a write-through policy is better than a write-back policy. Three cache mechanisms contribute to improved performance. The first is a variant of the write-through policy called write-only. This write policy provides most of the performance benefits of subblock placement without extra valid bits. The second is the use of a split secondary cache. The third mechanism allows loads to pass stores without associative matching.
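As a rough illustration of what trace-driven cache simulation means, the sketch below replays a list of word addresses through a direct-mapped cache and counts hits and misses. It is not the simulator used in the study: the function name `simulate_direct_mapped`, the direct-mapped organization, the synthetic trace, and the cache parameters are all assumptions chosen for brevity.

```python
def simulate_direct_mapped(trace, num_lines, line_words):
    """Replay word addresses through a direct-mapped cache.

    Hypothetical helper for illustration only.
    Returns a (hits, misses) tuple.
    """
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in trace:
        block = addr // line_words     # memory block containing this word
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # identifies which block occupies the line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # fetch the block, evicting the old one
    return hits, misses


# Synthetic trace (an assumption): two sequential sweeps over 64 words.
trace = list(range(64)) * 2

# Compare two hypothetical cache sizes on the same trace.
for num_lines in (4, 16):
    hits, misses = simulate_direct_mapped(trace, num_lines, line_words=4)
    print(f"{num_lines} lines: {hits} hits, {misses} misses")
```

Running the same trace against several configurations, as in the loop above, is the basic pattern behind comparing design points such as cache size or fetch size.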
ISBN: 0769522750 (print)
The last decade has seen several changes in the structure and emphasis of enterprise IT systems. Specific infrastructure trends have included the emergence of large consolidated data centers, the adoption of virtualization and modularization, and an increased commoditization of hardware. At the application level, both the workload mix and usage patterns have evolved toward an increased emphasis on service-centric computing and SLA-driven performance tuning. These often dramatic changes in the enterprise IT landscape motivate equivalent changes in the emphasis of architecture research. In this paper, we summarize some recent trends in enterprise IT systems and discuss the implications for architecture research, suggesting some high-level challenges and open questions for the community to address.
Future high-performance computing will undoubtedly reach Petascale and beyond. Today's HPC is tomorrow's Personal computing. What are the evolving processor architectures towards Multi-core and Many-core for t...
ISBN: 9781479987818 (print)
Cache memories are widely used in microprocessors to improve average-case memory performance. However, they are harmful to time predictability, and thus may not be desirable for real-time systems. In this paper, we make simple hardware extensions to a regular cache to implement the performance enhancement guaranteed cache (PEG-C). The PEG-C is controlled entirely by hardware and can automatically improve the average-case performance of real-time software while guaranteeing and enhancing its worst-case performance.
ISBN: 9783031199837 (digital)
ISBN: 9783031199820; 9783031199837 (print)
This paper introduces a computer architecture in which part of the instruction set architecture (ISA) is implemented on small, highly integrated field-programmable gate arrays (FPGAs). Small FPGAs inside a general-purpose processor (CPU) can be used effectively to implement custom or standardised instructions. Our proposed architecture directly addresses related challenges for high-end CPUs, where such highly integrated FPGAs would have the highest impact, such as on main memory bandwidth. This also enables software-transparent context-switching. The simulation-based evaluation of a dynamically reconfigurable core shows promising results, approaching the performance of an equivalent core with all instructions enabled. Finally, the feasibility of adopting the proposed architecture in today's CPUs is studied through the prototyping of fast-reconfigurable FPGAs and profiling the miss behaviour of opcodes.
ISBN: 0769521320 (print)
We present a case study of performance measurement and modeling of a CCA (Common Component Architecture) component-based application in a high-performance computing environment. Component-based HPC applications allow the possibility of creating component-level performance models and synthesizing them into application performance models. However, they impose the restriction that performance measurement and monitoring must be done in a non-intrusive manner and at a fairly coarse-grained level. We propose a performance measurement infrastructure for HPC based loosely on recent work done for Grid environments. A prototypical implementation of the infrastructure is used to collect data for three components in a scientific application and construct their performance models. Both computational and message-passing performance are addressed.
ISBN: 9781467377096 (print)
Trustable Worst-Case Execution-Time (WCET) bounds are a necessary component for the construction and verification of hard real-time computer systems. Deriving such bounds for contemporary hardware/software systems is a complex task. The single-path conversion overcomes this difficulty by transforming all unpredictable branch alternatives in the code into a sequential code structure with a single execution trace. However, the simpler code structure and analysis of single-path code come at the cost of a longer execution time. In this paper we address the problem of the execution performance of single-path code. We present a new cache organization that utilizes the principle of locality of single-path code to reduce cache miss latency and cache miss rate. The proposed cache memory architecture combines cache prefetching and cache locking, so that the prefetcher capitalizes on spatial locality while the locker makes use of temporal locality. The demonstration section shows how these two techniques can complement each other.
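The idea behind single-path conversion can be sketched with a small example that replaces input-dependent branches with a branch-free select, so the same statements run for every input. This is only an illustration under stated assumptions: the function names `clamp_branching`, `select`, and `clamp_single_path` are hypothetical, and Python is used for readability, whereas real single-path code is typically produced by a compiler using predicated or conditional-move instructions.

```python
def clamp_branching(x, lo, hi):
    """Conventional version: which statements run depends on the input."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def select(cond, a, b):
    """Branch-free select: yields a when cond is true, otherwise b.

    Hypothetical stand-in for a predicated or conditional-move instruction.
    """
    c = int(bool(cond))            # 1 if cond holds, else 0
    return c * a + (1 - c) * b     # arithmetic choice, no control-flow branch


def clamp_single_path(x, lo, hi):
    """Single-path version: both selects execute for every input."""
    x = select(x < lo, lo, x)
    x = select(x > hi, hi, x)
    return x


# Both versions agree on every input in a small range (assumes lo <= hi).
for value in range(-5, 16):
    assert clamp_branching(value, 0, 10) == clamp_single_path(value, 0, 10)
```

The single-path version always does the work of both alternatives, which is the source of the longer execution time the abstract mentions.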
ISBN: 0780350049 (print)
We make two observations about communications middleware: first, most middleware are similar, and the differences are in their interfaces and optimizations; second, neither a fixed set of abstractions nor a fixed implementation of a set of abstractions is likely to be sufficient and well-performing for all applications. Based on these observations, we present Quarterware, a customizable middleware architecture. It abstracts basic middleware functionality and admits application-specific specializations and extensions. We demonstrate its flexibility by deriving implementations for core facilities of CORBA, RMI, and MPI. The performance results show that the derived implementations equal or exceed the performance of corresponding native versions. These results suggest that customizing middleware on a per-application basis is an effective approach for building robust, high-performance applications.
ISBN: 9781538677698 (print)
The growth in data-intensive scientific applications poses strong demands on the HPC storage subsystem, as data needs to be copied from compute nodes to I/O nodes and vice versa for jobs to run. The emerging trend of adding denser, NVM-based burst buffers to compute nodes, however, offers the possibility of using these resources to build temporary filesystems with specific I/O optimizations for a batch job. In this work, we present echofs, a temporary filesystem that coordinates with the job scheduler to preload a job's input files into node-local burst buffers. We present results measured with NVM emulation and with different FS backends using DAX/FUSE on a local node, to show the benefits of our proposal and of such coordination.