When confronted with a control problem for a complicated physical process, a control engineer generally follows a relatively systematic design procedure. In this paper we design an on-line controller which is able to ...
ISBN: 0780379632 (print)
This research applies simulation modeling of queuing systems with the help of vector quantization. A technique for adjusting the boundaries between classes with vector quantization is suggested. With this technique, the density distribution function of the input flow is determined. An implementation of the suggested approach for a simple Jackson queuing system is shown.
Field tests demonstrated the value of enhancements that enable an advanced nonintrusive load monitoring system to tackle complex monitoring environments. Nonintrusive load monitoring (NILM) can determine the operating schedule of electrical loads in a target system from measurements made at a centralized location, such as the electric utility service entry. In contrast to other systems, NILM reduces sensor costs by using relatively few sensors.
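The idea of recovering a load schedule from a single centralized measurement can be illustrated with a minimal sketch. It assumes a simple step-change (edge detection) approach, which the abstract does not confirm as the authors' method; the function name `detect_load_events`, the load names, the wattage ratings, and the thresholds are all hypothetical.

```python
def detect_load_events(aggregate_watts, load_ratings, tolerance=0.1, min_step=20.0):
    """Label step changes in an aggregate power trace with the best-matching load.

    Assumption: each load switches fully on or off and draws a roughly constant,
    known power. Real NILM systems use much richer signatures.
    """
    events = []
    for t in range(1, len(aggregate_watts)):
        step = aggregate_watts[t] - aggregate_watts[t - 1]
        if abs(step) < min_step:
            continue  # ignore small fluctuations treated as noise
        best = None
        for name, rating in load_ratings.items():
            error = abs(abs(step) - rating)
            # Accept a match only within a relative tolerance of the rating.
            if error <= tolerance * rating and (best is None or error < best[1]):
                best = (name, error)
        if best is not None:
            events.append((t, best[0], "on" if step > 0 else "off"))
    return events


# Hypothetical trace sampled at the service entry (watts) and hypothetical loads.
trace = [100, 100, 1600, 1600, 1660, 160, 160, 100]
ratings = {"heater": 1500.0, "lamp": 60.0}
events = detect_load_events(trace, ratings)
```

Only one sensor location is needed here because every load's switching shows up as a step in the same aggregate trace, which is the cost advantage the abstract refers to.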
Co-evolution is a possible solution to the problem of simultaneous optimization of artificial neural network and training algorithm parameters, due to its ability to deal with vast search spaces. Moreover, this scheme i...
The continued increase in microprocessor clock frequency that has come from advancements in fabrication technology and reductions in feature size creates challenges in maintaining both manufacturing yield rates and long-term reliability of devices. Methods based on defect detection and reduction may not offer a scalable solution due to the cost of eliminating contaminants in the manufacturing process and increasing chip complexity. We propose to use the inherent redundancy available in existing and future chip microarchitectures to improve yield and enable graceful performance degradation in fail-in-place systems. We introduce a new yield metric called performance averaged yield (Y_PAV), which accounts both for fully functional chips and for those that exhibit some performance degradation. Our results indicate that at 250nm we are able to increase the Y_PAV of a uniprocessor with only redundant rows in its caches from a base value of 85% to 98% using microarchitectural redundancy. Given constant chip area, shrinking feature sizes increases fault susceptibility and reduces the base Y_PAV to 60% at 50nm, which exploiting microarchitectural redundancy then increases to 99.6%.
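The abstract does not give the formula for performance averaged yield. One plausible reading, shown below as a sketch under that assumption, weights the fraction of chips in each working configuration by that configuration's performance relative to a fault-free chip. The function name and all numbers are hypothetical and are not taken from the paper.

```python
def performance_averaged_yield(bins):
    """Sum of (fraction of chips) * (relative performance) over usable configurations.

    `bins` is a list of (fraction, relative_performance) pairs. Chips that do not
    work at all are simply left out, so the fractions may sum to less than 1.
    This definition is an assumption, not the paper's stated formula.
    """
    total_fraction = sum(fraction for fraction, _ in bins)
    if total_fraction > 1.0 + 1e-9:
        raise ValueError("chip fractions cannot sum to more than 1")
    return sum(fraction * perf for fraction, perf in bins)


# Hypothetical example: 80% of chips are fault-free, and 15% work with a
# disabled unit at 80% of full performance. The remaining 5% are discarded.
example = performance_averaged_yield([(0.80, 1.0), (0.15, 0.8)])
```

Under this reading, a conventional yield figure would count only the first bin, while the averaged metric also credits degraded but usable chips.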
ISBN: 9780769520438 (print)
This paper describes several methods for improving the scalability of memory disambiguation hardware for future high ILP processors. As the number of in-flight instructions grows with issue width and pipeline depth, the load/store queues (LSQ) threaten to become a bottleneck in both power and latency. By employing lightweight approximate hashing in hardware with structures called Bloom filters, many improvements to the LSQ are possible. We propose two types of filtering schemes using Bloom filters: search filtering, which uses hashing to reduce both the number of lookups to the LSQ and the number of entries that must be searched, and state filtering, in which the number of entries kept in the LSQs is reduced by coupling address predictors and Bloom filters, permitting smaller queues. We evaluate these techniques for LSQs indexed by both instruction age and the instruction's effective address, and for both centralized and physically partitioned LSQs. We show that search filtering avoids up to 98% of the associative LSQ searches, providing significant power savings and keeping LSQ searches to under one high-frequency clock cycle. We also show that with state filtering, the load queue can be eliminated altogether with only minor reductions in performance for small instruction window machines.
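The search-filtering idea can be sketched in software as follows. This is an illustrative model, not the paper's hardware design: it assumes a counting Bloom filter so that committed stores can be removed, uses SHA-256 where hardware would use a cheap hash of address bits, and models only the store queue. The class names, sizes, and method names are hypothetical.

```python
import hashlib


class CountingBloomFilter:
    """Small counting Bloom filter keyed on memory addresses (illustrative)."""

    def __init__(self, num_counters=64, num_hashes=2):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _indexes(self, address):
        # Derive num_hashes counter indexes from slices of one digest.
        digest = hashlib.sha256(address.to_bytes(8, "little")).digest()
        return [
            int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.num_counters
            for i in range(self.num_hashes)
        ]

    def insert(self, address):
        for i in self._indexes(address):
            self.counters[i] += 1

    def remove(self, address):
        for i in self._indexes(address):
            self.counters[i] -= 1

    def may_contain(self, address):
        # False means definitely absent; True may be a false positive.
        return all(self.counters[i] > 0 for i in self._indexes(address))


class FilteredStoreQueue:
    """Store queue whose associative search is guarded by a Bloom filter."""

    def __init__(self):
        self.stores = []  # (address, value) pairs, oldest first
        self.bloom = CountingBloomFilter()
        self.searches_performed = 0
        self.searches_avoided = 0

    def issue_store(self, address, value):
        self.stores.append((address, value))
        self.bloom.insert(address)

    def commit_oldest_store(self):
        address, value = self.stores.pop(0)
        self.bloom.remove(address)
        return address, value

    def forward_to_load(self, address):
        # Skip the expensive search when the filter rules out any match.
        if not self.bloom.may_contain(address):
            self.searches_avoided += 1
            return None
        self.searches_performed += 1
        for store_address, value in reversed(self.stores):  # youngest first
            if store_address == address:
                return value
        return None  # false positive in the filter
```

Because a Bloom filter never reports a false negative, a skipped search cannot miss a matching in-flight store; false positives only cost an unnecessary search.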
The dynamic nature of large-size Network Computing Systems (NCSs) and the varying monitoring demands from the end-users pose serious challenges for monitoring systems (MSs). A statically configured MS initially adjust...
ISBN: 9780769520438 (print)
Data-parallel programs are both growing in importance and increasing in diversity, resulting in specialized processors targeted at specific classes of these programs. This paper presents a classification scheme for data-parallel program attributes, and proposes micro-architectural mechanisms to support applications with diverse behavior using a single reconfigurable architecture. We focus on the following four broad kinds of data-parallel programs: DSP/multimedia, scientific, networking, and real-time graphics workloads. While all of these programs exhibit high computational intensity, coarse-grain regular control behavior, and some regular memory access behavior, they show wide variance in the computation requirements, fine-grain control behavior, and the frequency of other types of memory accesses. Based on this study of application attributes, this paper proposes a set of general micro-architectural mechanisms that enable a baseline architecture to be dynamically tailored to the demands of a particular application. These mechanisms provide efficient execution across a spectrum of data-parallel applications and can be applied to diverse architectures ranging from vector cores to conventional superscalar cores. Our results using a baseline TRIPS processor show that the configurability of the architecture to the application demands provides harmonic mean performance improvement of 5%-55% over scalable yet less flexible architectures, and performs competitively against specialized architectures.
We describe the polymorphous TRIPS architecture, which can be configured for different granularities and types of parallelism. TRIPS contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction, data, or thread-level parallelism. To adapt to small and large-grain concurrency, the TRIPS architecture contains four out-of-order, 16-wide-issue grid processor cores, which can be partitioned when easily extractable fine-grained parallelism exists. This approach to polymorphism provides better performance across a wide range of application types than an approach in which many small processors are aggregated to run workloads with irregular parallelism. Our results show that high performance can be obtained in each of the three modes (ILP, TLP, and DLP), demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.
ISBN: 9781581136821 (print)
The increase in high-performance microprocessor power consumption is due in part to the large power overhead of wide-issue, highly speculative cores. Microarchitectural speculation, such as branch prediction, increases instruction throughput but carries a power burden due to wasted power for mis-speculated instructions. Pipeline over-provisioning supplies excess resources which often go unused. In this paper, we use our detailed performance and power model for an Alpha 21264 to measure both the useful energy and the wasted effort due to mis-speculation and over-provisioning. Our experiments show that flushed instructions account for approximately 6% of total energy, while over-provisioning imposes a tax of 17% on average. These results suggest opportunities for power savings and energy efficiency throughout microprocessor pipelines.