This paper compares some mutation operators containing expert knowledge about the problem of optimizing the parameters of a Radial Basis Function Neural Network. It is shown that the expert knowledge is not always a...
Differentiated Services (DiffServ), currently being standardized in the IETF DiffServ working group, is a solution that can provide different qualities of service to different network users. DiffServ aggrega...
We study the space of chip multiprocessor (CMP) organizations. We compare the area and performance trade-offs for CMP implementations to determine how many processing cores future server CMPs should have, whether the cores should have in-order or out-of-order issue, and how big the per-processor on-chip caches should be. We find that, contrary to some conventional wisdom, out-of-order processing cores will maximize job throughput on future CMPs. As technology shrinks, limited off-chip bandwidth will begin to curtail the number of cores that can be effective on a single die. Current projections show that the transistor/signal pin ratio will increase by a factor of 45 between 180 and 35 nanometer technologies. That disparity will force increases in per-processor cache capacities as technology shrinks, from 128KB at 100nm, to 256KB at 70nm, and to 1MB at 50 and 35nm, reducing the number of cores that would otherwise be possible.
In this paper we survey the design space of a new class of architectures called Grid Processor architectures (GPAs). These architectures are designed to scale with technology, allowing faster clock rates than conventional architectures while providing superior instruction-level parallelism on traditional workloads and high performance across a range of application classes. A GPA consists of an array of ALUs, each with limited control, connected by a thin operand network. Programs are executed by mapping blocks of statically scheduled instructions to the ALU array and executing them dynamically in dataflow order. This organization enables the critical paths of instruction blocks to be executed on chains of ALUs without transmitting temporary values back to the register file, avoiding most of the large, unscalable structures that limit the scalability of conventional architectures. Finally, we present simulation results of a preliminary design, the GPA-1. With a half-cycle routing delay, we obtain performance roughly equal to an ideal 8-way, 512-entry window superscalar core. With no inter-ALU delay, perfect memory, and perfect branch prediction, the IPC of the GPA-1 is more than twice that of the ideal superscalar core, achieving an average of 11 IPC across nine SPEC CPU2000 and Mediabench benchmarks.
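The idea of executing a block "in dataflow order" can be illustrated with a minimal Python sketch. It is not the GPA-1 design: the block format, the `run_block` function, and the three-instruction example are all hypothetical, and the sketch models only the firing rule (an instruction executes once all of its operands have arrived), not ALU placement or routing delay.

```python
from collections import deque
import operator

# Hypothetical block format: name -> (operation, operand names).
# Operands are either block inputs or the names of other instructions.
BLOCK = {
    "t1": (operator.add, ["a", "b"]),
    "t2": (operator.mul, ["t1", "c"]),
    "t3": (operator.sub, ["t2", "a"]),
}

def run_block(block, inputs):
    """Fire each instruction as soon as all of its operands are available."""
    values = dict(inputs)  # operands that have already arrived
    # For every instruction, the set of operands it is still waiting for.
    pending = {name: set(srcs) - set(values) for name, (_, srcs) in block.items()}
    # For every operand, the instructions that consume it.
    consumers = {}
    for name, (_, srcs) in block.items():
        for src in set(srcs):
            consumers.setdefault(src, []).append(name)

    ready = deque(name for name, missing in pending.items() if not missing)
    order = []
    while ready:
        name = ready.popleft()
        op, srcs = block[name]
        values[name] = op(*(values[s] for s in srcs))
        order.append(name)
        # Forward the result directly to the waiting consumers.
        for consumer in consumers.get(name, []):
            pending[consumer].discard(name)
            if not pending[consumer]:
                ready.append(consumer)
    return values, order

values, order = run_block(BLOCK, {"a": 2, "b": 3, "c": 4})
print(order, values["t3"])
```

In this toy block the temporaries `t1` and `t2` are consumed only by later instructions in the same block, which is the kind of value the abstract describes as passing along a chain of ALUs instead of going back to the register file.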
Microprocessor performance has been improved by increasing the capacity of on-chip caches. However, the performance gain comes at the price of increased static energy consumption due to sub-threshold leakage current. This paper compares three techniques for reducing static energy consumption in on-chip level-1 and level-2 caches. One technique employs low-leakage transistors in the memory cell. Another technique, power supply switching, can be used to turn off the memory cells and discard their contents. A third alternative is dynamic threshold modulation, which places the memory cells in a standby state that preserves cell contents. In our experiments, we explore the energy/performance trade-offs of these techniques and find that dynamic threshold modulation achieves the best results for level-1 caches, improving the energy-delay product by 2% in a level-1 instruction cache and 7% in a level-1 data cache. Low-leakage transistors perform best for the level-2 cache, as they reduce the static energy by up to 98% and improve the energy-delay product by more than a factor of 50.
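For readers unfamiliar with the metric, the energy-delay product is simply energy multiplied by execution time, so a technique improves it only if the energy saved outweighs any slowdown. The Python sketch below shows how an improvement factor could be computed; the function names and the sample numbers are illustrative assumptions, not measurements from the paper.

```python
def energy_delay_product(energy, delay):
    """Energy-delay product: lower is better."""
    return energy * delay

def edp_improvement_factor(base_energy, base_delay, new_energy, new_delay):
    """How many times smaller the new EDP is than the baseline EDP."""
    base = energy_delay_product(base_energy, base_delay)
    new = energy_delay_product(new_energy, new_delay)
    return base / new

# Illustrative values only: halving energy at unchanged delay halves the EDP.
print(edp_improvement_factor(10.0, 2.0, 5.0, 2.0))
```

A technique that saves energy but lengthens execution time gives a factor smaller than the raw energy saving, and the factor falls below 1 when the slowdown outweighs the saving.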
ISBN: (print) 0780373154
In this paper we describe and characterize the speech recognition process, and assess the suitability of current microprocessors and memory systems for running speech recognition applications. We use representative benchmark applications: RASTA to characterize the signal processing on the front end, and SPHINX for the graph search on the back end. Recognition time is dominated by the back end, which substantially exercises the memory system and exhibits low levels of instruction-level parallelism (ILP). As a result, SPHINX yields an average instructions per cycle (IPC) of 0.64 on a simulated 4-issue out-of-order microprocessor. We identify intelligent layout and thread-level parallelization as the primary methods to improve throughput, showing upper bounds on the performance improvements that these methods can achieve.
An enhanced self-organizing controller (E-SOC) is presented to achieve real-time global learning in fuzzy controllers. Direct control is achieved by means of two auxiliary systems: the first one is responsible for adapting the consequents of the main controller's rules to minimize the error arising at the plant output, while the second auxiliary system compiles real input/output data obtained from the plant. The system then learns in real time from these data, taking into account not the current state of the plant but rather the global identification performed. Simulation results show that this approach leads to an enhanced control policy thanks to the global learning performed.
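As a rough illustration of what "adapting the consequents of the rules" can mean, the Python sketch below nudges each rule's consequent in proportion to how strongly the rule fired and to the error seen at the plant output. It is an assumption-laden toy, not the E-SOC algorithm: the single-input controller, triangular membership functions, the `FuzzyController` class, and the update rule with its `rate` parameter are all hypothetical, and the second auxiliary system (global learning from collected data) is not modeled.

```python
def triangular(x, left, center, right):
    """Triangular membership degree of x, peaking at center."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

class FuzzyController:
    """Single-input fuzzy controller with one scalar consequent per rule."""

    def __init__(self, centers, width):
        self.centers = list(centers)
        self.width = width
        self.consequents = [0.0] * len(self.centers)

    def activations(self, x):
        return [triangular(x, c - self.width, c, c + self.width)
                for c in self.centers]

    def output(self, x):
        """Weighted average of the consequents of the rules that fire."""
        mu = self.activations(x)
        total = sum(mu)
        if total == 0.0:
            return 0.0
        return sum(m * r for m, r in zip(mu, self.consequents)) / total

    def adapt(self, x, plant_error, rate=0.5):
        """Shift each consequent by its share of the plant output error."""
        mu = self.activations(x)
        total = sum(mu)
        if total == 0.0:
            return
        for i, m in enumerate(mu):
            self.consequents[i] += rate * plant_error * m / total

ctrl = FuzzyController(centers=[-1.0, 0.0, 1.0], width=1.0)
ctrl.adapt(x=0.5, plant_error=2.0)
print(ctrl.consequents, ctrl.output(0.5))
```

Only the rules that fire for the current input are modified, which is why purely local adaptation of this kind says nothing about regions of the input space the plant has not visited recently.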
This paper presents a reliable method to obtain the structure of a complete rule-based fuzzy system for a specific approximation accuracy of the training data, i.e. it can decide which input variables must be taken into account in the fuzzy system and how many membership functions are needed in every selected input variable in order to reach the approximation target with the minimum number of parameters.
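To see why the choice of input variables and membership functions drives the parameter count, note that a complete rule base has one rule per combination of membership functions. The Python sketch below counts rules and parameters for a candidate structure; the counting convention (one parameter per membership function and one per rule, set by the hypothetical `params_per_mf` and `params_per_rule` arguments) is an assumption for illustration and is not taken from the paper.

```python
from math import prod

def rule_count(mfs_per_input):
    """Rules in a complete rule base: one per combination of membership functions."""
    return prod(mfs_per_input)

def parameter_count(mfs_per_input, params_per_mf=1, params_per_rule=1):
    """Membership-function parameters plus rule-consequent parameters."""
    return (params_per_mf * sum(mfs_per_input)
            + params_per_rule * rule_count(mfs_per_input))

# Two selected inputs with 3 and 2 membership functions respectively.
print(rule_count([3, 2]), parameter_count([3, 2]))
```

Because the rule count is a product, adding one more input variable multiplies the number of rules, whereas adding a membership function to an existing input only scales one factor.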