This paper presents a methodology for high-level power modeling of cell-based processors. A flexible power model library, which can automatically generate detailed power data for the actual circuits of each part of a given processor, is developed and dynamically annotated into an architecture-level power simulator. With this method, dynamic power, leakage power, and even area and cell counts can be accurately estimated. Preliminary power validation on a MIPS microprocessor shows that the methodology is effective and highly correlated with gate-level power analysis, with only small errors.
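The library itself is not shown in the abstract. As a minimal sketch of the general idea, assume each processor unit has been characterized offline with a dynamic energy per access and a leakage power. An architecture-level simulator can then combine per-unit access counts with these figures: average dynamic power is total switching energy divided by runtime, and leakage adds a constant term. The names `UnitPowerModel` and `estimate_power`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class UnitPowerModel:
    """Hypothetical per-unit entry, assumed to be characterized from gate-level data."""
    energy_per_access_pj: float  # dynamic energy per activity event (pJ)
    leakage_mw: float            # static leakage power (mW)
    area_um2: float              # cell area (um^2)
    cell_count: int              # number of standard cells


def estimate_power(library: Dict[str, UnitPowerModel],
                   access_counts: Dict[str, int],
                   cycles: int,
                   freq_hz: float) -> Dict[str, float]:
    """Combine architecture-level access counts with per-unit models."""
    runtime_s = cycles / freq_hz
    # Total switching energy: sum over units of (accesses x energy per access).
    dyn_energy_j = sum(library[unit].energy_per_access_pj * n * 1e-12
                       for unit, n in access_counts.items())
    dynamic_mw = dyn_energy_j / runtime_s * 1e3                # average dynamic power
    leakage_mw = sum(m.leakage_mw for m in library.values())   # every unit leaks, even when idle
    return {
        "dynamic_mw": dynamic_mw,
        "leakage_mw": leakage_mw,
        "total_mw": dynamic_mw + leakage_mw,
        "area_um2": sum(m.area_um2 for m in library.values()),
        "cell_count": sum(m.cell_count for m in library.values()),
    }
```

For example, 1,000 ALU accesses at 2 pJ plus 2,000 register-file accesses at 1 pJ over 1,000 cycles at 1 GHz give 4 nJ in 1 µs, i.e. 4 mW of dynamic power, on top of the summed leakage.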
ISBN: 9781617386404 (print)
As the scale of parallel machines grows, the communication network plays a more important role than ever before. Communication affects not only the execution time but also the scalability of parallel applications. A parallel interconnection network simulator is a suitable tool for studying large-scale interconnection networks. However, simulating packet-level communication on detailed cycle-by-cycle network models is a challenging task. We implement HPPNetSim, a kernel-based parallel simulator, to address these problems. Because optimistic PDES (parallel discrete event simulation) requires a large amount of memory to save the states of simulation entities in large-scale simulations, we chose a conservative synchronization approach. Both the simulation kernel and the network models are carefully designed. To accelerate simulation, several optimizations are introduced, such as block/unblock synchronization, load balancing, and dynamic look-ahead generation. Simulation examples and performance results show that HPPNetSim achieves both high accuracy and good performance: it reaches a speedup of 19.8 on 32 processing nodes when simulating a 36-port 3-tree fat-tree network.
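HPPNetSim's kernel is not described in detail here. The sketch below illustrates only the general conservative (Chandy-Misra-Bryant-style) rule the abstract relies on. A logical process may execute an event only if no neighbour can still send it an earlier message, and look-ahead lets it promise neighbours a lower bound on its own future messages. The class and method names, the channel-clock dictionary, and the look-ahead value are hypothetical. For scale, a speedup of 19.8 on 32 nodes corresponds to a parallel efficiency of about 19.8/32 ≈ 62%.

```python
import heapq
import math
from typing import Dict, List, Tuple


class LogicalProcess:
    """One simulation partition (e.g. a group of routers) under conservative sync.

    channel_clocks maps each upstream neighbour to a lower bound on the
    timestamp of any message it may still send us (advanced by real or null
    messages). Names are illustrative, not HPPNetSim's API.
    """

    def __init__(self, name: str, lookahead: float) -> None:
        self.name = name
        self.lookahead = lookahead                 # minimum delay added to any event we send
        self.queue: List[Tuple[float, str]] = []   # pending (timestamp, payload)
        self.clock = 0.0                           # timestamp of the last executed event
        self.executed: List[Tuple[float, str]] = []

    def schedule(self, ts: float, payload: str) -> None:
        heapq.heappush(self.queue, (ts, payload))

    @staticmethod
    def safe_bound(channel_clocks: Dict[str, float]) -> float:
        """Earliest timestamp at which any neighbour could still send a message."""
        return min(channel_clocks.values(), default=math.inf)

    def run_safe_events(self, channel_clocks: Dict[str, float]) -> int:
        """Execute every pending event that cannot be preceded by a future message.

        Ties at exactly the bound are assumed to be order-independent.
        """
        bound = self.safe_bound(channel_clocks)
        count = 0
        while self.queue and self.queue[0][0] <= bound:
            ts, payload = heapq.heappop(self.queue)
            self.clock = ts
            self.executed.append((ts, payload))  # a real model would update state / send here
            count += 1
        return count  # if 0, the LP must wait until some channel clock advances

    def null_message_time(self, channel_clocks: Dict[str, float]) -> float:
        """Lower bound on the timestamp of any message this LP will send later."""
        next_local = self.queue[0][0] if self.queue else math.inf
        return min(next_local, self.safe_bound(channel_clocks)) + self.lookahead
```

With pending events at 1, 3, 5 and 8 and neighbour channel clocks of 4 and 6, only the events at 1 and 3 are safe; the process can then promise its neighbours that it will send nothing earlier than min(5, 4) + 2 = 6. A larger look-ahead lets neighbours advance further per round, which is why generating look-ahead dynamically can pay off.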
For a gigahertz microprocessor with multiple clock domains and a large number of embedded RAMs (random access memories), generating at-speed test patterns is becoming very difficult and time-consuming. This pape...
ISBN: 9781424437696 (print)
Bugs are becoming unavoidable in the design of complex integrated circuits, so it is imperative to identify them as early as possible through post-silicon debug. The main challenge of post-silicon debug is the limited observability of internal signals. This paper exploits the fact that error-free states need not be observed. We introduce the "suspect window" and present a method for determining its boundaries. Based on the suspect window, we propose a debug approach that achieves high observability by reusing the scan chain. Because scan dumps take place only within the suspect window, debug time is greatly reduced. Experimental results demonstrate the effectiveness of the proposed approach.
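The abstract does not give the boundary-finding method, so the following is only one plausible sketch. It assumes the error, once it occurs, stays visible in a checkable state signature, so a per-checkpoint predicate flips from false to true exactly once. It also assumes the chip can be rerun deterministically to any checkpoint. Under these assumptions the window can be located by binary search over checkpoints, and scan dumps are then restricted to it. `find_suspect_window`, `error_seen`, `scan_dump_cycles`, and the checkpoint spacing are hypothetical.

```python
from typing import Callable, List, Tuple


def find_suspect_window(checkpoints: List[int],
                        error_seen: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (last error-free checkpoint, first erroneous checkpoint).

    Assumptions (hypothetical model): checkpoints are sorted, checkpoints[0]
    is known error-free (e.g. reset), checkpoints[-1] is known erroneous, and
    error_seen is monotone. Each probe stands for re-running the chip to that
    cycle and comparing a state signature against a golden reference.
    """
    lo, hi = 0, len(checkpoints) - 1   # invariant: lo is clean, hi is erroneous
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if error_seen(checkpoints[mid]):
            hi = mid
        else:
            lo = mid
    return checkpoints[lo], checkpoints[hi]


def scan_dump_cycles(window: Tuple[int, int]) -> List[int]:
    """Cycles at which to stop and dump the scan chain: only inside the window."""
    start, end = window
    return list(range(start + 1, end + 1))
```

With checkpoints every 1,000 cycles over a 10,000-cycle run and an error that first appears at cycle 4,321, the search needs four probes and returns the window (4000, 5000], so 1,000 per-cycle scan dumps replace 10,000.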
Mining frequent items from sensory data is a major research problem in wireless sensor networks (WSNs), with wide application in environmental monitoring. The conventional Lossy Counting algorithm can be applied to solv...
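The paper's own algorithm is cut off above. For reference, here is a minimal sketch of the conventional Lossy Counting algorithm (Manku and Motwani) that the abstract names as its starting point. The stream is divided into buckets of width ⌈1/ε⌉. Each tracked item keeps a count f and a maximum undercount Δ, and entries with f + Δ ≤ current bucket id are pruned at each bucket boundary. This is a centralized, single-stream version; how counts would be distributed across sensor nodes is not shown, and the class name is hypothetical.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Conventional Lossy Counting for approximate frequent items.

    Every item with true frequency >= support * n is reported, and each stored
    count underestimates the true count by at most epsilon * n.
    """

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)   # bucket width w
        self.n = 0                            # items seen so far
        self.entries: Dict[Hashable, Tuple[int, int]] = {}  # item -> (f, delta)

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)   # current bucket id
        if item in self.entries:
            f, delta = self.entries[item]
            self.entries[item] = (f + 1, delta)
        else:
            # A new item may have been seen and pruned up to (bucket - 1) times.
            self.entries[item] = (1, bucket - 1)
        if self.n % self.width == 0:              # bucket boundary: prune rare items
            self.entries = {k: (f, d) for k, (f, d) in self.entries.items()
                            if f + d > bucket}

    def frequent(self, support: float) -> List[Hashable]:
        """Items whose stored count reaches (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return [k for k, (f, _) in self.entries.items() if f >= threshold]
```

Memory stays bounded because one-off readings are discarded at the end of their bucket, while an item that keeps recurring accumulates a count that survives every prune.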
The Dixon resultant is a basic elimination method that has been widely used in high-technology fields such as automatic control and robotics. However, removing extraneous factors from Dixon resultants has long been a very difficult problem. In this paper, we identify some extraneous factors by expressing the Dixon resultant as a linear combination of the original polynomial system. Furthermore, we prove that these factors consist of three parts, which come respectively from the Dixon derived polynomials, the Dixon matrix, and the resultant expression obtained by substituting the Dixon derived polynomials.
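For orientation, the standard Dixon construction is summarized below; the notation is ours and not necessarily the paper's. Given n + 1 polynomials f_0, ..., f_n in variables x_1, ..., x_n (with coefficients that may depend on parameters), new variables α_1, ..., α_n are substituted one at a time:

```latex
\delta(x_1,\dots,x_n,\alpha_1,\dots,\alpha_n) =
\frac{1}{\prod_{i=1}^{n}(x_i-\alpha_i)}
\det\begin{pmatrix}
f_0(x_1,x_2,\dots,x_n) & \cdots & f_n(x_1,x_2,\dots,x_n)\\
f_0(\alpha_1,x_2,\dots,x_n) & \cdots & f_n(\alpha_1,x_2,\dots,x_n)\\
\vdots & & \vdots\\
f_0(\alpha_1,\alpha_2,\dots,\alpha_n) & \cdots & f_n(\alpha_1,\alpha_2,\dots,\alpha_n)
\end{pmatrix},
\qquad
\delta = \mathbf{X}\,\Theta\,\mathbf{A}^{\top}

% X, A: row vectors of the monomials in x and in alpha occurring in delta;
% Theta: the Dixon matrix. For n = 1 (Cayley's formulation):
\delta(x,\alpha) = \frac{f_0(x)\,f_1(\alpha) - f_0(\alpha)\,f_1(x)}{x-\alpha}
```

At any common zero x* of the system the first row of the determinant vanishes, so X(x*) Θ A(α)^T = 0 for all α, which forces X(x*) Θ = 0. When Θ is square, det Θ (the Dixon resultant) must therefore vanish. det Θ, or a maximal minor when Θ is singular or non-square, may however contain factors besides the true resultant; these are the extraneous factors the abstract refers to.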
Virtual machine technology plays an important role in data centers. Distributed services deployed in multiple virtual machines may reside on one physical machine. This situation requires an efficient inter-domain...
With virtual machine technology, distributed services deployed in multiple cooperative virtual machines, such as multi-tier web services, may reside on one physical machine. This situation requires an efficient inter-...
Multicore architecture has become a promising way to sustain Moore's Law, bringing a revolution in both research and industry and opening a new design space for software and architecture. The fast Fourier transform (FFT), which is both compute-intensive and bandwidth-intensive, is one of the most popular and important applications. Compared with the computing resources of a multicore architecture, on-chip memory is much more expensive because of the limited physical chip size. Implementing the FFT efficiently and with good scalability on multicore processors is therefore a challenge for both software and hardware developers. In this paper, we develop an optimized 1-D FFT implementation on the Godson-T architecture that hides matrix transposes and overlaps computation with communication. It achieves more than 30% performance improvement and reduces L2 cache consumption by almost 1/3 compared with the baseline six-step FFT. We also analyze the limits of scalability and conclude that, on Godson-T, when frequent and simultaneous data accesses occur, the limited L2 cache access bandwidth becomes the bottleneck and results in longer on-chip network latency.
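The abstract measures its gains against a baseline six-step FFT. Below is a minimal, unoptimized sketch of that baseline, not the Godson-T implementation, which hides the transposes and overlaps computation with communication. Writing the input index as n = N2·j1 + j2 and the output index as k = k1 + N1·k2, with ω_M = e^(−2πi/M), gives X[k1 + N1·k2] = Σ_{j2} ω_{N2}^(j2·k2) · ω_N^(j2·k1) · Σ_{j1} x[N2·j1 + j2] · ω_{N1}^(j1·k1). The six steps compute exactly this. The naive `dft` helper stands in for an optimized short-FFT kernel, and all function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) DFT; stands in for an optimized short-FFT kernel."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def transpose(m: List[List[complex]]) -> List[List[complex]]:
    return [list(col) for col in zip(*m)]


def six_step_fft(x: Sequence[complex], n1: int, n2: int) -> List[complex]:
    """Length n1*n2 DFT via the six-step decomposition with explicit transposes."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # View x as an n1 x n2 row-major matrix: a[r][c] = x[n2*r + c] (r = j1, c = j2).
    a = [list(x[n2 * r:n2 * (r + 1)]) for r in range(n1)]
    a = transpose(a)                       # 1: now n2 x n1, a[c][r]
    a = [dft(row) for row in a]            # 2: n2 DFTs of length n1 -> a[c][k1]
    a = [[a[c][k1] * cmath.exp(-2j * cmath.pi * c * k1 / n)   # 3: twiddles w_N^(c*k1)
          for k1 in range(n1)] for c in range(n2)]
    a = transpose(a)                       # 4: now n1 x n2, a[k1][c]
    a = [dft(row) for row in a]            # 5: n1 DFTs of length n2 -> a[k1][k2]
    a = transpose(a)                       # 6: now n2 x n1, a[k2][k1]
    return [v for row in a for v in row]   # flat index k2*n1 + k1 = k1 + n1*k2
```

Each of the three transposes touches every element once, which is why they dominate memory traffic on a chip with scarce on-chip storage and why hiding them is worthwhile.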