ISBN: 9781479974375 (print)
Power consumption, design complexity and area cost are limiting constraints in the design of interconnects for scalable many-core systems. To tackle the power and area concerns, we propose a lightweight unidirectional-channel network-on-chip in a 2D mesh topology (UniMESH), which simplifies the router architecture, uses only half the number of channel links while still guaranteeing a fully connected topology, and adopts a novel routing algorithm and deadlock recovery mechanism. As a result, it reduces both design complexity and area cost, and cuts unnecessary power consumption. Evaluations with SPLASH application simulations show that, compared with a conventional 2D mesh design, the proposed lightweight UniMESH reduces router area by 57.4% and total power consumption by 39.3%, while adding only a small amount of extra latency.
ISBN: 9781467381741 (print)
In recent years, many companies have embraced the Hadoop MapReduce system for large-scale data processing with completion-time constraints. However, existing Hadoop schedulers still suffer from the reducer load imbalance problem. In this paper, we present a novel run-time load balancing method for MapReduce. Our approach predicts the workload of each reduce task at run time and dynamically assigns the reduce tasks to specific machines based on their estimated workloads, thereby balancing the load among machines. The experimental results show that our approach predicts the workload of reduce tasks with high accuracy and improves the job completion time by up to 23.15%.
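The abstract does not describe the prediction or assignment algorithm itself. As a rough illustration of the general idea only (placing reduce tasks on machines according to estimated workload), the sketch below uses a standard greedy longest-processing-time heuristic. It is not the paper's method; the function name `assign_reduce_tasks` and the sample load values are hypothetical.

```python
import heapq


def assign_reduce_tasks(estimated_load, num_machines):
    """Greedy LPT placement: heaviest task first, onto the least-loaded machine.

    estimated_load: dict mapping a task id to its predicted workload (assumed
    to come from some run-time predictor, which is not modelled here).
    Returns a dict mapping machine index -> list of task ids.
    """
    # Min-heap of (current total load, machine index).
    heap = [(0.0, m) for m in range(num_machines)]
    heapq.heapify(heap)
    assignment = {m: [] for m in range(num_machines)}

    # Visit tasks from the largest estimated workload to the smallest.
    ordered = sorted(estimated_load.items(), key=lambda kv: kv[1], reverse=True)
    for task, load in ordered:
        total, machine = heapq.heappop(heap)  # least-loaded machine so far
        assignment[machine].append(task)
        heapq.heappush(heap, (total + load, machine))
    return assignment


if __name__ == "__main__":
    # Hypothetical predicted workloads for five reduce tasks.
    loads = {"r0": 8, "r1": 7, "r2": 6, "r3": 5, "r4": 4}
    print(assign_reduce_tasks(loads, num_machines=2))
```

A heuristic of this kind only balances as well as the workload estimates allow, which is why the prediction accuracy reported in the abstract matters for the final completion time.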
On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform an extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.
In recent years, the hardening of combinational circuits is becoming a common *** the transistor-level hardening technique, the cell-level hardening technique, a divide-and-conquer strategy, can make substantial use of typical characteristics of the cell-circuit module to mitigate single event transient (SET) *** mirror image (MI) technique proposed in this paper can adequately enhance the charge sharing in cell circuits with a stage-by-stage inverter-like structure. 3D TCAD mixed-mode simulations have been performed in a 65 nm twin-well bulk CMOS process. The results indicate that the MI technique can reduce the SET pulse width from the anterior-stage PMOS by over 25% and the SET pulse width from the posterior-stage PMOS by about 10%. The MI technique, as a representative of the cell-level technique, may be the future of the hardening of combinational circuits.
The interconnection network plays an important role in scalable high-performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth, low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with particular emphasis on the design of the network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offloaded collective operations, and hardware-based reliable end-to-end communication. The design of a low-level message-passing infrastructure and upper-level message-passing services is also proposed. The preliminary performance results demonstrate the efficiency of the TH interconnect interface.
On the 41st TOP500 list announced in June 2013, the MilkyWay-2 system produced by the National University of Defense Technology (NUDT) in China won first place with a LINPACK test result of 33.86 PFLOPS. It has been one and a half years since its predecessor, MilkyWay-1 (TH-1), reached the same place for the first time. On the newest TOP500 list, published in November 2013, MilkyWay-2 retained first place.
Due to the uncertainty and unpredictability of environment changes, it is a great challenge to develop self-adaptive systems in open environment. First, it is difficult for developers to clearly predict various enviro...
It is shown by particle-in-cell simulations that a narrow electron beam with high energy and charge density can be generated in a subcritical-density plasma by two consecutive laser pulses. Although the first laser pulse dissipates rapidly, the second pulse can propagate for a long distance in the thin wake channel created by the first pulse and can further accelerate the preaccelerated electrons therein. Given that the second pulse also self-focuses, the resulting electron beam has a narrow waist and high charge and energy densities. Such beams are useful for enhancing the target-back space-charge field in target normal sheath acceleration of ions and bremsstrahlung sources, among others.
Multiple-input and multiple-output (MIMO) is an important approach in high-rate wireless communications. The Schnorr-Euchner (SE) sphere-decoding algorithm enables fast detection for receivers by recursive tree search...
Activity recognition has broad application prospects in many fields including pervasive computing and human-computer interaction. In this paper, the technology of wireless-based activity recognition is introduced. By ...