We address the optimal sink scheduling problem in wireless sensor networks (WSNs). The problem is inherently difficult because sink scheduling and data routing are tightly coupled. Previous approaches either perform questionably because they do not consider the two jointly, or rely on relaxed constraints. We aim to fill this research gap. First, by discretizing continuous time, we develop a novel bounding technique that connects time-varying routes with the placement of sinks. This technique transforms time-related constraints into pattern-based ones and allows us to formulate the optimization mathematically in a pattern-based way. Solving this optimization directly is intractable; therefore, building on column generation (CG), we develop a computationally efficient algorithm that decomposes the problem into sub-problems and solves them iteratively to approach optimality. Simulations demonstrate the efficiency of the algorithm and substantiate the importance of sink mobility in energy-constrained sensor networks.
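For orientation, a pattern-based lifetime problem of this kind is often written as a linear program whose columns are patterns, which is what makes column generation applicable. The sketch below is a generic textbook-style form and not the formulation from the paper; the symbols $\mathcal{P}$ (set of sink-placement patterns), $t_p$ (time spent in pattern $p$), $e_{i,p}$ (energy drain rate of sensor $i$ under pattern $p$), $E_i$ (initial energy of sensor $i$) and $\pi_i$ (dual price of sensor $i$'s energy constraint) are all assumptions introduced here for illustration.

```latex
% Restricted master problem over a small subset P' of all patterns P (assumed notation)
\begin{align}
  \max_{t \ge 0} \quad & \sum_{p \in \mathcal{P}'} t_p \\
  \text{s.t.} \quad & \sum_{p \in \mathcal{P}'} e_{i,p}\, t_p \le E_i
      \qquad \text{for every sensor } i .
\end{align}

% Pricing sub-problem: with dual prices \pi_i \ge 0 from the master problem,
% a pattern p improves the objective only if its reduced cost is positive:
\begin{equation}
  1 - \sum_{i} \pi_i\, e_{i,p} > 0 .
\end{equation}
% Column generation adds the pattern minimizing \sum_i \pi_i e_{i,p}
% and stops once that minimum is at least 1.
```

Under these assumptions, the master problem stays small because only patterns that the pricing step finds profitable are ever added.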
An increasing number of supercomputers adopt a heterogeneous architecture consisting of both general-purpose CPUs and specialized accelerators. Such a design benefits scalability and power consumption, but heterogeneity also brings new challenges for the communication system, which must connect heterogeneous components and provide support for programming. The communication system of the Dawning 6000 connects two kinds of heterogeneous processors, Loongson and AMD, and adopts a three-layer architecture with an intranode layer between heterogeneous components. To connect heterogeneous components efficiently, the system forms a global address space and provides a mechanism for message transmission via an in-node global store; on the Infiniband network, it provides an OS-bypassing virtualization method to share an Infiniband card between nodes. To facilitate programming on heterogeneous processors, it supports Unified Parallel C (UPC), with a modified compiler based on the global address space. In addition, a special collective network is implemented for collective operations. Results obtained from a prototype system demonstrate that these features are both feasible and efficient.
ISBN: 9781457702495 (print)
Sink scheduling, in the form of scheduling multiple sinks among sink sites to balance the traffic burden, is an effective mechanism for improving the energy efficiency of wireless sensor networks (WSNs). Because the problem is inherently difficult (NP-hard in general), existing work on this topic mainly focuses on heuristic or greedy algorithms, and theoretical results remain unknown. In this paper, we fill this research gap with two algorithms. The first is based on column generation (CG). It decomposes the original problem into two sub-problems and solves them iteratively to approach the optimal solution. However, due to its high computational complexity, this algorithm is only suitable for small-scale networks. The second is a polynomial-time algorithm based on relaxation techniques that obtains an upper bound, which can serve as a performance benchmark for other algorithms on this problem. Through comprehensive simulations, we evaluate the efficiency of the proposed algorithms.
Personal high performance computer (PHPC) requires lower cost and high performance. The Teraflops PHPC systems with special accelerator units like GPGPU have been presented, but they have difficulties in programming, ...
As one of the most important enabling technologies of cloud computing, virtualization brings to HPC good manageability, online system maintenance, performance isolation and fault isolation. Furthermore, previous study...
Moore's law will grant computer architects ever more transistors for the foreseeable future, and the challenge is how to use them to deliver efficient performance and flexible programmability. We propose a many-core architecture, Godson-T, to attack this challenge. On the one hand, Godson-T features a region-based cache coherence protocol, asynchronous data transfer agents, and hardware-supported synchronization mechanisms, which together allow on-chip resources to be used with high efficiency. On the other hand, Godson-T features a highly efficient runtime system, a Pthreads-like programming model, and versatile parallel libraries, which make this many-core design flexibly programmable. This cooperative hardware/software design methodology bridges high-end computing and mainstream programmers. Experimental evaluations are conducted on a cycle-accurate simulator of Godson-T. The results show that the proposed architecture has good scalability, fast synchronization, high computational efficiency, and flexible programmability.
Efficient support for cache coherence is extremely important in the design and implementation of many-core processors. In this paper, we propose a synchronization-based coherence (SBC) protocol to efficiently support cache coherence for shared-memory many-core architectures. The unique feature of our scheme is that it doesn't use a directory at all. Inspired by the scope consistency memory model, our protocol maintains coherence at synchronization points. Within a critical section, processor cores record write-sets (which lines have been written in the critical section) using a Bloom filter. When a core releases the lock, the write-set is transferred to a synchronization manager. When another core acquires the same lock, it gets the write-set from the synchronization manager and invalidates stale data in its local cache. Experimental results show that SBC improves execution time by an average of 5% across a suite of scientific applications. Meanwhile, SBC is more cost-effective than a directory-based protocol, which requires a large amount of hardware resources and a huge design verification effort.
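The release/acquire flow described in this abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the hardware protocol from the paper: the class names (`BloomWriteSet`, `SyncManager`, `Core`), the filter size, the hash scheme, the 64-byte line size, and the write-through memory are all hypothetical choices made for illustration.

```python
import hashlib

LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


class BloomWriteSet:
    """Bloom filter over cache-line addresses (sizes are assumptions)."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored as a Python int

    def _positions(self, line):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{line}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, line):
        for pos in self._positions(line):
            self.bits |= 1 << pos

    def may_contain(self, line):
        # False positives are possible; false negatives are not.
        return all((self.bits >> pos) & 1 for pos in self._positions(line))

    def merge(self, other):
        self.bits |= other.bits


class SyncManager:
    """Keeps one accumulated write-set per lock (never cleared in this sketch)."""

    def __init__(self):
        self.write_sets = {}

    def release(self, lock_id, write_set):
        self.write_sets.setdefault(lock_id, BloomWriteSet()).merge(write_set)

    def acquire(self, lock_id):
        return self.write_sets.get(lock_id)


class Core:
    def __init__(self, manager, memory):
        self.manager = manager
        self.memory = memory  # shared dict: line -> value (write-through)
        self.cache = {}       # private dict: line -> value
        self.write_set = None

    def acquire(self, lock_id):
        incoming = self.manager.acquire(lock_id)
        if incoming is not None:
            # Invalidate every cached line the filter says may have been written.
            for line in [l for l in self.cache if incoming.may_contain(l)]:
                del self.cache[line]
        self.write_set = BloomWriteSet()

    def write(self, addr, value):
        line = addr // LINE_SIZE
        self.memory[line] = value
        self.cache[line] = value
        self.write_set.add(line)

    def read(self, addr):
        line = addr // LINE_SIZE
        if line not in self.cache:
            self.cache[line] = self.memory[line]
        return self.cache[line]

    def release(self, lock_id):
        self.manager.release(lock_id, self.write_set)
        self.write_set = None
```

In this model a core that reads shared data without acquiring the lock may keep seeing a stale cached value; the stale line is dropped only when the core acquires the same lock. A Bloom filter can over-invalidate (false positives) but never misses a written line, which is why it is safe as a compact write-set.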
The network file system (NFS) protocol, the de facto standard for sharing files in a distributed environment, has deployed Infiniband as the underlying transport of sunRPC, namely NFS over RDMA. In the current Read-Write design of NFS over RDMA, NFS write performance is limited because the design does not fully utilize the features of Infiniband. In this paper, we take on the challenge of enhancing the write performance of NFS. We propose and evaluate a new design of sunRPC over RDMA, namely the Write-Write design. To guarantee the security of our design, we propose an HCA-based memory protection extension of Infiniband. Evaluations show that our Write-Write design increases the kernel-to-kernel RPC bandwidth by 15%~27%. In real-disk tests, our Write-Write design gains 15%~22% in multi-client benchmarks compared with the Read-Write design.
The many-core architecture is becoming a promising computing platform thanks to advances in semiconductor technology. LU decomposition is a widely used kernel in both scientific and engineering computations. Although there is a large body of related work on traditional parallel architectures, little work has focused on parallelizing it on many-core architectures. This paper investigates the problem from three aspects: load balancing, latency hiding, and performance modeling. The contributions of this work are as follows. First, a novel load balancing technique is introduced to overcome the limitations of 2D scatter decomposition. Experimental results show that the proposed scheme achieves a 20% performance improvement without optimization and a 40% improvement after optimization. Second, an analytical performance model is presented. A quantitative experimental study shows that, by carefully hiding memory latency through the on-chip memory hierarchy and for a selected block size, experiments can approach the theoretical upper bound on performance. Experimental results also reveal two primary causes that make the theoretical speedup hard to achieve: limited DRAM bandwidth and resource contention on the on-chip network.
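To make the load-balancing limitation of 2D scatter decomposition concrete, the sketch below assumes the usual cyclic (wrapped) mapping of matrix blocks onto a grid of cores and counts how many trailing-submatrix blocks each core updates at a given elimination step. The function names and the block/grid sizes are hypothetical; the abstract does not give the paper's own mapping or its improved scheme, so only the baseline is modeled.

```python
from collections import Counter


def scatter_owner(block_row, block_col, grid_rows, grid_cols):
    """Core coordinates owning a block under an assumed 2D cyclic mapping."""
    return (block_row % grid_rows, block_col % grid_cols)


def trailing_load(num_blocks, step, grid_rows, grid_cols):
    """Count trailing-submatrix blocks each core updates at elimination step `step`.

    The matrix is num_blocks x num_blocks blocks; after the panel at `step`
    is factored, blocks (i, j) with i > step and j > step are updated.
    """
    load = Counter()
    for i in range(step + 1, num_blocks):
        for j in range(step + 1, num_blocks):
            load[scatter_owner(i, j, grid_rows, grid_cols)] += 1
    return load


if __name__ == "__main__":
    # 8x8 blocks on a 2x2 core grid: print per-core work at every step.
    for step in range(7):
        print(step, dict(trailing_load(8, step, 2, 2)))
```

Running this shows the per-core counts diverging as the trailing submatrix shrinks, until in the last step a single core holds all remaining work. That shrinking-submatrix imbalance is one plausible reading of the limitation the abstract mentions.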
Heterogeneity is considered a solution for scaling supercomputers to petascale. Many systems composed of general-purpose CPUs and special processing units such as Cells, GPGPUs, and FPGAs have been implemented. In these systems, the CPU needs to interact with the special processing units to process data together, so communication between these heterogeneous processing units becomes a key problem, and the communication subsystem should provide low latency and high bandwidth. In this paper, we propose HPP-Controller, which is designed to connect two different types of CPUs (AMD and Loongson) in one node. It connects heterogeneous CPUs on top of a non-coherent HyperTransport (HT) fabric and supports a Global Physical Address Space. We implement an FPGA-based prototype and evaluate it experimentally. Initial results show that HPP-Controller achieves a low latency of 0.75 us and a bandwidth close to that of the HT links.