A network emulation environment is of great importance to research on network protocols, applications, and security mechanisms. Large-scale network topology generation is one of the key technologies for constructing network emul...
ISBN: (print) 9781424472611; 9780769540597
The networked application environment has motivated the development of multitasking operating systems for sensor networks and other low-power electronic devices, but their multitasking capability is severely limited because traditional stack management techniques perform poorly on small-memory systems. In this paper, we show that combining binary translation and a new kernel runtime can lead to efficient OS designs on resource-constrained platforms. We introduce SenSmart, a multitasking OS for sensor networks, and present new OS design techniques for supporting preemptive multi-task scheduling, memory isolation, and versatile stack management. We have implemented SenSmart on MICA2/MICAz motes. Evaluation shows that SenSmart performs efficient binary translation and demonstrates a significantly better capability in managing concurrent tasks than other sensornet operating systems.
Event-driven programming has been a relatively hot topic in distributed systems development. Having worked on such systems for years, we now believe that it is not the best choice. Besides the well-known "stack ripping" problem, we argue that it greatly affects the composability of software modules. Preemptive threads also lack composability because of data races and locks. A lack of composability can result in systems with little vitality. Cooperative threading (coroutines), by contrast, is almost free of this problem, so we advocate it as the primary concurrency model for most distributed systems.
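As a rough illustration of the cooperative model this abstract advocates (not code from the paper), the sketch below uses Python generators as coroutines. The names `worker` and `run_all` are hypothetical. Each task runs until it explicitly yields, so the shared log needs no lock.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: runs until it reaches a yield, then hands control back."""
    for i in range(steps):
        # No lock is needed: no other task runs between two yields.
        log.append(f"{name}:{i}")
        yield  # explicit scheduling point


def run_all(tasks):
    """Hypothetical round-robin scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
            ready.append(task)  # still alive, so queue it again
        except StopIteration:
            pass                # task finished; drop it


log = []
run_all([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

Because control changes hands only at `yield`, the interleaving is deterministic, which is one way to read the composability argument made above.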
ISBN: (print) 9781424472796
Insects build architecturally complex nests and search for remote food through collaborative work, despite their limited sensors, minimal individual intelligence, and the lack of a central control system. Insects' collaboration emerges as the response of individual insects to stigmergy. This paper proposes a sign-based model of stigmergy for discussing collaboration, taking "sign" as the key notion for understanding it: the sign is the link between all the components of a stigmergic complex adaptive system. Based on this understanding, we propose a definition that reveals the nature of signs and exploit the significations and relationships carried by the notion of sign. This leads to a sign-based model of stigmergy that captures its essentials. A basic architecture of stigmergy and its constituents are presented and discussed. Finally, some applications of the model are discussed.
Successive interference cancellation (SIC) is an effective multipacket reception technique for combating interference. As not all collisions are resolvable, careful transmission coordination is required. We study link scheduling in wireless networks with SIC at the physical layer. A new model, the simultaneity graph (SG), is proposed to characterize the link correlation introduced by SIC. Two new scheduling schemes are then presented: 1) a slot-oriented scheme, which assigns a maximal feasible link set to each time slot, and 2) a link-oriented scheme, which assigns each link a sufficient number of slots. Performance is evaluated by simulation, and the results demonstrate that the throughput gain over IEEE 802.11 is 50% on average and up to 110%. The complexity of SG is only slightly higher than that of widely used existing models (e.g., the conflict graph).
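To make "not all collisions are resolvable" concrete, here is a minimal sketch of a textbook SIC decoding check at a single receiver. It is not the paper's simultaneity-graph model. The function name `sic_decode`, the common threshold `beta`, and the assumption of perfect cancellation are all illustrative choices.

```python
def sic_decode(powers, noise, beta):
    """Return indices of the signals one receiver can decode with ideal SIC.

    powers: received power of each concurrent signal (linear scale).
    noise:  noise power at the receiver (assumed parameter).
    beta:   SINR threshold for decoding, same for every signal (assumed).
    """
    # SIC decodes the strongest remaining signal first.
    order = sorted(range(len(powers)), key=lambda i: powers[i], reverse=True)
    remaining = sum(powers)
    decoded = []
    for i in order:
        interference = remaining - powers[i]
        if powers[i] / (interference + noise) < beta:
            # The strongest remaining signal fails; weaker ones see even
            # more interference, so decoding stops here.
            break
        decoded.append(i)
        remaining -= powers[i]  # assume the decoded signal is cancelled perfectly
    return decoded


print(sic_decode([16.0, 2.0], noise=0.5, beta=2.0))  # both signals resolvable
print(sic_decode([4.0, 4.0], noise=1.0, beta=2.0))   # equal powers: collision unresolved
```

Under these assumptions, a collision is resolvable only when the received powers are sufficiently unequal, which is why the choice of simultaneous transmitters matters.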
Successive interference cancellation (SIC) is an effective multipacket reception technique for combating interference. We study link scheduling under the SINR (signal-to-interference-plus-noise ratio) model in ad hoc networks with SIC at the physical layer. Two facts pose the key technical challenges: interference accumulates, and the links decoded sequentially by SIC are correlated. We propose the conflict set graph (CSG) to characterize the interference and define the interference degree to measure the interference of a link. As scheduling over the CSG is NP-hard, an independent-set-based greedy scheme is explored to efficiently construct a maximal feasible schedule. Performance is evaluated by simulation. Compared with the simple greedy method, the throughput gain is 30% on average and up to 60%.
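The abstract does not give the greedy scheme itself, so the following is only a generic stand-in. It greedily builds one maximal independent set per slot over a plain pairwise conflict graph, not the paper's CSG. It orders links by their number of conflicts as a crude proxy for interference degree. The name `greedy_schedule` and the input format are assumptions.

```python
def greedy_schedule(links, conflicts):
    """Assign every link to exactly one slot; links sharing a slot never conflict.

    links:     iterable of link ids (hashable and sortable).
    conflicts: dict mapping a link to the set of links it conflicts with
               (assumed symmetric; a missing key means no conflicts).
    """
    links = list(links)
    degree = {l: len(conflicts.get(l, set())) for l in links}
    # Heuristic order: links with fewer conflicts are tried first.
    unscheduled = sorted(links, key=lambda l: (degree[l], l))
    slots = []
    while unscheduled:
        slot = []
        for link in unscheduled:
            # Add the link only if it conflicts with nothing already in the slot.
            if all(other not in conflicts.get(link, set()) for other in slot):
                slot.append(link)
        slots.append(slot)
        unscheduled = [l for l in unscheduled if l not in slot]
    return slots


# Hypothetical example: a, b, c conflict pairwise; d conflicts only with a.
conflicts = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(greedy_schedule(["a", "b", "c", "d"], conflicts))
```

Each slot is maximal among the links still unscheduled, but the number of slots is not guaranteed to be minimal. That is the usual trade-off when the exact problem is NP-hard.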
ISBN: (print) 9781424456789; 9780769539584
With the fast development of transistor technology, graphics processing units (GPUs) are increasingly used in non-graphics applications, and major GPU hardware vendors have introduced software stacks for their own GPUs, such as Brook+ for AMD GPUs. Compared with traditional parallel systems, heterogeneous systems integrating stream-based multi-threaded GPUs provide higher parallel computing capability at lower cost. However, porting traditional applications to heterogeneous systems places new demands on application optimization for the GPU. Based on AMD's Brook+ platform, we explored application optimization on AMD GPUs by implementing and optimizing the LBM benchmark from SPEC2006. To improve program locality, we optimized the original data layout of LBM. Using the short-vector data types provided by Brook+, we also optimized the GPU's bandwidth utilization and the efficiency of its thread processors. Through branch elimination, we reduced the performance loss caused by branch divergence in the kernel, which stems from the GPU's SIMD execution mode. The experimental results show that data layout, memory bandwidth, branch paths, and other factors strongly affect program performance on the GPU. With all optimizations applied, we obtained a speedup of 22x (single precision) and 19x (double precision) over the original serial benchmark code on a quad-core CPU, and a speedup of 4x (single precision) and 8.7x (double precision) over the original OpenMP benchmark code on an 8-core CPU.
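Branch elimination can be pictured with a toy per-cell update, sketched here in plain Python rather than Brook+. The update rule, the `is_obstacle` flag, and both function names are hypothetical and are not taken from LBM. The point is only the transformation from a data-dependent `if` to an arithmetic select, which lets every SIMD lane execute the same instruction stream.

```python
def update_branchy(values, is_obstacle):
    """Per-cell update with a data-dependent branch (diverges on SIMD hardware)."""
    out = []
    for v, obs in zip(values, is_obstacle):
        if obs:
            out.append(-v)       # hypothetical obstacle rule: reflect
        else:
            out.append(v * 0.5)  # hypothetical fluid rule: relax
    return out


def update_branchless(values, is_obstacle):
    """Same update, with the branch replaced by an arithmetic select."""
    out = []
    for v, obs in zip(values, is_obstacle):
        m = int(obs)  # 1 for obstacle cells, 0 for fluid cells
        # Both candidates are computed; the mask picks one without branching.
        out.append(m * (-v) + (1 - m) * (v * 0.5))
    return out


cells = [1.0, 2.0, -4.0, 8.0]
flags = [False, True, False, True]
print(update_branchy(cells, flags) == update_branchless(cells, flags))
```

In Python the branchless form brings no speedup. On a SIMD GPU the same rewrite trades a little redundant arithmetic for uniform control flow.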
This paper quantitatively studies the effects of trace generation on the performance and accuracy of the BigSim Emulator, a scalable parallel emulator for large-scale computers. To assess the effect on accuracy, we modify the emulator code to collect the predicted computation time. Four MPI programs with different computation-to-communication ratios are used as benchmarks. The emulation time and the predicted computation time, with trace generation both enabled and disabled, are collected on two parallel host machines. The results show that although the BigSim Emulator traces only communication events and dependencies, trace generation still evidently degrades emulation performance for programs with high communication-to-computation ratios. Trace generation also significantly affects the accuracy of the predicted computation time for communication-intensive programs, an issue that cannot be overlooked.
Recently, GPGPU has been widely adopted in the high-performance computing (HPC) field. Limited global memory bandwidth poses a great challenge to many GPGPU programmers trying to exploit parallelism on CPU-GPU heterogeneous platforms. In this paper, we choose SWIM, a typical memory-intensive application from the SPEC OMP 2001 benchmark suite, as a case study. We attempt to optimize the performance and energy consumption of the application using different memory access mechanisms, and we present optimization methods including matrix transposition and kernel fusion. The experimental results on a platform with an Intel Core™ i920 CPU and a GeForce GTX 295 show that the proposed optimizations achieve a speedup of 8.7X over the original OpenMP program and reduce energy consumption by 83% for a problem size of 2048*2048.
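Kernel fusion, one of the optimizations named in this abstract, can be sketched in a few lines of plain Python. The arithmetic below is a placeholder and is not taken from SWIM. `two_kernels` and `fused_kernel` are hypothetical names. The fused version gives the same result without writing and re-reading an intermediate array, which is where the bandwidth saving would come from on a GPU.

```python
def two_kernels(a, b, c):
    """Unfused: the first pass stores an intermediate array, the second reads it back."""
    tmp = [x + y for x, y in zip(a, b)]     # "kernel 1": one full write of tmp
    return [t * z for t, z in zip(tmp, c)]  # "kernel 2": one full read of tmp


def fused_kernel(a, b, c):
    """Fused: each element is loaded once and the intermediate never touches memory."""
    return [(x + y) * z for x, y, z in zip(a, b, c)]


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [2.0, 2.0, 2.0]
print(two_kernels(a, b, c) == fused_kernel(a, b, c))
```

For a memory-bound code, removing the round trip through `tmp` matters more than the arithmetic, and that is the general motivation for fusing kernels.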