ISBN: 9781424448326 (print)
Single-electron transistors (SETs) are considered attractive candidates for post-CMOS VLSI because of their ultra-small size and low power consumption. Because single-island SETs cannot operate normally at room temperature, a growing number of researchers are studying SETs with one-dimensional (1-D) multi-islands. Using the Monte Carlo (MC) method and the stability diagram method, this paper investigates the critical problems in simulating and analyzing multi-island SETs, e.g., the capacitance matrix, the island potentials, and the free energy. A double-island SET is simulated and analyzed in detail as an example of a 1-D multi-island SET. For the first time, a 3-D stability diagram is obtained, and some new phenomena are analyzed. The results show that although 1-D multi-island SETs are more useful and powerful than single-island SETs, the coupling energy between the islands inevitably introduces many additional effects.
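The quantities named in the abstract above (capacitance matrix, island potentials, free energy) can be illustrated with a minimal Python sketch for a double-island SET. This is not the paper's simulator: the circuit topology (source junction, inter-island coupling junction, drain junction, one gate per island), all function and parameter names, and the simplification that every electrode is grounded are assumptions made here for illustration.

```python
# Illustrative sketch only; names and circuit topology are assumptions.
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def capacitance_matrix(c_s, c_m, c_d, c_g1, c_g2):
    """2x2 capacitance matrix of two islands coupled in series.

    c_s: source-island1 junction, c_m: island1-island2 junction,
    c_d: island2-drain junction, c_g1/c_g2: gate capacitances (farads).
    Diagonal entries are the total capacitance of each island;
    off-diagonal entries are minus the coupling capacitance.
    """
    c11 = c_s + c_m + c_g1
    c22 = c_d + c_m + c_g2
    return [[c11, -c_m], [-c_m, c22]]


def island_potentials(cmat, charges):
    """Solve C * V = Q for the island potentials (electrodes grounded)."""
    (a, b), (c, d) = cmat
    det = a * d - b * c
    q1, q2 = charges
    v1 = (d * q1 - b * q2) / det
    v2 = (-c * q1 + a * q2) / det
    return v1, v2


def electrostatic_energy(cmat, charges):
    """Electrostatic energy U = 1/2 * Q^T * C^-1 * Q (electrodes grounded)."""
    v1, v2 = island_potentials(cmat, charges)
    return 0.5 * (charges[0] * v1 + charges[1] * v2)


if __name__ == "__main__":
    cm = capacitance_matrix(1e-18, 0.5e-18, 1e-18, 0.2e-18, 0.2e-18)
    u10 = electrostatic_energy(cm, (-E_CHARGE, 0.0))
    u01 = electrostatic_energy(cm, (0.0, -E_CHARGE))
    u11 = electrostatic_energy(cm, (-E_CHARGE, -E_CHARGE))
    # The excess of u11 over u10 + u01 is the inter-island coupling energy.
    print(u10, u01, u11 - u10 - u01)
```

In this simplified model, the energy of one electron on each island exceeds the sum of the two single-electron energies whenever the coupling capacitance is nonzero. That excess is the coupling energy the abstract refers to.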
Combining virtual machine technology with network computing technology makes it possible to effectively aggregate the widely distributed, heterogeneous, and autonomous resources on the Internet. This paper proposes a virtual machine server aggregation algorithm, called DVSA, based on hierarchical clustering for virtual computing environments. The algorithm clusters virtual machine servers into groups according to network latency. When servers in the same group are scheduled together, the latencies between the virtual machines hosted on those servers are small, which can improve the performance and stability of the distributed execution environment.
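The idea of grouping servers by latency can be sketched with generic agglomerative (bottom-up) hierarchical clustering. The abstract does not give DVSA's linkage rule or stopping criterion, so the complete-linkage rule, the `threshold` parameter, and the function name below are assumptions for illustration, not the published algorithm.

```python
# Generic agglomerative clustering sketch; not the DVSA algorithm itself.
def cluster_servers(latency, threshold):
    """Group servers so that every pair inside a group is within `threshold`.

    latency: symmetric matrix (list of lists) of pairwise latencies in ms.
    threshold: assumed stopping criterion; merging stops once the closest
    pair of clusters is farther apart than this value.
    """
    clusters = [[i] for i in range(len(latency))]

    def linkage(a, b):
        # Complete linkage: distance between clusters is the worst-case
        # latency between any member of a and any member of b.
        return max(latency[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        # Merge the closest pair of clusters and continue.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    latency_ms = [
        [0, 5, 100, 100],
        [5, 0, 100, 100],
        [100, 100, 0, 8],
        [100, 100, 8, 0],
    ]
    print(cluster_servers(latency_ms, threshold=20))
```

Complete linkage is used here because it bounds the latency between any two servers in a group, which matches the stated goal of keeping latencies between co-scheduled virtual machines small.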
Predicting network latencies between Internet hosts can efficiently support large-scale Internet applications, e.g., file sharing services and overlay construction. Several studies use hyperbolic space to model the Internet's dense-core and many-tendril structure. However, existing hyperbolic-space embedding approaches are not designed for accurate latency estimation in a distributed context. We present HyperSpring, which estimates latency by modelling a mass-spring system in hyperbolic space, similar to Vivaldi. HyperSpring adopts coordinate initialization to speed up the convergence of coordinate computation, uses multiple-round symmetric updates to escape from bad local minima, and stabilizes coordinates by compensating RTT measurements to reduce coordinate drift. Evaluation results based on a network trace of 226 PlanetLab nodes indicate that, compared to Euclidean-space Vivaldi, HyperSpring improves performance for most nodes and incurs slightly higher distortion for a small number of nodes.
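As a rough illustration of latency estimation from hyperbolic coordinates, the sketch below computes the distance between two points in the Poincaré ball model and scales it to milliseconds. The abstract does not say which model of hyperbolic space HyperSpring uses, so the choice of the Poincaré ball, the `scale_ms` parameter, and all function names are assumptions. The spring-based coordinate update itself is not reproduced.

```python
import math


def hyperbolic_distance(u, v):
    """Distance between two points of the open unit ball (Poincare model)."""
    norm_u = sum(a * a for a in u)
    norm_v = sum(b * b for b in v)
    if norm_u >= 1.0 or norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - norm_u) * (1.0 - norm_v)))


def predicted_latency(u, v, scale_ms):
    """Map hyperbolic distance to a latency; scale_ms is a hypothetical factor."""
    return scale_ms * hyperbolic_distance(u, v)


def relative_error(predicted, measured):
    """Relative prediction error, a common accuracy metric for coordinates."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    a, b = (0.0, 0.0), (0.5, 0.0)
    est = predicted_latency(a, b, scale_ms=50.0)
    print(est, relative_error(est, measured=60.0))
```

Distances in this model grow rapidly near the boundary of the ball, so nodes placed near the edge are far from each other unless their path passes near the center. This is the geometric property that suits a dense core with many tendrils.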
The modeling and simulation of evacuation has recently become a topic of great interest. We present an agent-based model of crowd evacuation for emergency response in an area affected by an explosion. Various types of agents are designed and the interactions among them are modeled, in contrast to traditional models, which treat the population as identical individuals and omit the interactions between them. Different cases are simulated iteratively to test the effectiveness of our model. Finally, extensive simulation results suggest several effective ways to minimize the harmful consequences of such life-threatening events.
Virtual networking is an important approach to supporting multiple legacy applications that run unmodified in distributed virtual computing environments. This paper proposes a virtual networking approach called VirNet. VirNet can build multiple customized and isolated virtual networks simultaneously on the same physical hosts, and it enables existing distributed applications written for LANs to run seamlessly on the Internet. VirNet is implemented entirely in user space and requires no changes to the operating system kernel. An experimental evaluation of a reference implementation shows that VirNet is efficient: the latency overhead caused by VirNet is very small, and the bandwidth of VirNet is more than 85% of the available physical network bandwidth in both LAN and emulated WAN environments.
Graphics processing units (GPUs), with many lightweight data-parallel cores, can provide substantial parallel computing power to accelerate general-purpose applications. However, this computing capacity cannot be fully utilized by memory-intensive applications, which are limited by off-chip memory bandwidth and latency. Stencil computation has abundant parallelism and low computational intensity, which makes it a useful architectural evaluation benchmark. In this paper, we propose several memory optimizations for mgrid, a stencil-based application from the SPEC2K benchmarks. By exploiting data locality in the three-level memory hierarchy and tuning the thread granularity, we reduce the pressure on off-chip memory bandwidth. To hide the long off-chip memory access latency, we further prefetch data during computation through double buffering. To fully exploit the CPU-GPU heterogeneous system, we redistribute the computation between these two computing resources. With all these optimizations, we achieve a 24.2× speedup over the simple mapping version and up to a 34.3× speedup over a CPU implementation.
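To make "low computational intensity" concrete, here is a minimal sketch of one stencil sweep, written in plain Python for readability. It uses a generic 7-point averaging stencil on a cubic grid, which is an assumption chosen for brevity and not the stencil used by mgrid. The function name is likewise hypothetical.

```python
# Generic 7-point stencil sketch; not the mgrid kernel.
def stencil_sweep(grid):
    """Return a new n x n x n grid where each interior point is replaced by
    the average of its six face neighbors. Boundary points are copied.

    Each interior update reads six values and writes one while performing
    only five additions and one division, so memory traffic dominates.
    """
    n = len(grid)
    # Write into a separate output buffer so that every update reads
    # only values from the previous sweep.
    out = [[[grid[i][j][k] for k in range(n)] for j in range(n)] for i in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                out[i][j][k] = (
                    grid[i - 1][j][k] + grid[i + 1][j][k]
                    + grid[i][j - 1][k] + grid[i][j + 1][k]
                    + grid[i][j][k - 1] + grid[i][j][k + 1]
                ) / 6.0
    return out


if __name__ == "__main__":
    n = 3
    grid = [[[6.0] * n for _ in range(n)] for _ in range(n)]
    grid[1][1][1] = 0.0
    print(stencil_sweep(grid)[1][1][1])
```

Neighboring points share most of their inputs, which is why keeping tiles of the grid in fast on-chip memory, as the abstract describes, reduces off-chip traffic.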
Graphics processing units (GPUs), with many lightweight data-parallel cores, can provide substantial parallel computing power to accelerate general-purpose applications. Both AMD and NVIDIA provide their own high-performance GPUs and software platforms. As floating-point computing capacity continues to increase, the "memory wall" problem becomes more serious, especially for array-intensive applications. In this paper, we implement and optimize two SPEC2K benchmarks, mgrid and swim, on multithreaded GPUs using CUDA and Brook+. To reduce the pressure on off-chip memory, we exploit data locality in multi-level memory hierarchies and hide long memory access latency via double buffering. To balance inter-thread parallelism and intra-thread locality, we further tune the thread granularity of each kernel and empirically study the best equilibrium point. Flow-control instructions can significantly reduce the effective instruction throughput. To address this problem, we introduce a divergence elimination technique that converts conditional expressions into arithmetic operations. With all these optimizations, we achieve speedups of 10× to 34× over the CPU implementation on AMD and NVIDIA GPUs. Finally, we summarize and compare the AMD and NVIDIA GPUs in terms of hardware and software.
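The general idea of converting a conditional expression into arithmetic can be sketched as follows. The abstract does not give the paper's exact transformation, so this shows the common mask-and-blend form, written in Python for readability. The function names are hypothetical.

```python
# Mask-and-blend sketch of branch elimination; not the paper's exact method.
def select_branch(x, a, b):
    """Reference version: control flow decides which value is returned."""
    if x > 0:
        return a
    return b


def select_arithmetic(x, a, b):
    """Branch-free version: the comparison becomes a 0/1 mask that blends
    the two candidates. Both a and b are always evaluated, so this is only
    equivalent when both are finite and free of side effects.
    """
    mask = float(x > 0)
    return mask * a + (1.0 - mask) * b


if __name__ == "__main__":
    for x in (-2.0, 0.0, 3.0):
        print(x, select_branch(x, 2 * x, -x), select_arithmetic(x, 2 * x, -x))
```

On a GPU, threads that execute together and take different branches are serialized, so replacing the branch with arithmetic lets all of them follow one instruction stream at the cost of computing both candidates.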
The memory wall problem makes research on memory hierarchies increasingly important, and software simulation plays a significant role in this research. To model the real runtime environment, full-system simulators have been widely used. However, researchers often face a precision problem when they run a parallel program on a full-system multi-core simulator, especially across multiple configurations, because the influence of the operating system and the relative execution speed of parallel threads often make the performance of target programs nondeterministic. To solve this precision problem, this paper proposes a single-execution multi-configuration simulation framework (SEMCS). We then design and implement a Simics module, called trans-multicast, using the SEMCS framework. Finally, we test a benchmark from SPEComp2001 on a four-core processor in three memory hierarchy configurations. The results verify the effectiveness of the SEMCS framework.
Low efficiency and uncertainty about routing correctness are problems for resource location in unstructured P2P networks. In such networks it is hard to achieve a high query hit rate with small cost and low latency. In this paper, we present a P2P query routing algorithm based on semantic clusters (SCQR). SCQR clusters nodes according to their semantics, and each cluster elects a super-node as its cluster computing node, which is responsible for computing the cluster semantics and establishing links with all neighboring cluster computing nodes. Queries are routed among cluster computing nodes. Both theoretical analysis and experiments show that SCQR achieves a high query hit rate with small routing latency and query cost.
In distributed virtual environment (DVE) systems, a distributed server infrastructure is often used to reduce the latency between servers and clients. Under this infrastructure, mapping clients to the proper servers is one of the key issues for improving interactivity and overall performance. Most traditional methods of mapping clients to servers consider only the load balancing problem. However, two other important aspects should be taken into account: physical world integrity and virtual world integrity. In this work, we propose a novel mapping algorithm that addresses all three aspects at the same time. The algorithm decomposes the mapping problem into a cutting stage and a matching stage to obtain an optimal result with polynomial complexity. The experimental results show that our algorithm significantly improves the overall performance of DVE systems.