The largest difference between a distributed and a non-distributed system is that the former relies on network messages. Network messages bring scalability to a distributed system, but also complexity. Testing large-scale distributed systems is a great challenge, because some errors appear only after a distributed sequence of events that involves machine and network failures. Meld is a checker that allows developers to specify the expected message logic of a deployed distributed system and that verifies this logic while the system is running. When Meld finds a problem, it starts collecting more information about the events that led to it, allowing developers to quickly find the root cause. Developers write message logic in Meld, and Meld verifies it by analyzing the collected abstracts of messages. By using binary instrumentation, Meld works almost transparently with the system being debugged and can change the logic being checked at runtime. An evaluation with a deployed system shows that Meld can detect non-trivial correctness problems at runtime.
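To make the idea of checking message logic concrete, the following is a minimal sketch of one such check: every request must eventually receive a reply from its destination. It is not Meld's specification language or implementation, which the abstract does not describe. The `MessageAbstract` record, its fields, and the `unanswered_requests` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageAbstract:
    """Hypothetical summary of one network message (not Meld's real format)."""
    kind: str    # "request" or "reply"
    msg_id: int  # identifier shared by a request and its reply
    src: str     # sending node
    dst: str     # receiving node


def unanswered_requests(trace):
    """Return the sorted ids of requests that never got a matching reply."""
    pending = {}
    for m in trace:
        if m.kind == "request":
            pending[m.msg_id] = m
        elif m.kind == "reply":
            req = pending.get(m.msg_id)
            # A reply matches only if it travels back along the request's path.
            if req is not None and (m.src, m.dst) == (req.dst, req.src):
                del pending[m.msg_id]
    return sorted(pending)


# Illustrative trace: request 2 is never answered.
trace = [
    MessageAbstract("request", 1, "A", "B"),
    MessageAbstract("request", 2, "A", "C"),
    MessageAbstract("reply", 1, "B", "A"),
]
violations = unanswered_requests(trace)
print(violations)
```

A checker in this style only needs the message summaries, not the payloads, which is consistent with the abstract's description of verifying logic over collected message abstracts.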
Virtualization technology improves resource utilization but also raises the probability of resource contention. This raises the problem of performance isolation among VMs. To address it, benchmarks are needed to evaluate performance isolation. However, few such benchmarks are available. In particular, there is no micro-benchmark that measures how the performance of individual primitive operations varies with the type of misbehavior. In this paper, we present VITS, a micro-benchmark for evaluating the performance isolation of a virtualization system. The VITS Test Suite can test both software and hardware performance isolation. It consists mainly of six interference programs, which test the performance isolation of cache, memory bandwidth, memory space, CPU, network, and disk, respectively. The VITS Test Suite provides results that let users analyze performance isolation problems of the virtualization system along with the underlying hardware. We also test Xen using the VITS Test Suite. The experimental results show that cache, memory, and disk performance isolation still have significant weaknesses; in other words, the current Xen virtualization system is still unfair in fine-grained resource sharing.
With the popularity of virtualization, managing hundreds or even thousands of virtual machines running on multiple physical computing nodes has become an important problem. Current virtual machine management systems can only obtain basic information about virtual machines and execute simple operations on them, such as start, reboot, and shutdown. In this paper, we design a virtual machine management approach based on an agent service. The agent service provides detailed running status information from inside virtual machines. It also serves as a bridge through which host machines and virtual machines interact. The agent service is designed to start automatically when a virtual machine boots, and it gives us real-time information about virtual machines. We evaluate the monitoring overhead and the performance of batch operations when using the agent. The experimental results show that the agent mechanism outperforms methods using Libvirt or VMware Tools.
Grids are susceptible to a number of software and hardware failures, so understanding and modeling grid resource failures is both challenging and highly significant for grid research. However, for various reasons such as commercial secrecy and security, it is difficult to obtain real historical logs of grids. Therefore, an accurate model of resource failures is critically useful. In this paper, by analyzing grid log data, we detail the suitability of three candidate statistical distributions for each data set: Weibull, Zipf's law, and Pareto. We then develop a grid resource failure simulator. Finally, using the different failure patterns generated by the simulator, we evaluate several common scheduling algorithms used in grid systems.
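As a rough illustration of how a failure simulator can turn a fitted distribution into a failure pattern, the sketch below draws gaps between failures from a Weibull or a Pareto distribution and accumulates them into failure timestamps. It is not the simulator described in the abstract. The function names and every parameter value (`scale`, `shape`, `minimum`, the horizon, the seed) are assumptions chosen for illustration, not fitted values from the paper, and Zipf's law is omitted for brevity.

```python
import random


def failure_times(sample_gap, horizon, rng):
    """Return failure timestamps in [0, horizon) using a gap sampler."""
    times = []
    t = sample_gap(rng)
    while t < horizon:
        times.append(t)
        t += sample_gap(rng)
    return times


def weibull_gap(rng, scale=100.0, shape=0.7):
    # Hypothetical parameters; random.weibullvariate takes (scale, shape).
    return rng.weibullvariate(scale, shape)


def pareto_gap(rng, shape=1.5, minimum=10.0):
    # random.paretovariate returns values >= 1, so scale by the minimum gap.
    return minimum * rng.paretovariate(shape)


rng = random.Random(42)  # fixed seed so a run is repeatable
weibull_trace = failure_times(weibull_gap, 10_000.0, rng)
pareto_trace = failure_times(pareto_gap, 10_000.0, rng)
print(len(weibull_trace), len(pareto_trace))
```

Traces generated this way could then be replayed against a scheduler to compare how different algorithms cope with each failure pattern.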
Transactional memory (TM) is a parallel programming concept. Existing consistency protocols in distributed transactional memory systems consume too much bandwidth and incur high latency. In this paper, we propose the Transaction Memory Consistency Protocol (TMCP) and point out its new features compared with current protocols. After formulating our model and analyzing its performance, we find that both too long and too short an execution time cause more conflicts, given that the execution time of the transaction population follows a Gamma distribution. This indicates that adjusting the execution time to a reasonable value is important for improving performance.
Monitoring virtual machines (VMs) is an essential function of virtualized platforms. Existing solutions are either coarse-grained, monitoring only at the granularity of a whole VM, or not general, supporting only specific monitoring functions for a particular guest operating system (OS). Thus they do not satisfy the monitoring requirements of large-scale server clusters such as data centers and public cloud platforms, where each physical platform runs hundreds of VMs with different guest OSes. In this paper, we propose VMDriver, a general and fine-grained approach to virtualization monitoring. The novelty of VMDriver's design is the separation of the event interception point at the VMM level from the rich guest OS semantic reconstruction in the management domain. With this design, different monitoring drivers in the management domain can mask the differences among guest OSes. We implement VMDriver on Xen, and our experimental study shows that it introduces very small performance overhead. We demonstrate its generality by inspecting four aspects of information about target virtual machines with different guest OSes. The unified interface of VMDriver makes it convenient to develop complex monitoring tools for distributed virtualization environments.
Warcraft III is one of the most popular Multiplayer Online Games (MOGs), in which users are designed to interact with the support of a dedicated server. PKTown is a P2P-based third-party middleware developed to replace the dedicated servers. This paper presents the scalability and robustness of the PKTown 2.0 architecture. The evaluation demonstrates the efficiency of the architecture.
Peer-to-peer (P2P) on-demand streaming systems inevitably suffer from peer churn, an inherent dynamic characteristic of overlay networks. With frequent peer departures, a large amount of media data cached on peer disks goes offline and becomes unavailable, which is the major cause of heavy server load. To address this issue, a new proactive data replication mechanism is proposed and implemented in the existing P2P on-demand system GridCast. With the new mechanism, a peer can proactively replicate data chunks to stable cache servers for future sharing when it has a high probability of leaving the overlay. Two key heuristic algorithms are designed for departure prediction and for selecting the chunks to replicate. A simulation driven by system traces shows that the mechanism greatly decreases the bandwidth load on the media source server and improves the availability of chunks that are highly demanded but poorly provisioned by overlay peers.
When multiple instances of an application run on multiple virtual machines, an interesting problem is how to use the fault handling result from one application instance to heal the same fault when it occurs on sibling instances, and hence to ensure high service availability in a cloud computing environment. This paper presents SHelp, a lightweight runtime system that can survive software failures in the framework of virtual machines. It applies weighted rescue points and error virtualization techniques to make applications bypass the faulty path effectively. A two-level storage hierarchy is adopted in the rescue point database so that applications running on different virtual machines can share error handling information, which reduces redundancy and allows faster and more effective recovery from future faults caused by the same bugs. A Linux prototype is implemented and evaluated using four web server applications that contain various types of bugs. Our experimental results show that SHelp enables server applications to recover from these bugs in just a few seconds with modest performance overhead.
Peer-to-Peer SIP (P2PSIP) is proposed to leverage peer-to-peer computing to control multimedia sessions in a decentralized manner. Many companies would benefit if a conventional SIP environment did not require the configuration and maintenance of central servers. However, existing P2PSIP systems put too much emphasis on the decentralization of SIP elements and the simplicity of distribution. To some extent a P2PSIP system is manageable, but control over the entire system is still relatively weak. In this paper, we propose a hierarchical P2PSIP system to address the manageability problem. An actual deployment and exhaustive simulations are performed to evaluate the performance of our P2PSIP schemes. Results indicate that the manageable approach not only solves the controllability problem but also achieves good reliability, scalability, and interoperability.