MapReduce is emerging as an important programming model for data-intensive applications. Adapting this model to desktop grids would make it possible to exploit their vast computing power and distributed storage to execute a new range of applications able to process enormous amounts of data. In 2010, we presented the first implementation of MapReduce dedicated to Internet Desktop Grids, based on the BitDew middleware. In this paper, we present new optimizations to BitDew-MapReduce (BitDew-MR): aggressive task backup, intermediate result backup, task re-execution mitigation and network failure hiding. We propose a new experimental framework that emulates key fundamental aspects of Internet Desktop Grids. Using this framework, we compare BitDew-MR with the open-source Hadoop middleware on Grid'5000. Our experimental results show that 1) BitDew-MR successfully passes all the stress tests of the framework, while Hadoop is unable to work in a typical wide-area network topology that includes PCs hidden behind firewalls and NAT; and 2) BitDew-MR outperforms Hadoop in several respects: scalability, fairness, resilience to node failures, and resilience to network disconnections.
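For readers unfamiliar with the programming model, the following is a minimal, single-process sketch of MapReduce (word count). It only illustrates the map, shuffle and reduce phases; it does not model BitDew, desktop-grid volatility, or any of the BitDew-MR optimizations, and all function names (`map_words`, `shuffle`, `reduce_counts`, `run_job`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


# Map phase: emit a (word, 1) pair for every word in one input split.
def map_words(split: str) -> Iterator[tuple[str, int]]:
    for word in split.split():
        yield word.lower(), 1


# Shuffle phase: group intermediate values by key, as the runtime would
# do between the map and reduce phases.
def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


# Reduce phase: combine all intermediate values for one key.
def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def run_job(splits: list[str]) -> dict[str, int]:
    intermediate = (pair for split in splits for pair in map_words(split))
    return dict(reduce_counts(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(run_job(["data grid data", "grid map reduce"]))
```

In a distributed runtime, each map call would run on a separate host and the shuffle would move intermediate results across the network, which is the stage that optimizations such as intermediate result backup appear to address.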
ISBN (print): 9781467312981
Audio is one of the most appealing yet least exploited modalities in wireless sensor networks, owing to its potentially very large data volumes and the limited wireless capacity. How to collect audio sensing information effectively therefore remains a challenging problem. In this paper, we propose a new paradigm of audio information collection based on the concept of audio-on-demand. We consider a sink-free environment targeting disaster management, where audio chunks are stored inside the network for retrieval. The difficulty is to guarantee a high search success rate without infrastructure support. To solve this problem, we design a novel replication algorithm that deploys an optimal number of O(√n) replicas across the sensor network. We prove the optimality of the algorithm's energy consumption, and use real testbed experiments and extensive simulations to evaluate the performance and efficiency of our design. The experimental results show that our design provides satisfactory audio-on-demand service quality with short startup latency and slight playback jitter. Extensive simulation results show that the design achieves a search success rate of 98% while reducing search energy consumption by an order of magnitude compared with existing schemes.
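The O(√n) replica count can be motivated by a classic square-root argument, sketched below under simplifying assumptions that are ours, not the paper's: r replicas sit on distinct, uniformly random nodes among n, and a search probes k distinct, uniformly random nodes. Each probe misses with probability at most 1 - r/n, so the overall miss probability is at most exp(-rk/n); choosing r = k = c√n bounds it by exp(-c²), so both storage and search cost grow only as √n. This does not reproduce the paper's energy-optimality proof, and `hit_probability` is a hypothetical helper.

```python
import math


def hit_probability(n: int, replicas: int, probes: int) -> float:
    """P(a search probing `probes` distinct random nodes finds a replica).

    Assumes `replicas` copies on distinct nodes chosen uniformly at random
    among `n` nodes (sampling without replacement on both sides).
    """
    if not (0 <= replicas <= n and 0 <= probes <= n):
        raise ValueError("replicas and probes must lie in [0, n]")
    miss = 1.0
    for i in range(probes):
        # Probability that the next probe also misses, given i misses so far.
        miss *= max(n - replicas - i, 0) / (n - i)
    return 1.0 - miss


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        r = math.ceil(2 * math.sqrt(n))  # c = 2: miss <= exp(-r*r/n) <= exp(-4)
        print(n, r, round(hit_probability(n, r, r), 4))
```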
ISBN (print): 9781467324458
Efficiently operating on the relevant data of users in large-scale online social network (OSN) systems is a challenging problem. Storage systems used by popular OSNs often rely on key-value stores, where randomly partitioning user data among servers across data centers is the de facto standard. Although this random partitioning scheme, built on DHTs, scales well to a large number of users, it leads to costly inter-server communication across data centers because of the complex interconnections and interactions among OSN users. In this paper, we explore how to reduce inter-server communication while retaining the simple and robust nature of OSNs. We propose a data placement solution atop OSN systems that divides users among servers according to an interaction-locality-based structure. Our approach exploits a simple yet powerful principle of OSN interactions, self-similarity, which reveals that the inter-server communication cost is minimized under such an intrinsic structure. Our algorithm avoids a significant amount of inter-server traffic while achieving load balance among servers across the data centers. We demonstrate the existence of self-similarity in large-scale Facebook traces covering 10 million Facebook users and 24 million interaction events. We conduct comprehensive trace-driven simulations to evaluate this design, which exploits the unique feature of self-similarity. Results show that our scheme significantly reduces traffic and latency compared with existing schemes.
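The placement objective can be made concrete with a toy example. The sketch counts the interaction events that cross servers under a given user-to-server placement and reports per-server load; it does not implement the paper's self-similarity-based placement algorithm. The trace and both placements are hypothetical: `hashed` is a hand-picked stand-in for random (DHT-style) partitioning, and `local` groups each friend cluster on one server.

```python
from collections import Counter


def inter_server_cost(interactions, placement):
    """Number of interaction events whose two users live on different servers."""
    return sum(1 for u, v in interactions if placement[u] != placement[v])


def server_loads(placement):
    """Users hosted per server, used to check load balance."""
    return Counter(placement.values())


if __name__ == "__main__":
    # Hypothetical trace: two tightly knit friend groups, one cross-group event.
    events = [("a", "b"), ("b", "c"), ("a", "c"),
              ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    hashed = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
    local = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    for name, placement in (("hashed", hashed), ("local", local)):
        print(name, inter_server_cost(events, placement), dict(server_loads(placement)))
```

Both placements put three users on each server, but grouping interacting users reduces cross-server events from 5 to 1 in this toy trace.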
When operating in volatile environments, service-based systems (SBSs) built through the dynamic composition of component services must be monitored to guarantee their response times. In particular, the critical path of a composite SBS, i.e., the execution path in the service composition with the maximum execution time, should be prioritised in cost-effective monitoring because it determines the response time of the SBS. In volatile operating environments, the critical path of an SBS is probabilistic. It is therefore important to estimate the criticalities of the execution paths and the component services, i.e., the probabilities that they are critical, to decide which parts of the system to monitor. In this paper, we propose a novel approach to the identification of the Probabilistic Critical Path for Service-based Systems (PCP-SBS). PCP-SBS takes into account the probabilistic nature of the critical path and calculates path criticalities in the context of service composition. We evaluate PCP-SBS experimentally using SBSs synthetically composed from a real-world Web service dataset.
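One straightforward way to estimate path and service criticalities is Monte Carlo sampling, sketched below. This is a swapped-in technique for illustration, not necessarily how PCP-SBS computes criticalities: it assumes the execution paths have already been enumerated as lists of service names, and that `sample_time` (a hypothetical callback) draws one execution time for a service. A path's criticality is the fraction of samples in which it is the longest; because exactly one path is critical per sample, a service's criticality is the sum of the criticalities of the paths containing it.

```python
import random
from collections import Counter


def path_criticalities(paths, sample_time, trials=10_000, seed=0):
    """Monte Carlo estimate of P(path i is the critical path)."""
    rng = random.Random(seed)
    services = sorted({s for path in paths for s in path})  # fixed order for reproducibility
    wins = Counter()
    for _ in range(trials):
        # One sampled execution time per service, shared by every path through it.
        times = {s: sample_time(s, rng) for s in services}
        durations = [sum(times[s] for s in path) for path in paths]
        # The critical path is the longest one; ties go to the earlier path.
        wins[max(range(len(paths)), key=durations.__getitem__)] += 1
    return [wins[i] / trials for i in range(len(paths))]


def service_criticalities(paths, path_crit):
    """A service is critical whenever a path through it is critical."""
    crit = Counter()
    for path, p in zip(paths, path_crit):
        for s in set(path):
            crit[s] += p
    return dict(crit)


if __name__ == "__main__":
    paths = [["s1", "s2"], ["s1", "s3"]]  # two branches sharing service s1
    # Hypothetical uniform execution-time ranges per service.
    ranges = {"s1": (1.0, 2.0), "s2": (1.0, 3.0), "s3": (2.0, 4.0)}

    def sampler(service, rng):
        return rng.uniform(*ranges[service])

    pc = path_criticalities(paths, sampler)
    print(pc, service_criticalities(paths, pc))
```

With these hypothetical ranges, the branch through `s3` is critical whenever s3 takes longer than s2, which happens with probability 0.875, so the estimate should land close to that value.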
Desktop virtualization attracts considerable attention in both industry and academia. Since a virtualized desktop system is based on multiple virtual machines (VMs), a distributed storage system is needed to manage the VM images. In this paper, we design such a distributed storage system, VMStore, taking into account three important characteristics: high-performance VM snapshots, optimized booting from multiple images, and removal of redundant image data. We adopt a direct index structure of blocks for VM snapshots to speed up VM booting significantly; provide a distributed storage structure with good bandwidth scalability by dynamically changing the number of storage nodes; and propose a data preprocessing strategy with intelligent object partitioning techniques that eliminates duplication more effectively. The performance analysis of VMStore focuses on two metrics: the speedup of VM booting and the overhead of de-duplication. Experimental results show the efficiency and effectiveness of VMStore.
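De-duplication of VM image data is commonly done with content-addressed blocks. The sketch below uses fixed-size chunking with SHA-256 digests as a simple stand-in; VMStore's intelligent object partitioning and preprocessing strategy are not reproduced, and the 4 KiB block size and the function names are assumptions. Each image is stored as an ordered list of block digests, so a block shared by several images is kept only once.

```python
import hashlib


def dedup_store(images, block_size=4096):
    """Split each image into fixed-size blocks and keep each unique block once.

    Returns (store, recipes): store maps digest -> block bytes, and recipes maps
    image name -> ordered list of digests needed to rebuild that image.
    """
    store = {}
    recipes = {}
    for name, data in images.items():
        digests = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # a duplicate block is not stored again
            digests.append(digest)
        recipes[name] = digests
    return store, recipes


def restore(store, recipe):
    """Rebuild an image by concatenating its blocks in order."""
    return b"".join(store[d] for d in recipe)


if __name__ == "__main__":
    base = b"A" * 8192 + b"B" * 4096
    images = {"vm1": base, "vm2": base[:8192] + b"C" * 4096}
    store, recipes = dedup_store(images)
    raw = sum(len(d) for d in images.values())
    kept = sum(len(b) for b in store.values())
    print(f"{raw} bytes of images stored as {kept} bytes of unique blocks")
```

Fixed-size chunking is the simplest choice but loses matches when data shifts by a few bytes, which is one motivation for smarter partitioning schemes such as the one the abstract describes.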
Recently, much attention has been paid to semantic overlay networks for information retrieval in large-scale peer-to-peer networks, and much research on semantic overlay protocols and search algorithms indicates that semantic overlays are efficient for content searching in peer-to-peer networks. However, very limited work has been done to analyze and evaluate the characteristics of semantic overlay networks. In this paper, we identify a natural property of semantic overlay networks, the community structure. We propose a mathematical model to evaluate the community structure of semantic P2P overlay networks, and design a heuristic algorithm to optimize it. Using the evaluation model, we compare the SemreX semantic overlay with the Gnutella overlay. We demonstrate that a SemreX overlay network has a distinctive community structure, whereas a Gnutella-like network does not. We also simulate a simple flooding protocol in both overlays to show that the overlay with community structure is more efficient for content searching.
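The evaluation model for community structure is not given in this abstract. As a stand-in, the sketch below computes Newman-Girvan modularity, Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the edges inside c, and d_c the total degree of c's nodes; values well above 0 indicate more intra-community links than expected at random. Peers are nodes, overlay links are undirected edges (no self-loops or duplicate edges assumed), and the `community` labels, for example one semantic topic per peer, are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a simple, undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # sum of node degrees in the community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Hypothetical overlay: two topic clusters joined by a single link.
    links = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    topics = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(round(modularity(links, topics), 3))
```

For this toy overlay of two triangles joined by one link, Q = 5/14 (about 0.357), while placing all peers in one community gives Q = 0.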
Currently there is no practical standard for grid middleware: most grid platforms are built independently, and it is not easy to make them interoperate. Information service is one of the key compon...
Bibliographic information about scientific papers has been of great value since the Science Citation Index was introduced to measure research impact. Most scientific documents available on the web are unstructured or semi-str...
Efficient metadata management is a critical aspect of overall system performance in large distributed storage systems. Directory subtree partitioning and traditional hashing are two common techniques used for managing...
Virtualization can provide significant benefits in data centers, such as dynamic resource configuration and live virtual machine migration. Services are deployed in virtual machines (VMs), and resource utilization can be ...