This paper investigates the problem of partitioning skew in MapReduce-based systems. Our studies with Hadoop, a widely used MapReduce implementation, demonstrate that partitioning skew causes a large amount of data transfer during the shuffle phase and leads to significant unfairness in the reduce input across data nodes. As a result, applications suffer performance degradation from the long data transfer during the shuffle phase and from computation skew, particularly in the reduce phase. We develop a novel algorithm named LEEN for locality-aware and fairness-aware key partitioning in MapReduce. LEEN embraces an asynchronous map and reduce scheme. All buffered intermediate keys are partitioned according to their frequencies and the fairness of the expected data distribution after the shuffle phase. We have integrated LEEN into Hadoop-0.18.0. Our experiments demonstrate that LEEN efficiently achieves higher locality and reduces the amount of shuffled data. More importantly, LEEN guarantees a fair distribution of the reduce inputs. As a result, LEEN achieves a performance improvement of up to 40% on different workloads.
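The trade-off between locality and fairness described in this abstract can be illustrated with a small greedy heuristic. This is only a sketch under stated assumptions, not the LEEN algorithm itself: the function name `partition_keys`, the per-node frequency table, and the cost function (projected reduce input plus records that must cross the network) are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def partition_keys(freq: Dict[str, List[int]], num_nodes: int) -> Dict[str, int]:
    """Assign each intermediate key to one node.

    freq[key][n] is the number of records with that key buffered on node n.
    The cost function below is an illustrative assumption, not LEEN's.
    """
    load = [0] * num_nodes          # reduce input already assigned to each node
    assignment: Dict[str, int] = {}

    # Place the most frequent keys first, since they constrain balance the most.
    for key in sorted(freq, key=lambda k: sum(freq[k]), reverse=True):
        counts = freq[key]
        total = sum(counts)

        def cost(node: int) -> int:
            remote = total - counts[node]        # records shuffled over the network
            return (load[node] + total) + remote  # balance term + locality term

        best = min(range(num_nodes), key=cost)
        assignment[key] = best
        load[best] += total
    return assignment


if __name__ == "__main__":
    # All three keys live on node 0, so pure locality would overload it.
    print(partition_keys({"hot": [9, 0], "x": [3, 0], "y": [3, 0]}, 2))
```

In the usage example every key is local to node 0, yet the two small keys are sent to node 1 because the balance term outweighs the locality term; changing the relative weight of the two terms would shift that trade-off.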
Data hiding in computer systems is an interesting and important research issue that benefits secret communication and watermarking. The development of virtual machines brings new potential for data hiding. In this paper we explore the potential for data hiding in virtual machine disk images, especially hiding schemes that can be used with copy-on-write images. Besides serving valid uses such as secret communication and watermarking, these schemes can also serve as a warning against malicious intentions. Furthermore, this work lays the foundation for a more thorough analysis of the whole virtual machine system for data hiding.
Agent-based grid data loading method aims at integrating heterogeneous hospitals' information systems into a medical information exchange platform based on grid middleware. It collects distributed data sets for de...
ISBN: 9781424445264 (print)
Scheduling is the key to divisible workload execution. The UMR (uniform multi-round) algorithm can perform near-optimally by improving the overlap of communication and computation. However, it is questionable whether a static schedule works effectively in a dynamic grid environment. This paper proposes an adaptive divisible workload scheduling system that can adjust the schedule proactively. An adaptive UMR-based multi-round algorithm, called AUMR, is presented and evaluated. In AUMR, if the run-time resource monitor notifies the scheduler of any resource change, the scheduler evaluates its impact and adjusts the schedule if necessary. The experimental results show a considerable performance improvement by AUMR in a dynamic grid environment.
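The "evaluate the impact, then adjust if necessary" step can be sketched as follows. This is a minimal illustration, not AUMR or UMR: it assumes a single round, ignores communication cost, splits work in proportion to worker speed, and uses a hypothetical 10% threshold; all function and parameter names are invented for the example, and speeds are assumed to be positive.

```python
from typing import List, Tuple


def split_round(work: float, speeds: List[float]) -> List[float]:
    """Divide one round of work in proportion to each worker's speed."""
    total = sum(speeds)
    return [work * s / total for s in speeds]


def round_time(chunks: List[float], speeds: List[float]) -> float:
    """Predicted round duration: the slowest worker determines the finish time."""
    return max(c / s for c, s in zip(chunks, speeds))


def maybe_reschedule(
    chunks: List[float], new_speeds: List[float], threshold: float = 0.1
) -> Tuple[List[float], bool]:
    """React to a monitor notification that worker speeds changed.

    The schedule is replaced only if the current one is predicted to be more
    than `threshold` slower than a rebalanced one (hypothetical policy).
    """
    current = round_time(chunks, new_speeds)
    candidate = split_round(sum(chunks), new_speeds)
    best = round_time(candidate, new_speeds)
    if current > best * (1 + threshold):
        return candidate, True
    return chunks, False


if __name__ == "__main__":
    plan = split_round(100.0, [1.0, 1.0])
    print(maybe_reschedule(plan, [1.0, 3.0]))    # large change: plan is rebalanced
    print(maybe_reschedule(plan, [1.0, 1.05]))   # small change: plan is kept
```

The threshold keeps the scheduler from reacting to small fluctuations, which is one simple way to read "adjust the schedule if necessary".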
ISBN: 9781424449095 (print)
Binary translation systems usually map guest registers to host registers to speed up translation. QEMU can be treated as a typical binary translator, and it uses a fixed register allocation. On most hosts, QEMU simply maps all the target registers to memory and stores only a few temporary variables in host registers. However, QEMU does not consider the dependence between adjacent instructions. So even if a guest register's value was loaded into a temporary variable (which is mapped to a host register) while executing the previous instruction, the next instruction cannot use the value from that temporary variable directly and has to reload it from memory. This paper presents an approach to eliminate these unnecessary operations. Benchmark tests from nbench show that this approach achieves a 10% to 20% speed improvement.
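The kind of redundant reload described here can be illustrated with a toy peephole pass. The sketch below is not QEMU's intermediate representation and not the paper's method: the three-opcode tuple format (`load`, `store`, `alu`), the temporary names, and the function `drop_redundant_loads` are all assumptions made for the example, and the pass only reasons within one straight-line block.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, ...]
# Toy operations (hypothetical format):
#   ("load",  tmp, reg)      tmp <- guest register reg (from memory)
#   ("store", reg, tmp)      guest register reg (in memory) <- tmp
#   ("alu",   dst, src...)   dst <- some function of the source temporaries


def drop_redundant_loads(ops: List[Op]) -> List[Op]:
    """Remove loads whose value is already held in the same temporary."""
    holds: Dict[str, str] = {}   # guest register -> temporary holding its value
    out: List[Op] = []

    def clobber(tmp: str) -> None:
        # The temporary is overwritten, so it no longer mirrors any register.
        for reg in [r for r, t in holds.items() if t == tmp]:
            del holds[reg]

    for op in ops:
        kind = op[0]
        if kind == "load":
            _, tmp, reg = op
            if holds.get(reg) == tmp:
                continue             # value is already in this temporary
            clobber(tmp)
            holds[reg] = tmp
        elif kind == "store":
            _, reg, tmp = op
            holds[reg] = tmp         # memory and temporary now agree
        elif kind == "alu":
            clobber(op[1])
        out.append(op)
    return out


if __name__ == "__main__":
    block = [
        ("load", "t0", "eax"),
        ("alu", "t1", "t0"),
        ("store", "ebx", "t1"),
        ("load", "t0", "eax"),   # redundant: t0 still holds eax
        ("load", "t1", "ebx"),   # redundant: t1 was just stored to ebx
    ]
    for op in drop_redundant_loads(block):
        print(op)
```

A real translator would also have to invalidate this bookkeeping at branch targets, helper calls, and any other point where guest state can change outside the tracked operations.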
With the development of speech recognition, speech data mining has become a hot topic in the fields of data mining and natural language processing. In this paper, a novel clustering algorithm is presented to describe how to perform semantic mining and how to understand the developing trend of events implied in a speech sequence. First, the speech sequences are extracted into a Bayesian network representing the relationships between different speech elements. Then, we use a 3-dimensional space and sequence clustering techniques to extract implied information from speech. Considering the features of speech data, we improve the traditional distance-based clustering algorithm to obtain semantic information and enhance performance. The experimental results show that our algorithm is correct and effective.
This paper provides a new data fusion mechanism based on regulation and reliability to solve the data conflict problems of multi-source heterogeneous data fusion in traffic information engineering. The mechanism evaluates each data source on its historical reliability and its QoS (Quality of Service), and then gives a reliability result. Users' reliability on the data source is calculated and dynamically adjusted, new reliability data for the data source is then given, and finally the conflicting data is fused. Experimental validation indicates that the method greatly enhances the accuracy and adoption ratio of the collected data in real time.
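One way to picture the three steps in this abstract (score each source, adjust the score over time, fuse conflicting values) is the sketch below. The abstract does not give the formulas, so everything here is an assumption: the linear blend of historical reliability and QoS, the weight `alpha`, the exponential update with `rate`, and the reliability-weighted average used for fusion are illustrative stand-ins, with all scores assumed to lie in [0, 1].

```python
from typing import List, Tuple


def source_reliability(historical: float, qos: float, alpha: float = 0.7) -> float:
    """Blend historical reliability and a QoS score (assumed linear blend)."""
    return alpha * historical + (1 - alpha) * qos


def update_reliability(old: float, agreed: bool, rate: float = 0.2) -> float:
    """Move a source's reliability toward 1 if it agreed with the fused value,
    toward 0 otherwise (assumed exponential update)."""
    target = 1.0 if agreed else 0.0
    return (1 - rate) * old + rate * target


def fuse(readings: List[Tuple[float, float]]) -> float:
    """Resolve conflicting values with a reliability-weighted average.

    readings is a list of (value, reliability) pairs with positive total weight.
    """
    total = sum(rel for _, rel in readings)
    return sum(value * rel for value, rel in readings) / total


if __name__ == "__main__":
    # Two sources report different speeds for the same road segment.
    print(fuse([(60.0, 0.75), (40.0, 0.25)]))
```

A weighted average suits numeric readings such as speed or flow; for categorical data, picking the value from the most reliable source would be the analogous choice.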
In this paper, we propose a scheme that manages the computational resources of virtual machines used to host high performance computing applications. Unlike the static configuration methodology employed by state-of-the-art virtual machine monitors, our scheme configures the virtual machines automatically according to the actual load generated by the applications. NPB, HPL, and kernel compilation are chosen as representative high performance computing applications to run inside the virtual machines constructed using our scheme, and their performance is compared with that obtained on statically configured virtual machines. The comparison indicates that, besides the great flexibility it brings, our scheme incurs a performance penalty below 5% in most cases, and in some cases an application running inside an automatically configured virtual machine even performs better than one running inside a statically configured virtual machine.
As image processing is widely used in many scientific research areas, finding a powerful and inexpensive approach to large-scale image processing problems poses great challenges. In this paper, based on workflow theory and Web service technology, we present a novel image processing framework called ImageFlow, which enables images to be processed by distributed legacy software coupled with interconnected target systems. The salient features of ImageFlow are: (1) workflow-based image processing; (2) a general running service (GRS) for legacy programs; (3) an adaptive data transfer mechanism with SOAP attachment and gridFTP; and (4) workflow-based software deployment. The experimental results show that our approaches are feasible and efficient. SOAP attachment over HTTP is preferable for transferring small amounts of data, while gridFTP performs better when transferring large-scale data. In addition, compared with the sequential deployment model, workflow-based software deployment achieves higher speedup when constructing large-scale systems.
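The reported finding that SOAP attachment suits small payloads and gridFTP suits large ones suggests a size-based selection rule. The sketch below shows that idea only; the 1 MiB cut-off, the constant name, and the function `choose_transfer` are hypothetical, since the abstract does not state where the crossover lies or how ImageFlow actually decides.

```python
SMALL_PAYLOAD_LIMIT = 1 * 1024 * 1024  # hypothetical cut-off (1 MiB), not from the paper


def choose_transfer(size_bytes: int, limit: int = SMALL_PAYLOAD_LIMIT) -> str:
    """Pick a transfer mechanism from the payload size alone."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "soap_attachment" if size_bytes <= limit else "gridftp"


if __name__ == "__main__":
    print(choose_transfer(200 * 1024))          # small thumbnail
    print(choose_transfer(500 * 1024 * 1024))   # large image set
```

In practice the cut-off would be measured on the target network rather than fixed in code.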
With the increasing popularity of cloud computing, social computing, and Web 2.0 technology, Internet resources are growing rapidly. How to continuously provide service invocation interfaces that meet the growing needs of end users while minimizing the cost of service development has become a challenging issue for service providers. Meanwhile, service mashup technology is receiving more attention in both industry and academia as a way to quickly build new end-user applications in a complex and heterogeneous network environment. Therefore, we propose a new method of service mashup that combines the advantages of cloud, grid, web services, and other technologies. We develop a cloud-oriented Service MashUp system prototype (SMU). SMU supports service information interaction, classification, and processing during service mashup to meet the various needs of multi-level, multi-role service applications. Our experiments show that SMU makes it possible to build personal service applications quickly and easily.