Nowadays, using software-based virtualization technology to build execution environments (e.g., clouds) that serve high-performance computing (HPC) applications is an important trend in the systems domain. However, the extra virtualization layer can degrade application performance. Studies have revealed that the communication performance of the MPI library, which is widely used by HPC applications, suffers a high penalty when a physical host machine becomes overcommitted with virtual CPUs (VCPUs). Unfortunately, this problem has not received enough attention and has not yet been solved in the literature. In this paper, we investigate the reasons behind the performance penalty and propose a solution that improves the communication performance of MPI applications running on overcommitted virtualized systems. The experimental results show that, with our proposal, most HPC applications gain performance improvements of varying degrees on overcommitted systems, depending on their communication patterns and the overcommitment level.
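The overcommitment level can be illustrated with a short Python sketch. It assumes the common definition of the level as the ratio of total VCPUs to physical cores on one host; the abstract does not define it precisely, and the `Host` class and its fields are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical model of one physical machine and the VMs placed on it.
    physical_cores: int
    vcpus_per_vm: list = field(default_factory=list)  # VCPU count of each VM


def overcommit_level(host: Host) -> float:
    """Total VCPUs divided by physical cores (assumed definition)."""
    if host.physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(host.vcpus_per_vm) / host.physical_cores


def is_overcommitted(host: Host) -> bool:
    # More VCPUs than cores means VCPUs must time-share cores, so a VCPU
    # running an MPI rank can be descheduled while a peer is waiting for it.
    return overcommit_level(host) > 1.0


# Four 4-VCPU VMs on an 8-core host: 16 VCPUs / 8 cores.
host = Host(physical_cores=8, vcpus_per_vm=[4, 4, 4, 4])
print(overcommit_level(host))  # 2.0
print(is_overcommitted(host))  # True
```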
Virtual machine (VM) interference has long been a challenging problem for performance predictability and system throughput in large-scale virtualized cloud environments. Such interference arises from intertwined factors, including the application type, the number of concurrent VMs, and the VM scheduling algorithms used within the host. Since MapReduce has become an important data processing platform in the cloud, we investigate the impact of disk schedulers on Hadoop. Interestingly, our experimental results show noticeable variation in Hadoop performance across applications when different pairs of disk schedulers are applied in the hypervisor and the virtual machines. Furthermore, a typical Hadoop application consists of different interleaved stages, each with its own I/O workload and access pattern. As a result, any single pair of disk schedulers is sub-optimal not only across different MapReduce applications but also across the different sub-phases of a single job. Accordingly, this paper presents a novel approach for adaptively tuning the pair of disk schedulers in the hypervisor and the virtual machines during the execution of a single MapReduce job. Our results show that MapReduce performance can be significantly improved; specifically, adaptive tuning of the disk scheduler pairs achieves a 25% performance improvement on a sort benchmark with Hadoop.
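The per-phase tuning idea can be sketched in Python as choosing, for each job phase, the (hypervisor, VM) scheduler pair with the shortest measured time, then comparing that with the best single static pair. This is not the paper's algorithm: the phase names and timings are invented, and the sketch ignores the cost of switching schedulers mid-job. On Linux, the active I/O scheduler of a block device can be changed at runtime by writing its name to `/sys/block/<dev>/queue/scheduler`, which is one way such switching could be carried out.

```python
def best_pair_per_phase(phase_times):
    """Map each phase to the (hypervisor, VM) scheduler pair with the lowest time."""
    return {phase: min(times, key=times.get) for phase, times in phase_times.items()}


def adaptive_vs_static(phase_times):
    """Return (adaptive_total, best_static_total) in seconds.

    adaptive: switch to the fastest pair at every phase (switch cost ignored).
    static:   keep one pair for the whole job, choosing the pair with the lowest sum.
    """
    adaptive = sum(min(times.values()) for times in phase_times.values())
    common_pairs = set.intersection(*(set(times) for times in phase_times.values()))
    static = min(sum(times[pair] for times in phase_times.values()) for pair in common_pairs)
    return adaptive, static


# Hypothetical per-phase run times (seconds) for three scheduler pairs.
phase_times = {
    "map":     {("deadline", "noop"): 40.0, ("cfq", "cfq"): 55.0, ("noop", "deadline"): 48.0},
    "shuffle": {("deadline", "noop"): 30.0, ("cfq", "cfq"): 22.0, ("noop", "deadline"): 27.0},
    "reduce":  {("deadline", "noop"): 35.0, ("cfq", "cfq"): 38.0, ("noop", "deadline"): 29.0},
}
print(best_pair_per_phase(phase_times))
print(adaptive_vs_static(phase_times))  # (91.0, 104.0)
```

With these made-up numbers no single pair wins every phase, so per-phase switching beats the best static choice; real gains would depend on measured times and on switching overhead.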
We present Phytree of a query gene, a robust web server with a user-friendly, interactive graphical user interface for phylogenetic analysis against the latest protein database. Phytree of a query gene combines a BLAST search with a suite of tools that enables interactive, phylogeny-oriented exploration of the BLAST results and flexible selection of homologous sequences among the BLAST hits. Once the BLAST hits have been selected, the corresponding sequences can be passed to the phylogenetic tree reconstruction pipeline (multiple alignment, data processing, phylogenetic reconstruction, and tree visualization). As the major technological innovations, selection of a meaningful subset of BLAST hits is implemented using pipeline programming, and tree visualization uses a Java-based recursive method that allows trees to be integrated and viewed seamlessly in standard web browsers with no extra software required. Phytree of a query gene only requires users to paste the amino acid sequence of their query gene into the input box, and the phylogenetic tree is then built in a single step. This makes it very practical for non-specialists; it is specifically designed for users with no experience in phylogeny. Moreover, Phytree of a query gene introduces several new methods for handling intermediate results, which make phylogenetic tree reconstruction faster, easier, more accurate, and more robust. It is freely available on our web page: http://211.69.198.144/***.
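Phytree of a query gene renders trees with a Java-based recursive method whose details are not given here. As a rough Python illustration of the recursive idea only (not Phytree's code), the sketch below walks a tree once to draw it as indented text and once to serialize it in Newick format, a standard text format for phylogenetic trees. The `Node` class, the function names, and the species names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str = ""                      # leaf label or optional internal label
    children: List["Node"] = field(default_factory=list)


def render(node, prefix="", is_last=True, is_root=True):
    """Recursively draw the subtree rooted at `node` as a list of text lines."""
    if is_root:
        lines = [node.name or "(root)"]
        child_prefix = ""
    else:
        connector = "`-- " if is_last else "|-- "
        lines = [prefix + connector + (node.name or "*")]  # "*" marks an unnamed clade
        child_prefix = prefix + ("    " if is_last else "|   ")
    for i, child in enumerate(node.children):
        lines.extend(render(child, child_prefix, i == len(node.children) - 1, False))
    return lines


def to_newick(node, top=True):
    """Recursively serialize the subtree in Newick format, e.g. ((A,B),C);"""
    if node.children:
        body = "(" + ",".join(to_newick(c, False) for c in node.children) + ")" + node.name
    else:
        body = node.name
    return body + ";" if top else body


# Hypothetical tree: human and mouse grouped, zebrafish as the outgroup.
tree = Node("root", [Node("", [Node("human"), Node("mouse")]), Node("zebrafish")])
print("\n".join(render(tree)))
print(to_newick(tree))  # ((human,mouse),zebrafish)root;
```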
Memory- and I/O-intensive applications consume a huge amount of memory, and their performance degrades quickly when memory pressure arises. With the development of high-performance networks and their wide use in clusters, ...
CUDA has become a very popular programming paradigm in the parallel computing area. However, very little work has been done on characterizing CUDA kernels. In this work, we measure thread-level performance, collect basic-block-level characteristics, and glean instruction-level properties for about 35 programs from the CUDA SDK, Parboil, and Rodinia benchmark suites. In addition, we define basic block vectors, synchronization vectors, and a thread similarity matrix to capture the characteristics of CUDA programs efficiently. We find that, compared with sequential programs, CUDA programs exhibit some unique characteristics at each level.
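Basic block vectors and a thread similarity matrix can be sketched in Python as follows. This assumes each thread's BBV is its per-block execution counts normalized to sum to 1 (an unweighted variant; SimPoint-style BBVs also weight each block by its instruction count), and that similarity is derived from the Manhattan (L1) distance between BBVs. The abstract does not give the paper's exact definitions, and the per-thread counts are hypothetical.

```python
def bbv(block_counts, num_blocks):
    """Normalized basic block vector for one thread.

    block_counts: {basic_block_id: times the thread executed that block}
    """
    vec = [0.0] * num_blocks
    for block_id, count in block_counts.items():
        vec[block_id] = float(count)
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def similarity_matrix(thread_bbvs):
    """S[i][j] = 1 - L1(bbv_i, bbv_j) / 2.

    The L1 distance between two normalized vectors lies in [0, 2], so
    S is 1.0 for identical control-flow profiles and 0.0 for disjoint ones.
    """
    n = len(thread_bbvs)
    return [
        [1.0 - sum(abs(a - b) for a, b in zip(thread_bbvs[i], thread_bbvs[j])) / 2.0
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical counts for three threads over three basic blocks: threads 0 and 1
# follow the same path, thread 2 takes a divergent branch into block 2.
counts = [{0: 10, 1: 10}, {0: 5, 1: 5}, {0: 2, 2: 8}]
vectors = [bbv(c, num_blocks=3) for c in counts]
for row in similarity_matrix(vectors):
    print([round(s, 2) for s in row])
```

Normalizing first means threads that run the same path for different numbers of iterations (threads 0 and 1) still score as identical.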
ISBN: 9781605589428 (print)
The MapReduce programming model is emerging as an efficient tool for data-intensive applications. Hadoop, an open-source implementation of MapReduce, has been widely adopted by both academia and industry. Recently, much effort has gone into improving the performance of MapReduce systems and into analyzing the MapReduce process based on the log files generated during Hadoop execution. Visualizing log files appears to be a very useful way to understand the behavior of the Hadoop process. In this paper, we present MR-Scope, a real-time MapReduce tracing tool. MR-Scope provides real-time insight into the MapReduce process, including the ongoing progress of every task hosted by a TaskTracker. In addition, it displays the health of the Hadoop cluster's data nodes, the distribution of the file system's blocks and their replicas, and the content of the file system's different block splits. We implement MR-Scope in native Hadoop 0.1. Experimental results demonstrate that MR-Scope's overhead is less than 4% when running the wordcount benchmark. Copyright 2010 ACM.
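MR-Scope's display of block and replica distribution is only summarized above. As a hedged Python sketch (MR-Scope itself is implemented in Hadoop, which is written in Java), the code assumes the tool already holds a mapping from each block ID to the DataNodes storing a replica, however it obtains that, and shows one way to summarize it per node and to flag under-replicated blocks. Block IDs, hostnames, and function names are hypothetical; the default target of 3 matches HDFS's default `dfs.replication` value.

```python
from collections import Counter


def replicas_per_node(block_locations):
    """Count block replicas held by each DataNode.

    block_locations: {block_id: [hostnames of DataNodes holding a replica]}
    """
    counts = Counter()
    for nodes in block_locations.values():
        counts.update(nodes)
    return counts


def under_replicated(block_locations, target=3):
    """Block IDs with fewer distinct replicas than `target`."""
    return sorted(b for b, nodes in block_locations.items() if len(set(nodes)) < target)


# Hypothetical snapshot of block placement on a four-node cluster.
blocks = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn1", "dn3"],
    "blk_3": ["dn2", "dn3", "dn4"],
}
print(dict(replicas_per_node(blocks)))  # {'dn1': 2, 'dn2': 2, 'dn3': 3, 'dn4': 1}
print(under_replicated(blocks))         # ['blk_2']
```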
As the degree of virtualization grows considerably, improving the performance of virtual machine environments motivates a deeper investigation of the internal processes and performance implications of virtualization. S...