In this research, we apply Green's theory to convert the partial differential equation into a boundary integral equation for geometric transformation. Green's theory is designed specifically for integr...
Many machine learning and data mining (MLDM) problems, such as recommendation, topic modeling, and medical diagnosis, can be modeled as computations on bipartite graphs. However, most distributed graph-parallel systems are oblivious to the unique characteristics of such graphs, and existing online graph partitioning algorithms usually cause excessive replication of vertices as well as significant pressure on network communication. This article identifies the challenges and opportunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, the unequal computation load, and the imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partitioning algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to deliver a speedup of 1.16X to 17.75X for four typical MLDM algorithms, by reducing vertex replication by up to 80% and network traffic by up to 96%.
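To make the notion of vertex replication concrete, the following is a minimal sketch, not BiGraph's actual algorithm. It computes the average number of partitions in which each vertex appears (the replication factor) for a toy bipartite graph under two edge-placement rules. The function name `replication_factor`, the two placement rules, and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def replication_factor(edges, assign, num_parts):
    """Average number of partitions each vertex appears in under an edge placement."""
    placement = defaultdict(set)
    for u, v in edges:
        p = assign(u, v) % num_parts
        # Tag vertices with their side so that user 0 and item 0 stay distinct.
        placement[("U", u)].add(p)
        placement[("V", v)].add(p)
    return sum(len(parts) for parts in placement.values()) / len(placement)


# Toy bipartite graph: 4 users (U side), each connected to all 3 items (V side).
edges = [(u, v) for u in range(4) for v in range(3)]


def by_edge(u, v):
    """Hypothetical placement that hashes both endpoints of an edge."""
    return u * 31 + v * 17


def by_user(u, v):
    """Hypothetical placement that keeps every edge with its U-side endpoint."""
    return u


rf_edge = replication_factor(edges, by_edge, num_parts=2)
rf_user = replication_factor(edges, by_user, num_parts=2)
print(f"hash both endpoints: {rf_edge:.3f}")
print(f"place by U side:     {rf_user:.3f}")
```

In this toy case, placing edges by their U-side endpoint leaves every user vertex unreplicated, so the replication factor is lower than with the two-endpoint hash. This only illustrates why treating the two vertex subsets differently can help; it says nothing about the measured gains reported in the abstract.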
Fault resilience has become a major issue for HPC systems, particularly in view of future E-scale systems, which will consist of millions of CPU cores and other components. MPI-level fault tolerant constru...
The ability of servers to effectively execute tasks within Cloud datacenters varies due to heterogeneous CPU and memory capacities, resource contention, network configurations, and operational age. Unexpectedly slow server nodes (node-level stragglers) cause their assigned tasks to become task-level stragglers, which dramatically impede parallel job execution. However, it is currently unknown how slow nodes directly correlate with the manifestation of task stragglers. To address this knowledge gap, we propose a method for node performance modeling and ranking in Cloud datacenters based on analyzing parallel job execution tracelog data. Using a production Cloud system as a case study, we demonstrate that node execution performance is driven by temporal changes in node operation rather than by node hardware capacity. Different sample sets were filtered in order to evaluate the generality of our framework, and the analytic results demonstrate that a node's ability to execute parallel tasks tends to follow a three-parameter log-logistic distribution. Further statistical attributes, such as the confidence interval, quantile values, and extreme-case probability, can also be used for ranking and identifying potential straggler nodes within the cluster. We exploit a graph-based algorithm to partition server nodes into five levels, with 0.83% of nodes identified as node-level stragglers. Our work lays the foundation for enhancing scheduling algorithms by avoiding slow nodes, reducing task straggler occurrence, and improving parallel job performance.
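As an illustration of how quantile values of a three-parameter log-logistic distribution could be used to rank nodes, here is a minimal sketch. It uses the standard parameterization with scale, shape, and location, for which the CDF is F(x) = 1 / (1 + ((x - loc) / scale)^(-shape)) for x > loc. The node names and parameter values are invented for illustration, and the ranking rule (sort by the 95th percentile) is an assumption, not the paper's graph-based method.

```python
def loglogistic_cdf(x, scale, shape, loc=0.0):
    """CDF of the three-parameter log-logistic distribution."""
    if x <= loc:
        return 0.0
    return 1.0 / (1.0 + ((x - loc) / scale) ** (-shape))


def loglogistic_quantile(p, scale, shape, loc=0.0):
    """Inverse CDF: the value below which a fraction p of the mass lies."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return loc + scale * (p / (1.0 - p)) ** (1.0 / shape)


# Hypothetical fitted parameters per node: (scale, shape, loc) of task duration.
nodes = {
    "node-a": (10.0, 4.0, 2.0),
    "node-b": (14.0, 3.0, 2.0),
    "node-c": (10.5, 4.0, 2.0),
}

# Rank nodes by their 95th-percentile task duration, slowest first.
ranked = sorted(
    nodes,
    key=lambda name: loglogistic_quantile(0.95, *nodes[name]),
    reverse=True,
)
for name in ranked:
    print(name, round(loglogistic_quantile(0.95, *nodes[name]), 2))
```

A tail quantile is used here because a node with a heavy upper tail is more likely to produce stragglers than its median alone would suggest; other attributes mentioned in the abstract could be substituted as the sort key.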
Dangling pointer errors are pervasive in C/C++ programs and are very hard to detect. This paper introduces an efficient detector for dangling pointer errors in C/C++ programs. By selectively leaving some memory accesses unmonitored, our method reduces the memory monitoring overhead and thus achieves better performance than previous methods. Experiments show that our method achieves an average speedup of 9% over the previous compiler-instrumentation-based method and more than 50% over the previous page-protection-based method.
ISBN: 9781510835368 (print)
The convolution operation is the most important and time-consuming step in a convolutional neural network *** this work, we analyze the computing complexity of direct convolution and fast-Fourier-transform-based (FFT-based) *** creatively propose CS-unit, which is equivalent to a combination of a convolutional layer and a pooling layer but more *** computing complexity of and some other similar operation is demonstrated, revealing an advantage on computation of ***, practical experiments are also performed and the result shows that CS-unit holds a real superiority on run time.
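For reference, the cost of direct convolution mentioned above can be counted explicitly. The sketch below is an illustration only and does not implement CS-unit. It assumes a single-channel square N x N input, a K x K kernel, stride 1, and no padding, for which the direct method needs (N - K + 1)^2 * K^2 multiplications. FFT-based convolution, by comparison, scales roughly as N^2 log N with constants that depend on the implementation. The function name `direct_conv2d` is hypothetical.

```python
def direct_conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, as used in CNN layers.

    Returns the output feature map and the number of multiplications performed.
    """
    n, k = len(image), len(kernel)
    out_size = n - k + 1
    out = [[0.0] * out_size for _ in range(out_size)]
    mults = 0
    for i in range(out_size):
        for j in range(out_size):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    acc += image[i + a][j + b] * kernel[a][b]
                    mults += 1
            out[i][j] = acc
    return out, mults


# 4 x 4 input with entries 0..15 and a 2 x 2 diagonal-difference kernel.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
kernel = [[1.0, 0.0], [0.0, -1.0]]

out, mults = direct_conv2d(image, kernel)
print(out)
print("multiplications:", mults)
```

The counter matches the closed-form expression for this input size, which makes it easy to see how quickly the direct cost grows with the kernel size.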
In this paper, we present the Tianhe-2 interconnect network and message passing services. We describe the architecture of the router and network interface chips, and highlight a set of hardware and software features that effectively support high-performance communications, including remote direct memory access, collective optimization, hardware-enabled reliable end-to-end communication, and user-level message passing services. Measured hardware performance results are also presented.
ISBN: 9781467399920 (print)
In content-centric networking, the in-network caching scheme can affect the performance of the whole network. Existing schemes lack a global view, which results in inefficient caches. In this paper, we aim to analyze the real-time distribution of contents among caches from multiple perspectives. This paper proposes TCBRP, a scheme that analyzes the caching tendency of various contents along the reverse path, based on the centrality of nodes, the popularity of contents, and the replacement rate of nodes, in order to cache in-network contents. This scheme also has decent scalability and can be extended conveniently. The experimental results show that TCBRP reduces the average hop count and balances the cache hit rate compared with BetwRep and LCE.
ISBN: 9781931971287 (print)
Copy-on-write virtual disks (e.g., qcow2 images) provide many useful features like snapshot, de-duplication, and full-disk encryption. However, our study uncovers that they introduce additional metadata for block organization and notably more disk sync operations (e.g., more than 3X for qcow2 and 4X for VMDK images). To mitigate such sync amplification, we propose three optimizations, namely per virtual disk internal journaling, dual-mode journaling, and adaptive-preallocation, which eliminate the extra sync operations while preserving those features in a consistent way. Our evaluation shows that the three optimizations result in up to 110% performance speedup for varmail and 50% for TPCC.
ISBN: 9781467385244 (print)
Developing applications for modern complex networked robotic systems is increasingly challenging due to the introduction of possibly sophisticated communication and coordination aspects. In this paper, we propose EmSBoT, a lightweight embedded component-based software framework targeting resource-constrained networked robotic systems. EmSBoT provides a unified Application Program Interface (API) that hides the heterogeneous distributed environment from applications. Its OS abstraction layer endows it with OS independence and portability. A port-based communication mechanism is adopted to exchange messages between loosely coupled components, which gives the system fault-tolerance capability. By isolating the communication channels as separate agents, the framework provides uniform and transparent message passing for agents across node boundaries. We describe the architecture, programming model, and core features of EmSBoT, together with a performance evaluation and behavior validation that demonstrate its efficiency and feasibility.