ISBN:
(Print) 9781665410175
Scalable graph embedding on large networks is challenging because of the complexity of graph structures and limited computing resources. Recent research shows that the multi-level framework can enhance the scalability of graph embedding methods with little loss of quality. In general, methods using this framework first coarsen the original graph into a series of smaller graphs and then efficiently learn the representations of the original graph from them. However, to the best of our knowledge, most multi-level methods do not have a parallel implementation. Meanwhile, the emergence of high-performance computing for machine learning provides an opportunity to boost graph embedding through distributed computing. In this paper, we propose a Distributed MultI-Level Embedding (DistMILE) framework to further improve the scalability of graph embedding; our code is available at https://***/heyuntian/DistMILE. DistMILE leverages a novel shared-memory parallel algorithm for graph coarsening and a distributed training paradigm for embedding refinement. With the advantage of high-performance computing techniques, DistMILE can smoothly scale different base embedding methods over large networks. Our experiments demonstrate that DistMILE learns representations of quality similar to other baselines, while reducing the time of learning embeddings on large-scale networks to hours. Results show that DistMILE can achieve up to 28x speedup compared with the popular multi-level embedding framework MILE and expedite existing embedding methods with a 40x speedup.
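The coarsen-embed-refine pattern behind MILE-style multi-level frameworks can be illustrated with a minimal single-process sketch. This is not DistMILE's parallel implementation: the greedy matching heuristic, the random stand-in for the base embedding, and the copy-based refinement below are simplified assumptions for illustration only.

```python
import numpy as np

def coarsen(edges, n):
    """One round of greedy edge matching: merge the endpoints of an edge
    when neither endpoint has been merged yet, shrinking the graph."""
    match = list(range(n))
    merged = set()
    for u, v in edges:
        if u not in merged and v not in merged and u != v:
            match[v] = u                    # merge v into supernode u
            merged.add(u); merged.add(v)
    reps = sorted({match[i] for i in range(n)})
    relabel = {r: i for i, r in enumerate(reps)}
    mapping = [relabel[match[i]] for i in range(n)]   # fine node -> supernode
    coarse_edges = {(min(mapping[u], mapping[v]), max(mapping[u], mapping[v]))
                    for u, v in edges if mapping[u] != mapping[v]}
    return sorted(coarse_edges), len(reps), mapping

def base_embed(n, dim, seed=0):
    """Stand-in for any base embedding method run on the coarsest graph."""
    return np.random.default_rng(seed).normal(size=(n, dim))

def refine(coarse_emb, mapping):
    """Project coarse embeddings back: each fine node inherits its
    supernode's vector (real refinement would train further)."""
    return coarse_emb[mapping]

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
coarse_edges, m, mapping = coarsen(edges, 4)
emb = refine(base_embed(m, dim=8), mapping)
print(emb.shape)  # (4, 8)
```

A real multi-level pipeline would apply `coarsen` repeatedly until the graph is small, embed once at the coarsest level, and interleave `refine` with training on the way back up.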
ISBN:
(Print) 9781728199986
Stencil computations are at the core of various Computational Fluid Dynamics (CFD) applications and have been well studied for several decades. Typically they are highly memory-bound, and as a result numerous tiling algorithms have been proposed to improve their performance. Although efficient, most of these algorithms are designed for single iteration spaces on shared-memory machines. However, in CFD we are confronted with multi-block structured grids composed of multiple connected iteration spaces distributed across many nodes. In this paper, we propose a pipelined stencil algorithm called Pencil for distributed-memory machines that applies to practical CFD problems spanning multiple iteration spaces. Based on an in-depth analysis of cache tiling on a single node, we first identify both the optimal combination of MPI and OpenMP for temporal tiling and the best tiling approach, which outperforms the state-of-the-art automatic parallelization tool Pluto by up to 1.92x. Then, we adopt DeepHalo to decouple the multiple connected iteration spaces so that temporal tiling can be applied to each space. Finally, we achieve overlap by pipelining the computation and communication without sacrificing the advantage of temporal cache tiling. Pencil is evaluated using 4 stencils across 6 numerical schemes on two distributed-memory machines with Omni-Path and InfiniBand networks. On the Omni-Path system, Pencil exhibits outstanding weak and strong scalability for up to 128 nodes and outperforms MPI+OpenMP Funneled with space tiling by 1.33-3.41x on a multi-block grid with 32 nodes.
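The temporal cache tiling that Pencil builds on can be sketched for a 1-D three-point Jacobi stencil: each tile is read with a halo wide enough to advance several time steps locally before moving to the next tile, keeping the working set cache-resident. This is a serial illustration under assumed names and an assumed averaging stencil, not Pencil's actual kernels or its MPI/OpenMP machinery.

```python
import numpy as np

def naive(u, steps):
    """Reference: whole-array 3-point Jacobi sweeps, one step at a time."""
    u = u.copy()
    for _ in range(steps):
        u[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return u

def temporally_tiled(u, steps, tile):
    """Each tile is read with a halo of width `steps` and advanced
    `steps` time steps locally; stale values spread inward one cell per
    step from each halo edge, so the halo exactly absorbs them."""
    n = len(u)
    out = u.copy()
    for start in range(1, n - 1, tile):
        end = min(start + tile, n - 1)
        lo, hi = max(start - steps, 0), min(end + steps, n)
        block = u[lo:hi].copy()          # tile plus halo, all at time 0
        for _ in range(steps):
            nxt = block.copy()
            nxt[1:-1] = (block[:-2] + block[1:-1] + block[2:]) / 3.0
            block = nxt
        out[start:end] = block[start - lo:end - lo]   # keep only valid cells
    return out

u0 = np.random.default_rng(0).normal(size=32)
assert np.allclose(naive(u0, 4), temporally_tiled(u0, 4, tile=8))
```

The tiled version touches each cache-sized block `steps` times in a row instead of streaming the whole array once per step, which is the source of the cache-tiling speedups the paper measures.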
Time series data are pervasive in varied real-world applications, and accurately identifying anomalies in time series is of great importance. Many current methods are insufficient to model long-term dependence, whereas some anomalies can only be identified through long temporal contextual information. This may finally lead to disastrous outcomes due to false negatives on these anomalies. Prior art employs Transformers (i.e., a neural network architecture with powerful capability in modeling long-term dependence and global association) to alleviate this problem; however, Transformers are insensitive to local context, which may cause them to neglect subtle anomalies. Therefore, in this paper, we propose a local-adaptive Transformer based on cross-correlation for time series anomaly detection, which unifies both global and local information to capture comprehensive time series patterns. Specifically, we devise a cross-correlation mechanism that employs causal convolution to adaptively capture local pattern variation, feeding diverse local information into the long-term temporal learning process. Furthermore, a novel optimization objective is utilized to jointly optimize reconstruction of the entire time series and of the matrix derived from the cross-correlation mechanism, which prevents the cross-correlation from becoming trivial in the training phase. The generated cross-correlation matrix reveals underlying interactions between dimensions of multivariate time series, which provides valuable insights into anomaly diagnosis. Extensive experiments on six real-world datasets demonstrate that our model outperforms state-of-the-art competing methods and achieves 6.8%-27.5% $F_{1}$ score improvement. Our method also has good anomaly interpretability and is effective for anomaly diagnosis.
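The two ingredients named above, causal convolution (local features with no leakage from the future) and a cross-correlation matrix between dimensions, can be sketched in a toy form. The function names and the pairwise-correlation stand-in are illustrative assumptions; in the paper the mechanism is learned inside the Transformer rather than computed with a fixed kernel.

```python
import numpy as np

def causal_filter(x, kernel):
    """Causal 1-D filter: y[t] depends only on x[t-k+1..t] thanks to
    left-only zero padding, so no future information leaks backwards."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([padded[t:t + k] @ kernel for t in range(len(x))])

def cross_correlation_matrix(series, kernel):
    """Filter every dimension causally, then take pairwise correlations
    between the filtered dimensions as a toy inter-dimension matrix."""
    feats = np.stack([causal_filter(series[:, d], kernel)
                      for d in range(series.shape[1])], axis=1)
    return np.corrcoef(feats, rowvar=False)

rng = np.random.default_rng(0)
series = rng.normal(size=(100, 3))       # multivariate series: T=100, D=3
C = cross_correlation_matrix(series, np.array([0.25, 0.25, 0.5]))
print(C.shape)  # (3, 3)
```

The causality property is what matters for anomaly detection: the score at time t cannot be explained away by values the model has not yet seen.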
Power cables are important power equipment widely used in urban power transmission and distribution, and their insulation state influences the reliability of the urban power supply. Partial discharge (PD) detection is an effe...
ISBN:
(Print) 9781665480468
Graph streaming has received substantial attention for the past 10+ years to cope with large-scale graph computation. Two major approaches, one using conventional data-streaming tools and the other accessing graph databases, facilitate continuous analysis of endlessly flowing graphs and query-based incremental construction of huge graphs, respectively. However, some scientific graphs, including biological networks, need to stay in memory for repetitive but varied analyses. Although a cluster system, and thus distributed memory, can hold an entire big graph in memory, a challenge is the substantial overhead incurred by loading graphs into memory. A solution is to hide such graph-loading and construction overheads behind graph computation in a pipelined fashion. We adapted this pipelining approach for agent-based graph computing, where thousands of agents traverse a graph to find its attributes and shape. We used the multi-agent spatial simulation (MASS) library to implement the concept. A huge graph is incrementally constructed in batches, each spawning and walking agents over the corresponding subgraph, so that all batches eventually complete a given computation. We coded and ran two MASS benchmark programs, triangle counting and connected components, with which we evaluated our pipelined graph processing. The best performance was obtained once the batch size shrank enough to fit in cache memory, regardless of the number of cluster nodes. For a single-node execution of connected components over a 140 MB graph, our graph-pipelining implementation performed 7.7 times faster than the non-pipelining execution. Its parallel execution with 24 cluster nodes achieved an 8.3 times speed-up compared to the pipelined single-node execution.
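The pipelining idea, overlapping incremental graph construction with computation on batches already in memory, can be sketched with a producer thread and a bounded queue. This is a minimal single-process Python analogue under assumed names; the MASS library itself distributes agents across cluster nodes rather than running union-find in one thread.

```python
import threading, queue

def load_batches(edge_lists, out_q):
    """Producer: stream batches of edges (simulating incremental graph
    loading/construction) into a bounded queue."""
    for batch in edge_lists:
        out_q.put(batch)
    out_q.put(None)                      # sentinel: loading finished

def pipelined_components(edge_lists):
    """Consumer: fold each batch into a union-find structure while the
    producer is still loading later batches, overlapping the phases."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:            # path halving
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    q = queue.Queue(maxsize=2)           # bounded: bounds memory for in-flight batches
    t = threading.Thread(target=load_batches, args=(edge_lists, q))
    t.start()
    while (batch := q.get()) is not None:
        for u, v in batch:
            parent[find(u)] = find(v)    # merge the two components
    t.join()
    return len({find(x) for x in parent})

print(pipelined_components([[(0, 1), (2, 3)], [(1, 2), (4, 5)]]))  # 2
```

As in the paper's setting, the consumer never waits for the whole graph: each batch becomes useful work as soon as it arrives.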
ISBN:
(Print) 9781665422925
The processor performance of high performance computing (HPC) systems is increasing at a much higher rate than storage performance. This imbalance leads to I/O performance bottlenecks in massively parallel HPC applications. Therefore, there is a need for improvements in storage and file system designs to meet the ever-growing I/O needs of HPC applications. Storage and file system designers require a deep understanding of how HPC application I/O behavior affects current storage system installations in order to improve them. In this work, we contribute to this understanding using application-agnostic file system statistics gathered on compute nodes as well as metadata and object storage file system servers. We analyze file system statistics of more than 4 million jobs over a period of three years on two systems at Lawrence Livermore National Laboratory that include a 15 PiB Lustre file system for storage. The results of our study add to the state of the art in I/O understanding by providing insight into how general HPC workloads affect the performance of large-scale storage systems. Some key observations in our study show that reads and writes are evenly distributed across the storage system; applications which perform I/O spread that I/O across approximately 78% of the minutes of their runtime on average; less than 22% of HPC users who submit write-intensive jobs perform efficient writes to the file system; and I/O contention seriously impacts I/O performance.
ISBN:
(Print) 9783030483401; 9783030483395
In this paper, we propose duality-based locality-aware stream partitioning (LSP) in distributed stream processing engines (DSPEs). Existing LSP directly reuses the locality concept of distributed batch processing engines (DBPEs). This concept does not fully take the characteristics of DSPEs into account and therefore does not maximize cluster resource utilization. To solve this problem, we first explain the limitations of existing LSP, and we then propose a duality relationship between DBPEs and DSPEs. We finally propose a simple but efficient ping-based mechanism to maximize the locality of DSPEs based on the duality. The insights uncovered in this paper can maximize throughput and minimize latency in stream partitioning.
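The abstract does not detail the ping-based mechanism, but its core decision can be sketched as routing tuples to the candidate downstream worker with the lowest measured round-trip time, which in practice tends to select a co-located worker. The function name, worker names, and RTT values below are all hypothetical.

```python
def pick_local_worker(rtts_ms):
    """Ping-based routing sketch: send each tuple to the candidate
    worker with the smallest measured round-trip time. The smallest RTT
    usually identifies a worker co-located with the upstream task, so
    this approximates locality-aware stream partitioning."""
    return min(rtts_ms, key=rtts_ms.get)

# hypothetical RTT probes (ms); 'w1' shares a node with the sender
workers = {'w1': 0.05, 'w2': 1.20, 'w3': 0.90}
print(pick_local_worker(workers))  # w1
```

A real DSPE would refresh these probes periodically and combine them with load information to avoid overloading the nearest worker.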
ISBN:
(Print) 9789813369658
The proceedings contain 16 papers. The special focus in this conference is on Signal and Image Processing. The topics include: Deep Convolutional Neural Network-Based Diagnosis of Invasive Ductal Carcinoma; Speaker Identification in Spoken Language Mismatch Condition: An Experimental Study; Ultrasound Image Classification Using ACGAN with Small Training Dataset; Preface; Chaotic Ions Motion Optimization (CIMO) for Biological Sequences Local Alignment: COVID-19 as a Case Study; Assessment of Eyeball Movement and Head Movement Detection Based on Reading; Using Hadoop Ecosystem and Python to Explore Climate Change; A Brief Review of Intelligent Rule Extraction Techniques; The Effect of Different Feature Selection Methods for Classification of Melanoma; Intelligent Hybrid Technique to Secure Bluetooth Communications; Parallel Algorithm to Find Integer k Where a Given Well-distributed Graph is k-Metric Dimensional; A Fog-Based Retrieval of Real-Time Data for Health Applications; Differential Evolution-Based Shot Boundary Detection Algorithm for Content-Based Video Retrieval; Qutrit-Based Genetic Algorithm for Hyperspectral Image Thresholding.
ISBN:
(Print) 9783030483401; 9783030483395
This paper presents the main features and the programming constructs of the DCEx programming model, designed for the implementation of data-centric large-scale parallel applications on Exascale computing platforms. To support scalable parallelism, the DCEx programming model employs private data structures and limits the amount of shared data among parallel threads. The basic idea of DCEx is structuring programs into data-parallel blocks to be managed by a large number of parallel threads. Parallel blocks are the units of shared- and distributed-memory parallel computation, communication, and migration in the memory/storage hierarchy. Threads execute close to data using near-data synchronization according to the PGAS model. A use case is also discussed, showing the DCEx features for Exascale programming.
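The data-parallel-block idea, threads working on private blocks with shared data confined to a final reduction, can be sketched in miniature. This is a shared-memory Python analogue under assumed names only; DCEx itself targets PGAS-style execution across Exascale nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def process_block(block):
    """Each thread works on a private data block with no shared mutable
    state, mirroring the mostly-private DCEx model."""
    return sum(x * x for x in block)

def data_parallel_sum_squares(data, nblocks=4):
    """Split the input into data-parallel blocks, process them in
    parallel threads, and share data only in the final reduction."""
    size = -(-len(data) // nblocks)                       # ceil division
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=nblocks) as ex:
        partials = list(ex.map(process_block, blocks))    # private results
    return sum(partials)                                  # the only shared step

print(data_parallel_sum_squares(list(range(10))))  # 285
```

Keeping mutable state private to each block is what lets such programs scale: the only synchronization point is the reduction, not every element access.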
The emergence of large-scale dynamic sets in real applications brings severe challenges in approximate set representation structures. A dynamic set with changing cardinality requires an elastic capacity of the approxi...