A discord is a refinement of the concept of an anomalous subsequence of a time series. The task of discord discovery is applied in a wide range of subject domains related to time series: medicine, economics, climate model...
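The abstract above is truncated. Assuming the standard definition of a discord (the subsequence of length m whose distance to its nearest non-overlapping match is largest), a minimal brute-force sketch follows. The function name is hypothetical. The search is O(n²) and only illustrates the definition, not any particular discovery algorithm.

```python
import math


def brute_force_discord(series, m):
    """Return (start, distance) of the top discord of length m.

    Hypothetical helper: an O(n^2) brute-force search. Windows that overlap
    the candidate (|i - j| < m) are "trivial matches" and are skipped.
    """
    n_windows = len(series) - m + 1
    best_start, best_dist = -1, -1.0
    for i in range(n_windows):
        candidate = series[i:i + m]
        nearest = math.inf
        for j in range(n_windows):
            if abs(i - j) >= m:  # only non-self (non-overlapping) matches count
                nearest = min(nearest, math.dist(candidate, series[j:j + m]))
        # The discord maximizes the distance to its nearest non-self match.
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = i, nearest
    return best_start, best_dist


series = [0, 1, 0, 1, 0, 1, 0, 9, 0, 1, 0, 1, 0, 1]
print(brute_force_discord(series, 2))  # (6, 8.0)
```

The spike at index 7 makes the window starting at index 6 the discord, because every other window has an exact repeat elsewhere in the series.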
ISBN: 9781538659892 (print)
Hub-labeling-based shortest distance querying plays a key role in many important networked graph applications, such as route planning, socially-sensitive search and web page ranking. Over the last few years, Pruned Landmark Labeling (PLL) has emerged as the state-of-the-art technique for hub labeling. PLL drastically reduces the complexity of label construction by pruning Shortest-Path Trees (SPTs). However, PLL is inherently sequential, as different SPTs must be constructed in a specific order of source vertices to ensure small label size. In particular, for large graphs, constructing even pruned SPTs from all vertices takes significant processing time. While there are many works on parallelizing single-source shortest path, these solutions cannot be used directly for PLL, as pruning and label querying introduce significant additional complexity while restricting parallelism within an SPT. In this paper, we propose a novel, fast and efficient algorithm that significantly accelerates PLL on large graphs through a two-level parallelization of SPTs: intra-tree and inter-tree. For intra-tree parallelism, we generate pruned SPTs using a modification of the Bellman-Ford (BF) algorithm, which we further optimize to reduce SPT label-querying and initialization costs. We implement our algorithm using the recently proposed Graph Processing Over Partitions (GPOP) framework, which dramatically improves cache efficiency and DRAM communication bandwidth. When pruned SPTs become very small and parallelizing individual SPTs is no longer advantageous, we switch to inter-tree parallelization and construct multiple trees concurrently in a batch. Experiments on a 36-core (2-way hyperthreaded) Intel Broadwell server show that, on some datasets, our parallel algorithm achieves a speedup greater than 35.1x over the state-of-the-art sequential algorithm.
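To make the pruning step concrete, here is a minimal sketch of classic sequential PLL on an unweighted, undirected graph, using a BFS per root. This is the sequential baseline, not the paper's parallel Bellman-Ford/GPOP algorithm, and the function names are hypothetical. A vertex is pruned (neither labeled nor expanded) when existing labels already certify a distance no larger than the current BFS distance.

```python
from collections import deque


def query(labels, u, v):
    """Shortest distance via common hubs: min over h of L(u)[h] + L(v)[h]."""
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):
        lu, lv = lv, lu  # iterate over the smaller label set
    best = float("inf")
    for hub, du in lu.items():
        dv = lv.get(hub)
        if dv is not None:
            best = min(best, du + dv)
    return best


def pruned_landmark_labeling(adj, order):
    """Sequential PLL (hypothetical helper).

    adj: dict vertex -> list of neighbours (undirected, unweighted).
    order: all vertices, most "important" first; any order gives exact
    distances, but a good order gives smaller labels.
    """
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            # Prune: earlier hubs already cover the pair (root, u).
            if query(labels, root, u) <= dist[u]:
                continue
            labels[u][root] = dist[u]
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return labels


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path 0-1-2-3
labels = pruned_landmark_labeling(adj, order=[1, 2, 0, 3])
print(query(labels, 0, 3))  # 3
```

The dependence of each BFS on labels produced by earlier roots is exactly what makes PLL inherently sequential, as the abstract notes.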
ISBN: 9781728123455 (electronic)
ISBN: 9781728123462 (print)
Hyperspectral image (HSI) object detection has received increasing attention. However, while hyperspectral imaging provides rich information, it also poses new challenges for real-time, high-accuracy detection. In this paper, a near-real-time parallel algorithm based on sliding dual windows is proposed for object detection in hyperspectral images. First, the Sherman form is employed to compute the transition between successive positions of the sliding dual windows, so that target or anomaly detection can be calculated iteratively. Then, the detection algorithm is parallelized on a GPU to further increase processing speed. Experimental results demonstrate that the proposed method is more effective than the compared method.
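The abstract does not spell out the "Sherman form". Assuming it refers to a Sherman-Morrison rank-one update, (A + u vᵀ)⁻¹ = A⁻¹ − A⁻¹u vᵀA⁻¹ / (1 + vᵀA⁻¹u), the sketch below shows how the inverse of a window's correlation matrix R = Σ x xᵀ can be updated in O(d²) as one pixel enters and another leaves, instead of being re-inverted in O(d³). The helper names `sherman_morrison` and `slide_window_inverse` are hypothetical.

```python
def sherman_morrison(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, in O(d^2) (assumed update rule)."""
    d = len(u)
    Au = [sum(A_inv[i][k] * u[k] for k in range(d)) for i in range(d)]  # A^{-1} u
    vA = [sum(v[k] * A_inv[k][j] for k in range(d)) for j in range(d)]  # v^T A^{-1}
    denom = 1.0 + sum(v[k] * Au[k] for k in range(d))
    if abs(denom) < 1e-12:
        raise ValueError("update would make the matrix singular")
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(d)]
            for i in range(d)]


def slide_window_inverse(R_inv, x_in, x_out):
    """Update the inverse of R = sum(x x^T) when x_in enters and x_out leaves.

    Hypothetical helper: adding x_in is a rank-one update with u = v = x_in;
    removing x_out is a rank-one downdate with u = -x_out, v = x_out.
    """
    R_inv = sherman_morrison(R_inv, x_in, x_in)
    return sherman_morrison(R_inv, [-c for c in x_out], x_out)


A_inv = [[0.5, 0.0], [0.0, 0.25]]  # inverse of diag(2, 4)
print(sherman_morrison(A_inv, [1.0, 1.0], [1.0, 1.0]))
# approximately [[5/14, -1/14], [-1/14, 3/14]], the inverse of [[3, 1], [1, 5]]
```

Each window position then needs only a few matrix-vector products, which is the kind of regular, per-pixel work that maps well onto a GPU.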
ISBN: 9781538643686 (print)
Exact inference in Bayesian networks is NP-Hard. While many parallel algorithms have been proposed for this irregular problem, none have been shown to scale to even hundreds of processors. In this paper, we present a scalable distributed-memory parallel algorithm for exact inference based on Darwiche's approach, which poses inference as upward and downward accumulation of values computed at the nodes of an arithmetic circuit, a rooted directed acyclic graph. Our work includes parallel algorithms for both construction of the arithmetic circuit as well as inference using the circuit. We demonstrate the scalability of our algorithms for up to 1,536 cores on synthetic as well as real datasets, whose corresponding arithmetic circuits contain up to billions of nodes. The runtime for inference is only a small fraction of the runtime for circuit construction, providing the ability to quickly perform multiple inferences once the circuit is constructed.
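The upward and downward passes can be illustrated with a minimal sequential sketch on a tiny circuit. Assumptions: a two-node network A → B, evidence indicators λ and parameters θ as leaves, and an unfactored circuit that is simply the network polynomial written as a sum of products (a real compiled circuit would share subterms). The upward pass evaluates the circuit; the downward pass computes the partial derivative of the root with respect to every node. The derivative with respect to λ_a gives P(a, e). The node encoding and function names are hypothetical; this is not the paper's distributed algorithm.

```python
import math

# Leaves: evidence indicators (l*) and CPT parameters (t*) for A -> B.
LEAVES = ["la", "lna", "lb", "lnb", "ta", "tna", "tb_a", "tnb_a", "tb_na", "tnb_na"]
NODES = [("leaf", name) for name in LEAVES] + [
    ("*", [0, 4, 2, 6]),      # lambda_a  * theta_a  * lambda_b  * theta_b|a
    ("*", [0, 4, 3, 7]),      # lambda_a  * theta_a  * lambda_~b * theta_~b|a
    ("*", [1, 5, 2, 8]),      # lambda_~a * theta_~a * lambda_b  * theta_b|~a
    ("*", [1, 5, 3, 9]),      # lambda_~a * theta_~a * lambda_~b * theta_~b|~a
    ("+", [10, 11, 12, 13]),  # root: the network polynomial
]


def upward(nodes, leaf_values):
    """Evaluate every node; nodes are in topological order with the root last."""
    val = [0.0] * len(nodes)
    for i, (kind, payload) in enumerate(nodes):
        if kind == "leaf":
            val[i] = leaf_values[payload]
        elif kind == "+":
            val[i] = sum(val[c] for c in payload)
        else:  # "*"
            val[i] = math.prod(val[c] for c in payload)
    return val


def downward(nodes, val):
    """Partial derivative of the root w.r.t. every node (reverse topological order)."""
    der = [0.0] * len(nodes)
    der[-1] = 1.0
    for i in range(len(nodes) - 1, -1, -1):
        kind, payload = nodes[i]
        if kind == "+":
            for c in payload:
                der[c] += der[i]
        elif kind == "*":
            for c in payload:
                # Product of the siblings (no division, so zero values are safe).
                der[c] += der[i] * math.prod(val[o] for o in payload if o != c)
    return der


# Evidence B = b: lambda_b = 1, lambda_~b = 0; A is unobserved.
leaf_values = {"la": 1.0, "lna": 1.0, "lb": 1.0, "lnb": 0.0,
               "ta": 0.3, "tna": 0.7, "tb_a": 0.9, "tnb_a": 0.1,
               "tb_na": 0.2, "tnb_na": 0.8}
val = upward(NODES, leaf_values)
der = downward(NODES, val)
print(val[-1], der[0] / val[-1])  # P(b) ~ 0.41 and P(a | b) ~ 0.27 / 0.41
```

Once the circuit exists, each new query only reruns these two linear-time passes with different leaf values, which is consistent with the abstract's observation that inference is cheap relative to circuit construction.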
ISBN: 9781538655559 (print)
The summed area table (SAT) of a matrix is a data structure frequently used in computer vision; it can be obtained by computing the column-wise prefix sums and then the row-wise prefix sums. The main contribution of this paper is a very efficient parallel algorithm for computing the SAT of a matrix stored in the global memory of the GPU. Our new parallel algorithm uses two techniques, single-kernel soft synchronization and look-back, to compute the SAT efficiently. It performs approximately one read and one write operation per element to the global memory. Since every element of the matrix must be read once and every element of the resulting SAT must be written once, no SAT computation can be faster than duplicating the matrix in global memory. Thus, our algorithm is theoretically optimal in terms of global memory access. We have implemented our parallel algorithm, as well as previously published SAT algorithms, on an NVIDIA TITAN V GPU. Our parallel SAT algorithm runs faster than all previous algorithms for matrices of sizes from 256 × 256 to 32K × 32K. Moreover, its overhead relative to matrix duplication can be as low as 5.7%, so it is also practically optimal.
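As a sequential reference for what the GPU kernel computes, the sketch below builds the SAT exactly as described, column-wise prefix sums followed by row-wise prefix sums. It then shows the typical use of a SAT: the sum over any axis-aligned rectangle in four lookups. It does not model the paper's single-kernel soft synchronization or look-back scheme, and both function names are hypothetical.

```python
def summed_area_table(matrix):
    """Column-wise prefix sums, then row-wise prefix sums (sequential reference)."""
    rows, cols = len(matrix), len(matrix[0])
    sat = [list(row) for row in matrix]  # work on a copy of the input
    for i in range(1, rows):             # column-wise prefix sums
        for j in range(cols):
            sat[i][j] += sat[i - 1][j]
    for i in range(rows):                # row-wise prefix sums
        for j in range(1, cols):
            sat[i][j] += sat[i][j - 1]
    return sat


def rect_sum(sat, r0, c0, r1, c1):
    """Sum of matrix[r0..r1][c0..c1] (inclusive bounds) using four SAT lookups."""
    total = sat[r1][c1]
    if r0 > 0:
        total -= sat[r0 - 1][c1]
    if c0 > 0:
        total -= sat[r1][c0 - 1]
    if r0 > 0 and c0 > 0:
        total += sat[r0 - 1][c0 - 1]
    return total


sat = summed_area_table([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(sat)                       # [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
print(rect_sum(sat, 1, 1, 2, 2)) # 28 (= 5 + 6 + 8 + 9)
```

Each output depends on all elements above and to the left of it. This data dependence is why a parallel GPU version needs inter-block coordination such as the look-back technique named in the abstract.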
ISBN: 9781728122205 (electronic)
ISBN: 9781728122212 (print)
The K-Means algorithm is one of the most sophisticated and best-known data-clustering algorithms. In this study, we examine the K-Means algorithm in relation to OpenCL, a widespread parallel ecosystem that is reliable for processing and mining large-scale datasets. Additionally, we present a comparative study of the three most efficient K-Means implementations: the sequential Lloyd-Forgy method, a parallel CPU implementation using OpenMP, and finally the most complex, an implementation written in OpenCL. Performance is measured across different data sizes. For large datasets under OpenCL, the GPU-based parallel algorithm shows a good speedup over the CPU-based serial algorithm. For small datasets, on the other hand, the OpenMP implementation turned out to be the best choice.
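For reference, the sequential Lloyd-Forgy baseline can be sketched as follows. The sketch assumes Forgy initialization (k data points chosen at random) and stops when no assignment changes. Function names and parameters are hypothetical. The assignment step is independent per point, which is typically the work that parallel versions distribute across threads or work-items.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to p (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def lloyd_kmeans(points, k, max_iters=100, seed=0):
    """Sequential Lloyd iteration with Forgy initialization (k random data points)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(max_iters):
        # Assignment step: independent for every point, hence easy to parallelize.
        new_assign = [nearest(p, centroids) for p in points]
        if new_assign == assign:  # converged: no point changed cluster
            break
        assign = new_assign
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, assign


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = lloyd_kmeans(pts, 2)
print(labels)  # the first three points share one label, the last three the other
```

The update step involves a reduction (summing members per cluster). On small datasets, the cost of launching and synchronizing that reduction on a device can outweigh the gains, which is consistent with the abstract's finding that OpenMP wins there.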
ISBN: 9781538673089 (print)
In the era of big data, the development of satellite technology and the Internet of Things has produced large amounts of trajectory data. By analyzing trajectory data, we can effectively understand and predict the movement of objects. At present, most density-based clustering algorithms have disadvantages, including input parameters that are difficult to determine, heavy I/O, and so on. DPC (Clustering by fast search and find of Density Peaks) is a new density-based clustering algorithm that is simple, has only one input parameter, and is not affected by the data dimension; it can therefore be applied effectively to trajectory clustering. However, in DPC the local density is complex to calculate, and the cutoff distance is determined subjectively. In addition, DPC does not consider that multiple cluster centers may exist in the same cluster. To solve these problems, this paper proposes a fast clustering algorithm for trajectory data. Spark in-memory computing and a data partitioning method are also used to parallelize the algorithm, which greatly improves clustering efficiency. Finally, experiments on three months of ship trajectory data from the Yangtze River demonstrate that the clustering efficiency and effectiveness of our algorithm are significantly improved.
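The quantities the abstract criticizes (local density and cutoff distance) are easiest to see in a sketch of the original DPC procedure as described by Rodriguez and Laio. It is sequential, uses the cutoff kernel, and is not the paper's improved or Spark-parallel algorithm. Each point gets a density ρ (neighbours closer than the cutoff `dc`) and a distance δ to the nearest denser point. Centers are the points with the largest γ = ρ·δ. Every other point inherits the label of its nearest denser point. The function name, the tie-breaking by index, and forcing the densest point to be a center are assumptions of this sketch.

```python
import math


def density_peaks(points, dc, n_centers):
    """Sketch of original DPC (cutoff kernel); returns (rho, delta, labels). n_centers >= 1."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i: number of other points closer than the cutoff distance dc.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    # Visit points by decreasing density (stable sort breaks ties by index).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n  # nearest point that precedes i in density order
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # convention for the densest point
            continue
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i], parent[i] = d[i][j], j
    # Centers: largest gamma = rho * delta (dense AND far from any denser point).
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_centers]
    if order[0] not in centers:  # the densest point has no parent, so force it in
        centers[-1] = order[0]
    labels = [-1] * n
    next_label = 0
    for i in order:  # a point's parent is always labeled before the point itself
        if i in centers:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[parent[i]]
    return rho, delta, labels


blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
_, _, labels = density_peaks(blob_a + blob_b, dc=1.2, n_centers=2)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The full pairwise distance matrix makes this O(n²) in time and memory, which is why partitioning the data across a cluster, as the paper does with Spark, matters for large trajectory datasets.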
ISBN: 9781728144849 (electronic)
ISBN: 9781728144856 (print)
Simultaneous equation models (SEMs) are multivariate techniques that reflect the presence of jointly endogenous variables. Traditionally, these models have been used in economics, and in recent decades their use has expanded into other disciplines. One use of an SEM is forecasting the future values of the endogenous variables once the model coefficients have been obtained. This forecast is made using the available information on the endogenous and exogenous variables, together with the matrices of the model. This work studies a parallel algorithm for forecasting the endogenous variables of an SEM. Experimental tests comparing shared-memory and message-passing algorithms are carried out for varying problem sizes, in order to check the behaviour of the algorithm and to identify the ideal resources to use.
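The abstract does not give the model's notation. As a minimal sketch, assume a dynamic structural form Γ y_t + A y_{t-1} + B x_t = 0, with disturbances set to zero for the point forecast. Here Γ holds the endogenous coefficients, A holds those of the lagged endogenous variables, and B holds the exogenous ones. Each forecast step then solves one linear system with Γ, using the last known endogenous values and the future exogenous values. All names and the lag structure are assumptions for illustration.

```python
def solve(M_in, b):
    """Gaussian elimination with partial pivoting for a small dense system M z = b."""
    n = len(M_in)
    M = [row[:] + [bi] for row, bi in zip(M_in, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def forecast(Gamma, A, B, y_last, x_future):
    """Iterate the assumed form Gamma y_t + A y_{t-1} + B x_t = 0 forward in time."""
    y = list(y_last)
    path = []
    for x in x_future:
        rhs = [-(a + b) for a, b in zip(matvec(A, y), matvec(B, x))]
        y = solve(Gamma, rhs)  # y_t = -Gamma^{-1} (A y_{t-1} + B x_t)
        path.append(y)
    return path


# Toy model: y1_t = 0.5 * y2_t + x_t and y2_t = 0.5 * y1_{t-1}.
Gamma = [[1.0, -0.5], [0.0, 1.0]]
A = [[0.0, 0.0], [-0.5, 0.0]]
B = [[-1.0], [0.0]]
print(forecast(Gamma, A, B, y_last=[2.0, 0.0], x_future=[[1.0], [1.0]]))
# [[1.5, 1.0], [1.375, 0.75]]
```

Solving with Γ instead of forming Γ⁻¹ explicitly is the usual numerically safer choice. For large models, these dense solves and matrix-vector products are the natural candidates for shared-memory or message-passing parallelization.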
ISBN: 9781538635124 (print)
We consider the problem of nonnegative tensor completion. Our aim is to derive an efficient algorithm that is also suitable for parallel implementation. We adopt the alternating optimization framework and solve each nonnegative matrix completion problem via a Nesterov-type algorithm for smooth convex problems. We describe a parallel implementation of the algorithm and measure the attained speedup in a multi-core computing environment. It turns out that the derived algorithm is an efficient candidate for the solution of very large-scale sparse nonnegative tensor completion problems.
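Within alternating optimization, fixing all factors but one turns each row of the free factor into a small nonnegative least-squares problem over the observed entries. That subproblem is smooth and convex, so a Nesterov-type method applies. The sketch below is a generic accelerated projected gradient (FISTA-style) solver for min over x ≥ 0 of ½‖Ax − b‖². It is not the paper's exact algorithm. The step size 1/L with L = ‖A‖²_F (a safe upper bound on the gradient's Lipschitz constant) and the function name are assumptions.

```python
def nnls_nesterov(A, b, iters=500):
    """min_{x >= 0} 0.5 * ||A x - b||^2 by Nesterov-accelerated projected gradient."""
    m, n = len(A), len(A[0])
    # ||A||_F^2 >= ||A||_2^2, so 1/L is a valid (if conservative) step size.
    L = sum(a * a for row in A for a in row) or 1.0
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(iters):
        r = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]  # A y - b
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]          # A^T r
        x_new = [max(0.0, yj - gj / L) for yj, gj in zip(y, g)]  # project onto x >= 0
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        # Momentum: extrapolate along the last step.
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


print(nnls_nesterov([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, -1.0, 0.0]))
# close to [0.5, 0.0]; the unconstrained solution would be [1.0, -1.0]
```

Because the per-row subproblems of one factor are independent, they can be solved concurrently, which is one natural way such an algorithm maps onto a multi-core machine.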
We present an optimized algorithm that performs automatic classification of white matter fibers based on a multi-subject bundle atlas. We implemented a parallel algorithm that improves upon its previous version in bot...