Search Results - Inner Mongolia University Library

arXiv, 2019

Authors: Поляков, А.В. (Polyakov, A.V.); Цымблер, М.Л. (Zymbler, M.L.)

Affiliation: South Ural State University, Department of System Programming, prospekt Lenina 76, Chelyabinsk 454080, Russia

Discord is a refinement of the concept of an anomalous subsequence of a time series. Discord discovery is applied in a wide range of subject domains involving time series: medicine, economics, climate modeling, etc. In this paper, we propose a novel parallel algorithm for discord discovery on Intel Xeon Phi Knights Landing (KNL) many-core systems for the case when the input data fit in main memory. The algorithm exploits the ability to independently calculate Euclidean distances between the subsequences of the time series. Computations are parallelized through OpenMP technology. The algorithm consists of two stages, namely precomputation and discovery. At the precomputation stage, we construct auxiliary matrix data structures, which ensure efficient vectorization of computations on Intel Xeon Phi KNL. At the discovery stage, the algorithm finds discords based upon the structures above. Experimental evaluation confirms the high scalability of the developed algorithm. Copyright © 2019, The Authors. All rights reserved.

Keywords: parallel algorithms
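
The notion of a discord in the abstract above can be illustrated with a brute-force sketch. This is not the authors' KNL algorithm: it is a sequential baseline that assumes raw (non-normalized) Euclidean distance and treats overlapping subsequences as trivial matches. The function name `find_discord` and the sample series are hypothetical. Each pairwise distance is independent of the others, which is the property the abstract says the parallel version exploits.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_discord(series, m):
    """Return (start, distance) of the top-1 discord of length m.

    The discord is the subsequence whose nearest non-overlapping
    neighbour is farthest away. Brute force, O(n^2 * m).
    """
    n_sub = len(series) - m + 1
    best_start, best_dist = -1, -1.0
    for i in range(n_sub):
        nearest = math.inf
        for j in range(n_sub):
            if abs(i - j) >= m:  # skip trivial matches (overlapping windows)
                d = euclidean(series[i:i + m], series[j:j + m])
                nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = i, nearest
    return best_start, best_dist


# A periodic series with one spike at index 7.
series = [0, 1, 0, 1, 0, 1, 0, 9, 0, 1, 0, 1, 0, 1, 0, 1]
start, dist = find_discord(series, 2)
print(start, dist)
```

The outer loop over `i` carries no dependency between iterations apart from the final maximum, so it is the natural candidate for a parallel loop with a reduction.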

A Fast and Efficient Parallel Algorithm for Pruned Landmark Labeling

IEEE High Performance Extreme Computing Conference (HPEC)

Authors: Dong, Qing; Lakhotia, Kartik; Zeng, Hanqing; Kannan, Rajgopal; Prasanna, Viktor; Seetharaman, Guna

Affiliations: Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90089, USA; Univ Southern Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90089, USA; US Army Res Lab West, Los Angeles, CA, USA; US Naval Res Lab, Washington, DC, USA

ISBN: 9781538659892 (print)

Hub labeling based shortest distance querying plays a key role in many important networked graph applications, such as route planning, socially-sensitive search and web page ranking. Over the last few years, Pruned Landmark Labeling (PLL) has emerged as the state-of-the-art technique for hub labeling. PLL drastically reduces the complexity of label construction by pruning Shortest-Path Trees (SPTs). However, PLL is inherently sequential, as different SPTs must be constructed in a specific order of source vertices to ensure small label size. In particular, for large graphs, it takes significant processing time to construct even pruned SPTs from all vertices in the graph. While there are many works on parallelizing single-source shortest path, these solutions cannot be directly used for PLL, as pruning and label querying introduce significant additional complexity while restricting parallelism within an SPT. In this paper, we propose a novel, fast and efficient algorithm to significantly accelerate PLL on large graphs based on a two-level parallelization of SPTs: intra- and inter-tree. For intra-tree, we generate pruned SPTs based on a modification of the Bellman-Ford (BF) algorithm. We further optimize BF to reduce SPT label querying and initialization costs. We implement our algorithm using the recently proposed Graph Processing Over Partitions (GPOP), which dramatically improves cache efficiency and DRAM communication bandwidth. When pruned SPTs become very small and parallelizing individual SPTs is not advantageous, we switch to inter-tree parallelization and construct multiple trees concurrently in a batch. Experiments conducted on a 36-core (2-way hyperthreaded) Intel Broadwell server show that on some datasets, our proposed parallel algorithm can achieve greater than 35.1x speedup over the state-of-the-art sequential algorithm.

Keywords: Labeling; Phase locked loops; Acceleration; Partitioning algorithms; Parallel algorithms; Optimization
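
To make the pruning idea in the abstract above concrete, here is a minimal sequential sketch of pruned landmark labeling on an unweighted, undirected graph. It uses a pruned breadth-first search per root rather than the paper's modified Bellman-Ford or its two-level parallelization. The names `pruned_landmark_labeling` and `query`, the adjacency-dict representation and the example graph are assumptions made for illustration.

```python
from collections import deque


def query(labels, u, v):
    """Distance between u and v via their common hubs (inf if none)."""
    best = float("inf")
    label_v = labels[v]
    for hub, d in labels[u].items():
        if hub in label_v:
            best = min(best, d + label_v[hub])
    return best


def pruned_landmark_labeling(adj, order):
    """Build hub labels by running one pruned BFS per root, in `order`."""
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            # Prune: earlier roots already certify a path this short,
            # so u gets no label from root and is not expanded.
            if query(labels, root, u) <= dist[u]:
                continue
            labels[u][root] = dist[u]
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return labels


# Path graph 0-1-2-3-4, processed from the middle vertex outwards.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
labels = pruned_landmark_labeling(adj, order=[2, 1, 3, 0, 4])
print(query(labels, 0, 4))
```

The dependence on `order` is visible here: each search consults the labels written by all earlier roots, which is why the abstract calls the method inherently sequential.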

High-parallel Hyperspectral Image Detection Algorithm by Sherman-Morrison Calculation of Dual-Windows

Signal, Information and Data Processing (ICSIDP), IEEE International Conference on

Authors: Yuan Li; Lu Li; Wei Li

Affiliations: College of Information Science & Technology, Beijing University of Chemical Technology, Beijing, China; School of Information and Electronics, Beijing Institute of Technology, Beijing, China

ISBN: 9781728123455 (digital); 9781728123462 (print)

Hyperspectral image (HSI) object detection has received increasing attention. However, while hyperspectral imaging yields rich information, it brings new challenges to real-time, high-accuracy detection. In this paper, a near real-time parallel algorithm based on sliding dual windows is proposed for object detection in hyperspectral images. First, the Sherman-Morrison formula is employed to complete the transformation between the sliding dual windows, so that target or anomaly detection is calculated iteratively. Then, the detection algorithm is implemented in parallel on a GPU to further increase the processing speed. The experimental results demonstrate that the proposed method is more effective than the compared method.

Keywords: Hyperspectral imaging; Target and anomaly detection; Near real-time processing; Hyperspectral imagery; Object detection; Detection algorithms; Real-time process; Anomaly detection; Parallel algorithms; Accuracy test
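
The title of this record names the Sherman-Morrison formula, which updates a matrix inverse after a rank-one change: (A + uvᵀ)⁻¹ = A⁻¹ − (A⁻¹u vᵀA⁻¹) / (1 + vᵀA⁻¹u). The sketch below shows how such an update could let a sliding window add one pixel vector and drop another without re-inverting. It is an illustration under assumptions, not the paper's method: the window statistic is taken to be an unnormalized correlation matrix, and `sherman_morrison` and `slide_window` are hypothetical names.

```python
def mat_vec(matrix, vec):
    """Matrix-vector product for a small dense matrix (list of rows)."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def sherman_morrison(a_inv, u, v):
    """Return (A + u v^T)^-1 given A^-1, without a fresh inversion."""
    n = len(a_inv)
    a_inv_u = mat_vec(a_inv, u)  # A^-1 u
    v_a_inv = [sum(v[i] * a_inv[i][j] for i in range(n)) for j in range(n)]  # v^T A^-1
    denom = 1.0 + sum(vi * wi for vi, wi in zip(v, a_inv_u))
    if abs(denom) < 1e-12:
        raise ValueError("rank-one update makes the matrix singular")
    return [[a_inv[i][j] - a_inv_u[i] * v_a_inv[j] / denom for j in range(n)]
            for i in range(n)]


def slide_window(r_inv, x_in, x_out):
    """Add pixel x_in to the window statistic and remove pixel x_out."""
    r_inv = sherman_morrison(r_inv, x_in, x_in)                  # R + x_in x_in^T
    return sherman_morrison(r_inv, [-c for c in x_out], x_out)   # R - x_out x_out^T


identity = [[1.0, 0.0], [0.0, 1.0]]
updated = sherman_morrison(identity, [1.0, 2.0], [1.0, 2.0])
print(updated)
```

Each update costs O(n²) for an n-band pixel instead of the O(n³) of a full inversion, which is the usual motivation for this kind of iterative scheme.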

A Parallel Algorithm for Bayesian Network Inference using Arithmetic Circuits

27th International Heterogeneity in Computing Workshop in conjunction with 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Vasimuddin, Md.; Chockalingam, Sriram P.; Aluru, Srinivas

Affiliations: Indian Inst Technol, Mumbai, Maharashtra, India; Georgia Inst Technol, Atlanta, GA 30332, USA

ISBN: 9781538643686 (print)

Exact inference in Bayesian networks is NP-Hard. While many parallel algorithms have been proposed for this irregular problem, none have been shown to scale to even hundreds of processors. In this paper, we present a scalable distributed-memory parallel algorithm for exact inference based on Darwiche's approach, which poses inference as upward and downward accumulation of values computed at the nodes of an arithmetic circuit, a rooted directed acyclic graph. Our work includes parallel algorithms for both construction of the arithmetic circuit as well as inference using the circuit. We demonstrate the scalability of our algorithms for up to 1,536 cores on synthetic as well as real datasets, whose corresponding arithmetic circuits contain up to billions of nodes. The runtime for inference is only a small fraction of the runtime for circuit construction, providing the ability to quickly perform multiple inferences once the circuit is constructed.

Keywords: Junctions; Bayes methods; Inference algorithms; Parallel algorithms; Random variables; Program processors; Scalability

An Optimal Parallel Algorithm for Computing the Summed Area Table on the GPU

32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Emoto, Yutaro; Funasaka, Shunji; Tokura, Hiroki; Honda, Takumi; Nakano, Koji; Ito, Yasuaki

Affiliation: Hiroshima Univ, Dept Informat Engn, Kagamiyama 1-4-1, Higashihiroshima 7398527, Japan

ISBN: 9781538655559 (print)

The summed area table (SAT) of a matrix is a data structure frequently used in the area of computer vision, which can be obtained by computing the column-wise prefix-sums and then the row-wise prefix-sums. The main contribution of this paper is to present a very efficient parallel algorithm for computing the SAT of a matrix stored in the global memory of the GPU. Our new parallel algorithm uses two techniques, single-kernel soft synchronization and look-back, to compute the SAT efficiently. It performs approximately one read and one write operation per element to the global memory. Since all elements in the matrix must be read once, and those in the resulting SAT must be written, no SAT computation can be faster than duplication of the matrix in the global memory. Thus, our algorithm is theoretically optimal in terms of global memory access. We have implemented our parallel algorithm and previously published algorithms for computing the SAT to run on an NVIDIA TITAN V GPU. Our parallel SAT algorithm runs faster than all previous algorithms for matrices of sizes from 256 x 256 to 32K x 32K. Also, the overhead ratio over matrix duplication can be only 5.7%, so it is also practically optimal.

Keywords: Graphics processing units; Instruction sets; Kernel; Parallel algorithms; Computer architecture
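
The definition in the abstract above (column-wise prefix sums followed by row-wise prefix sums) can be sketched sequentially as follows. This shows only what a SAT is and why it is useful, not the paper's single-kernel GPU algorithm. The helper `rect_sum` is a hypothetical addition showing the constant-time rectangle query a SAT enables.

```python
def summed_area_table(matrix):
    """Return the SAT: sat[r][c] is the sum of matrix[0..r][0..c]."""
    rows, cols = len(matrix), len(matrix[0])
    sat = [row[:] for row in matrix]
    for r in range(1, rows):          # column-wise prefix sums
        for c in range(cols):
            sat[r][c] += sat[r - 1][c]
    for r in range(rows):             # row-wise prefix sums
        for c in range(1, cols):
            sat[r][c] += sat[r][c - 1]
    return sat


def rect_sum(sat, top, left, bottom, right):
    """Sum of the inclusive rectangle using at most four SAT lookups."""
    total = sat[bottom][right]
    if top > 0:
        total -= sat[top - 1][right]
    if left > 0:
        total -= sat[bottom][left - 1]
    if top > 0 and left > 0:
        total += sat[top - 1][left - 1]
    return total


matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sat = summed_area_table(matrix)
print(sat)                         # [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
print(rect_sum(sat, 1, 1, 2, 2))   # 28
```

This two-pass version reads and writes every element twice, whereas the abstract reports approximately one read and one write per element for the GPU algorithm.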

A Comparative study of parallel CPU/GPU implementations of the K-Means Algorithm

International Conference on Advanced Electrical Engineering (ICAEE)

Authors: Sara Daoudi; Chakib Mustapha Anouar Zouaoui; Miloud Chikr El-Mezouar; Nasreddine Taleb

Affiliation: RCAM Laboratory, Djillali Liabès University, Sidi Bel Abbes, Algeria

ISBN: 9781728122205 (digital); 9781728122212 (print)

The K-Means algorithm is one of the most sophisticated and well-known algorithms for data clustering. In this study, we present the K-Means algorithm as it relates to OpenCL, a widespread parallel ecosystem that is reliable for processing and mining large-scale datasets. Additionally, we propose a comparative study of the three most efficient K-Means implementations: a sequential implementation of the Lloyd-Forgy method, a parallel implementation targeting the CPU using OpenMP, and finally one of the most complex implementations, which uses OpenCL. Performance is measured using different data sizes. For large datasets under OpenCL, comparing the GPU-based parallel algorithm to the CPU-based serial algorithm shows a good acceleration effect. On the other hand, for small datasets, the OpenMP implementation turns out to be the best choice.

Keywords: Clustering algorithms; Graphics processing units; Classification algorithms; Acceleration; Data mining; Convergence; Parallel algorithms
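
For reference, the sequential Lloyd-style iteration that the abstract above uses as its baseline can be sketched as below. This is a generic textbook version, not the authors' code. Seeding the centroids with the first `k` points is an assumption made to keep the example deterministic, and `kmeans` and `sq_dist` are hypothetical names.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100):
    """Lloyd's iteration: assign points, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    assignment = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: independent per point, hence easy to parallelize.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centroids, assignment = kmeans(points, 2)
print(assignment)
```

The assignment step dominates the cost and has no dependencies between points, so it is the part a parallel CPU or GPU implementation would typically distribute across threads.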

Research on Fast and Parallel Clustering Method for Trajectory data

24th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Authors: Wang, Ne; Gao, Shu; Peng, Xiangwen; Wang, Minrui

Affiliation: Wuhan Univ Technol, Wuhan, Hubei, Peoples R China

ISBN: 9781538673089 (print)

In the era of big data, the development of satellite technology and the Internet of Things has produced a large amount of trajectory data. We can effectively understand and predict the movement of objects by analyzing their trajectory data. Currently, most density-based clustering algorithms have disadvantages including the difficulty of determining input parameters, large I/O, and so on. DPC (Clustering by fast search and find of Density Peaks) is a new density-based clustering algorithm, which is simple, has only one input parameter, and is not affected by the data dimension; therefore, it can be effectively applied to trajectory clustering. However, in DPC, the local density is complex to calculate, and the cutoff distance is determined subjectively. In addition, DPC does not consider the existence of multiple cluster centers in the same cluster when clustering. To solve these problems, this paper puts forward a fast clustering algorithm for trajectory data. In addition, Spark in-memory computing technology and a data partitioning method are used to parallelize the algorithm, which greatly improves the clustering efficiency. Finally, experiments with three months of ship trajectory data from the Yangtze River demonstrate that the clustering efficiency and effectiveness of our algorithm are significantly improved.

Keywords: trajectory data; density peak; clustering algorithm; parallel algorithms; Spark
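
The two quantities that basic DPC computes, and that the abstract above describes as costly (local density) and subjective (cutoff distance), can be sketched as follows. This is the plain density-peaks formulation on points, not the paper's improved or Spark-parallelized algorithm, and it does not handle trajectories. The function name `density_peaks`, the cutoff value and the sample data are assumptions.

```python
import math


def density_peaks(points, cutoff):
    """Return (rho, delta) for each point, as in basic DPC.

    rho[i]   = number of other points closer than `cutoff`
    delta[i] = distance to the nearest point of higher density
               (for the densest point: its largest distance to any point)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    # Stable sort by decreasing density; ties keep their index order.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta


points = [(0,), (1,), (2,), (10,), (11,), (12,)]
rho, delta = density_peaks(points, cutoff=1.5)
# Candidate cluster centres have both high density and high delta.
scores = [r * d for r, d in zip(rho, delta)]
centres = sorted(range(len(points)), key=lambda i: -scores[i])[:2]
print(sorted(centres))
```

The full pairwise distance matrix is what makes the local density expensive on large datasets, and the result depends directly on the chosen `cutoff`, which matches the two weaknesses the abstract lists.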

Parallel algorithm for prediction of variables in Simultaneous Equation Models

International Conference on High Performance Computing & Simulation (HPCS)

Authors: Óscar Gómez; Jose J. López-Espín; Antonio Peñalver

Affiliation: Center of Operations Research, Miguel Hernández University, Elche, Spain

ISBN: 9781728144849 (digital); 9781728144856 (print)

Simultaneous equation models (SEM) are multivariate techniques that reflect the presence of jointly endogenous variables. Traditionally, these models have been used in economics, expanding in recent decades into other disciplines. One use of SEM is the future estimation of the endogenous variables once the coefficients of the model have been obtained. This estimation is made using the current information on endogenous and exogenous variables, as well as the matrices of the model. This work studies a parallel algorithm for the future prediction of the endogenous variables of an SEM. Experimental tests comparing shared-memory and message-passing algorithms are carried out while varying the problem size, in order to check the behaviour of the algorithm and the ideal resources to use.

Keywords: Mathematical model; Biological system modeling; Numerical analysis; Estimation; Program processors; Parallel algorithms; Predictive models

Nesterov-based Alternating Optimization for Nonnegative Tensor Completion: Algorithm and Parallel Implementation

IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)

Authors: Lourakis, Georgios; Liavas, Athanasios P.

Affiliation: Tech Univ Crete, Sch Elect & Comp Engn, Khania, Greece

ISBN: 9781538635124 (print)

We consider the problem of nonnegative tensor completion. Our aim is to derive an efficient algorithm that is also suitable for parallel implementation. We adopt the alternating optimization framework and solve each nonnegative matrix completion problem via a Nesterov-type algorithm for smooth convex problems. We describe a parallel implementation of the algorithm and measure the attained speedup in a multi-core computing environment. It turns out that the derived algorithm is an efficient candidate for the solution of very large-scale sparse nonnegative tensor completion problems.

Keywords: tensors; nonnegative tensor completion; optimal first-order optimization algorithms; parallel algorithms

Parallel optimization of fiber bundle segmentation for massive tractography datasets

arXiv, 2019

Authors: Vázquez, Andrea; López-López, Narciso; Labra, Nicole; Figueroa, Miguel; Poupon, Cyril; Mangin, Jean-François; Hernández, Cecilia; Guevara, Pamela

Affiliations: Faculty of Engineering, Universidad de Concepción, Concepción, Chile; Dept. of Computer Science, Universidade da Coruña, A Coruña, Spain; I2BM, Neurospin, CEA, Gif-sur-Yvette, France

We present an optimized algorithm that performs automatic classification of white matter fibers based on a multi-subject bundle atlas. We implemented a parallel algorithm that improves upon its previous version in both execution time and memory usage. Our new version uses the local memory of each processor, which leads to a reduction in execution time. Hence, it allows the analysis of bigger subject and/or atlas datasets. As a result, the segmentation of a subject of 4,145,000 fibers is reduced from about 14 minutes in the previous version to about 6 minutes, yielding an acceleration of 2.34. In addition, the new algorithm reduces the memory consumption of the previous version by a factor of 0.79. Copyright © 2019, The Authors. All rights reserved.

Keywords: parallel algorithms
