An effective parallelization algorithm based on the compute unified device architecture (CUDA) is developed for DEM generalization, which is critical to multi-scale terrain analysis. It aims to efficiently retrieve the critical points for generating coarser-resolution DEMs that maximally maintain significant terrain features. CUDA is embedded into a multi-point algorithm to yield a parallel multi-point algorithm with enhanced computing efficiency. The outcomes are compared with the ANUDEM, compound, and maximum z-tolerance methods; the results demonstrate that the proposed algorithm reduces response time by up to 96% compared with the other methods. In terms of RMSE, it performs better than ANUDEM and needs only half as many points to keep the same RMSE. The mean slope and surface roughness are reduced by less than 1% in the tested cases, and the parallel algorithm provides better streamline matching. Given its high computing efficiency, the proposed algorithm can retrieve more critical points to meet the demands of higher precision.
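The abstract does not give the paper's exact multi-point significance criterion, but the reason CUDA fits this task is that every DEM cell can be scored independently. A minimal sketch of that mapping using Numba's CUDA backend follows; the 3x3 local-deviation score, the 5% retention rule, and all function names are illustrative assumptions, not the authors' algorithm.

```python
# Minimal sketch (assumed scoring rule, not the paper's multi-point criterion):
# each DEM cell is scored independently on the GPU, which is the property that
# makes critical-point retrieval parallelize well under CUDA.
import numpy as np
from numba import cuda

@cuda.jit
def significance_kernel(dem, score):
    r, c = cuda.grid(2)
    rows, cols = dem.shape
    if 1 <= r < rows - 1 and 1 <= c < cols - 1:
        s = 0.0
        for dr in range(-1, 2):
            for dc in range(-1, 2):
                s += dem[r + dr, c + dc]
        # deviation from the 3x3 local mean as a stand-in significance measure
        score[r, c] = abs(dem[r, c] - s / 9.0)

def retrieve_critical_points(dem, keep_fraction=0.05):
    """Return indices of the most significant cells (illustrative selection rule)."""
    d_dem = cuda.to_device(dem.astype(np.float32))
    d_score = cuda.to_device(np.zeros(dem.shape, dtype=np.float32))
    threads = (16, 16)
    blocks = ((dem.shape[0] + 15) // 16, (dem.shape[1] + 15) // 16)
    significance_kernel[blocks, threads](d_dem, d_score)
    score = d_score.copy_to_host()
    k = max(1, int(keep_fraction * score.size))
    flat = np.argpartition(score.ravel(), -k)[-k:]
    return np.unravel_index(flat, score.shape)
```

Because each cell's score depends only on a small, read-only neighborhood, the kernel needs no synchronization, which is why this kind of retrieval scales almost linearly with the number of GPU threads.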
This paper proposes an innovative simultaneous optimization approach for single- and multi-component mass exchanger network synthesis (MENS). A retrofitted stage-wise superstructure and a parallelized random walk algorithm with compulsive evolution (RWCE) are adopted. An iterative calculation method is designed to satisfy the requirements of multi-component mass transfer, with a relaxation on the outlet composition of the lean streams. The parametric analysis shows that the relaxation coefficient plays a major role in driving the convergence of the method. To improve the robustness of the established model, an adaptive relaxation coefficient strategy is implemented for multi-component MENS problems: when divergence occurs, the outlet concentration of the lean stream is adjusted automatically by a random relaxation coefficient. Finally, three industrial MENS examples are considered, whose total annual costs (TAC) are reduced by 7179, 2212, and 551 $·year⁻¹, respectively, with corresponding optimization times of 336, 125, and 145 s. The results indicate improvements in both economy and computation time, demonstrating that the parallelized RWCE yields a better TAC and higher optimization efficiency than previous results. Overall, the adaptive relaxation coefficient strategy enhances convergence for multi-component MENS problems.
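The full superstructure model and mass-balance equations are not reproduced in the abstract, so the sketch below only illustrates the adaptive relaxation idea it describes: a damped fixed-point update of the lean-stream outlet compositions whose relaxation coefficient is redrawn at random when the iteration starts to diverge. The function `mass_balance_update`, the coefficient bounds, and the stopping rule are assumptions for illustration.

```python
# Sketch of the adaptive relaxation strategy (assumed update form, not the paper's
# full multi-component MENS model): if the residual grows between iterations, the
# relaxation coefficient is redrawn at random to pull the iteration back toward
# convergence.
import numpy as np

def relaxed_iteration(mass_balance_update, y0, lam=0.5, tol=1e-8, max_iter=500, rng=None):
    rng = rng or np.random.default_rng()
    y = np.asarray(y0, dtype=float)
    prev_res = np.inf
    for _ in range(max_iter):
        y_new = mass_balance_update(y)        # raw outlet compositions for this pass
        res = np.linalg.norm(y_new - y)
        if res > prev_res:                    # divergence detected:
            lam = rng.uniform(0.05, 0.95)     # redraw the relaxation coefficient
        y = (1.0 - lam) * y + lam * y_new     # relaxed update of the lean outlets
        if res < tol:
            break
        prev_res = res
    return y, lam
```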
Whole-genome sequencing projects covering millions of subjects contain enormous numbers of genotypes, entailing a huge memory and computation burden. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analyses would be substantially sped up if built on GBC to access the genotypes of a large population. GBC's data structure and algorithms are valuable for accelerating large-scale genomic research.
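GBC's actual on-disk layout is not described here, so the toy sketch below only illustrates the general idea of addressable byte-encoded genotype blocks: biallelic genotypes packed at 2 bits each into fixed-length records, so any variant can be located by a simple offset computation without decompressing its neighbours. The encoding table and record layout are assumptions, not GBC's format.

```python
# Toy illustration of addressable byte-encoded genotype blocks (assumed layout,
# not GBC's actual format): 4 genotypes are packed per byte, and each variant
# occupies a fixed-length record, so variant i starts at byte offset i * record_len.
import numpy as np

CODES = {"0/0": 0, "0/1": 1, "1/1": 2, "./.": 3}   # 2 bits per genotype

def pack_variant(genotypes):
    """Pack the genotype strings of one variant into bytes (4 genotypes per byte)."""
    codes = np.array([CODES[g] for g in genotypes], dtype=np.uint8)
    codes = np.pad(codes, (0, (-len(codes)) % 4))
    packed = codes[0::4] | (codes[1::4] << 2) | (codes[2::4] << 4) | (codes[3::4] << 6)
    return packed.tobytes()

def unpack_variant(buf, n_samples):
    """Decode one fixed-length record back into genotype strings."""
    b = np.frombuffer(buf, dtype=np.uint8)
    codes = np.stack([(b >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).ravel()
    inv = {v: k for k, v in CODES.items()}
    return [inv[int(c)] for c in codes[:n_samples]]

row = pack_variant(["0/0", "0/1", "1/1", "./."] * 3)
assert unpack_variant(row, 12)[:4] == ["0/0", "0/1", "1/1", "./."]
```

Fixed-length records are what make such blocks "addressable": random access costs one seek and one small decode, independent of cohort size.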
Machine translation has developed rapidly, but it still has problems such as poor readability, failure to convey mood and context, and even languages that machines cannot recognize. To improve translation quality, this paper applies the SSCI method. It is found that the translation quality of hierarchical phrases improves significantly after using the parallel machine translation algorithm, by about 9% over the baseline, and the problem of context-free grammar is also addressed. The research also finds that the parallel algorithm can effectively reduce network memory occupation: content that originally occupied 10 characters needs only 8 characters after the parallel algorithm is applied, a 20% optimization. This means that the parallel algorithm for hierarchical phrase machine translation based on distributed network memory can play a very important role in machine translation.
Traditional visual-inertial simultaneous localization and mapping (SLAM) algorithms are usually designed for CPUs and cannot effectively exploit the parallel computing capability of GPUs if they are directly transplanted to an embedded board with a GPU module. However, the computing power of embedded devices is limited, and it is unreasonable for a visual-inertial SLAM algorithm to occupy most of the CPU resources while the GPU sits idle. In this article, a parallelization scheme for the VINS-Mono algorithm based on GPU parallel computing is proposed. Based on the compute unified device architecture (CUDA), the construction and solution of the incremental equation are parallelized in the nonlinear optimization stage of the algorithm, and the routines provided by cuSOLVER and cuBLAS are used to carry out the marginalization step. In addition, the detection and matching of image feature points in optical flow tracking are rewritten to parallelize that stage as well. After parallelization, the algorithm runs well on a heterogeneous computing platform composed of a CPU and a GPU and can fully exploit the GPU's parallel computing power. The proposed method was tested on an NVIDIA Jetson TX2 module and compared with the original VINS-Mono algorithm: the speeds of constructing and solving the incremental equation were found to be the same, while the optical flow tracking and marginalization of the proposed scheme were about 1.5-1.7 times and 1.9 times faster, respectively.
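As a rough illustration of the marginalization step mentioned above, the sketch below forms the Schur complement of a Hessian block on the GPU. The original scheme is C++ calling cuSOLVER and cuBLAS directly; CuPy is used here only as a stand-in because its dense solves and matrix products dispatch to those same libraries. The block partition and matrix sizes are synthetic.

```python
# Sketch of GPU-side marginalization via the Schur complement (illustrative sizes;
# CuPy dispatches the solve to cuSOLVER and the matrix products to cuBLAS).
import cupy as cp

def marginalize(H, b, m):
    """Schur-complement out the first m states of the incremental equation H dx = b."""
    Hmm, Hmr = H[:m, :m], H[:m, m:]
    Hrm, Hrr = H[m:, :m], H[m:, m:]
    bm, br = b[:m], b[m:]
    Hmm_inv_Hmr = cp.linalg.solve(Hmm, Hmr)   # dense solve on the GPU
    Hmm_inv_bm = cp.linalg.solve(Hmm, bm)
    H_marg = Hrr - Hrm @ Hmm_inv_Hmr          # GEMM on the GPU
    b_marg = br - Hrm @ Hmm_inv_bm
    return H_marg, b_marg

# Usage on a synthetic symmetric positive-definite system:
n, m = 200, 60
A = cp.random.rand(n, n, dtype=cp.float64)
H = A @ A.T + n * cp.eye(n)
b = cp.random.rand(n, dtype=cp.float64)
H_r, b_r = marginalize(H, b, m)
dx_remaining = cp.linalg.solve(H_r, b_r)      # reduced incremental equation
```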
ISBN:
(Print) 9781728165509
Tomography reconstruction is the process of quickly reconstructing the original image from the projections obtained by X-ray radiation. At present, the high-resolution detector of the Shanghai Synchrotron Radiation Facility (SSRF) can scan more than 4 GB of tomographic data every 1.5 seconds, and the transmission speed has increased to more than 100 GB·s⁻¹. With the upgrade of high-resolution detectors and the growth of data transmission volume, reconstruction computation in the cloud has become a bottleneck for improving the speed of tomography reconstruction, even when the fastest algorithm, Gridrec, is adopted. In this paper, we propose an improved serial Gridrec algorithm and a parallel Gridrec algorithm that improve the convolution kernel to optimize the speed of existing image reconstruction algorithms on low-cost GPUs for edge computing. On these GPUs, the multi-threaded tomography reconstruction algorithm not only guarantees high-quality results but also improves reconstruction speed over the original Gridrec algorithm by more than 11x, and over the classic FBP algorithm by more than 234x. Besides the significant speedup, our work appears to be the first parallel implementation of the Gridrec algorithm on a GPU for edge computing.
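For context, the stage of Gridrec that benefits most from GPU threads is the gridding convolution, where every Fourier-domain sample of a filtered projection is accumulated onto nearby Cartesian grid points. The sketch below shows that access pattern with Numba's CUDA backend; the Gaussian window, its half-width, and the real-valued accumulation are placeholders for the paper's improved convolution kernel, not its implementation.

```python
# Sketch of the gridding (convolution) step of a Gridrec-style reconstruction:
# one GPU thread per projection sample, atomic adds to resolve write conflicts.
# The Gaussian window below is a placeholder, not the paper's improved kernel.
import math
import numpy as np
from numba import cuda

HALF_WIDTH = 2       # half-width of the convolution window, in grid cells (assumed)
SIGMA = 0.8          # placeholder window parameter

@cuda.jit
def gridding_kernel(sample_u, sample_v, sample_val, grid_re, center):
    i = cuda.grid(1)
    if i < sample_u.shape[0]:
        u, v, val = sample_u[i], sample_v[i], sample_val[i]
        for du in range(-HALF_WIDTH, HALF_WIDTH + 1):
            for dv in range(-HALF_WIDTH, HALF_WIDTH + 1):
                gu = int(math.floor(u + 0.5)) + du + center
                gv = int(math.floor(v + 0.5)) + dv + center
                if 0 <= gu < grid_re.shape[0] and 0 <= gv < grid_re.shape[1]:
                    w = math.exp(-((gu - center - u) ** 2 + (gv - center - v) ** 2)
                                 / (2.0 * SIGMA ** 2))
                    cuda.atomic.add(grid_re, (gu, gv), w * val)

# Launch: one thread per sample of the filtered projections (synthetic data here).
n = 1 << 16
u = (np.random.rand(n) - 0.5) * 200
v = (np.random.rand(n) - 0.5) * 200
vals = np.random.rand(n)
grid = cuda.to_device(np.zeros((256, 256)))
gridding_kernel[(n + 255) // 256, 256](cuda.to_device(u), cuda.to_device(v),
                                       cuda.to_device(vals), grid, 128)
```

After the gridded Fourier plane is filled, Gridrec recovers the image with a 2-D inverse FFT; only the gridding step is sketched here.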
This letter presents an efficient parallel algorithm for solving the locally one-dimensional (LOD) finite-difference time-domain (FDTD) method in a cloud computing environment. As opposed to the existing LOD-FDTD parallelization scheme, the proposed method solves the implicit tridiagonal system in parallel by using the Sherman-Morrison formula to decompose the tridiagonal matrix into smaller matrices, which the parallel nodes in the cloud solve simultaneously. Numerical results show that the proposed method is more efficient in a cloud computing environment than the conventional parallelization scheme and exhibits better scalability.
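The key idea can be made concrete with a small worked example: removing the two entries that couple a tridiagonal matrix across a chosen split row leaves two independent tridiagonal blocks, and a single Sherman-Morrison correction recovers the exact solution of the original system. The sketch below demonstrates this on a toy system; it runs serially, but the two block solves marked in the code are exactly what the cloud nodes would perform simultaneously. Names and the diagonally dominant test matrix are illustrative.

```python
# Sketch of the Sherman-Morrison splitting: T x = d is rewritten as (A + u v^T) x = d,
# where A is T with the coupling entries at the split removed (two independent
# tridiagonal blocks), and the rank-one correction restores the exact answer.
import numpy as np

def thomas(a, b, c, d):
    """Thomas solver: a is the sub-diagonal, b the diagonal, c the super-diagonal."""
    n = len(b)
    cp_, dp_ = np.empty(n), np.empty(n)
    cp_[0], dp_[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp_[i - 1]
        cp_[i] = c[i] / m if i < n - 1 else 0.0
        dp_[i] = (d[i] - a[i] * dp_[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp_[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp_[i] - cp_[i] * x[i + 1]
    return x

def split_solve(a, b, c, d, k):
    """Solve T x = d by splitting at row k and applying the Sherman-Morrison formula."""
    n = len(b)
    u, v = np.zeros(n), np.zeros(n)
    u[k - 1] = u[k] = 1.0
    v[k - 1], v[k] = a[k], c[k - 1]
    b_mod = b.copy(); b_mod[k - 1] -= a[k]; b_mod[k] -= c[k - 1]
    a_mod = a.copy(); a_mod[k] = 0.0          # decouple the two blocks
    c_mod = c.copy(); c_mod[k - 1] = 0.0
    # The two block solves below are independent; this is the parallel part.
    y = np.concatenate([thomas(a_mod[:k], b_mod[:k], c_mod[:k], d[:k]),
                        thomas(a_mod[k:], b_mod[k:], c_mod[k:], d[k:])])
    z = np.concatenate([thomas(a_mod[:k], b_mod[:k], c_mod[:k], u[:k]),
                        thomas(a_mod[k:], b_mod[k:], c_mod[k:], u[k:])])
    return y - z * (v @ y) / (1.0 + v @ z)    # Sherman-Morrison correction

# Check against a dense solve on a diagonally dominant test system.
n, k = 10, 5
rng = np.random.default_rng(0)
a = rng.random(n); a[0] = 0.0
c = rng.random(n); c[-1] = 0.0
b = 4.0 + rng.random(n)
d = rng.random(n)
T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
assert np.allclose(split_solve(a, b, c, d, k), np.linalg.solve(T, d))
```

With p split points the same construction yields p + 1 independent blocks, and the Sherman-Morrison-Woodbury identity then requires only a small p x p correction, so the per-node work shrinks as nodes are added.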
ISBN:
(Print) 9783319635644; 9783319635637
In the field of wireless network optimization, with the growth of network size and the increasing complexity of network structure, traditional processing methods cannot effectively identify the causes of network faults from the ever-increasing volume of network data. In this paper, we propose a root-cause-analysis method based on distributed data mining (DRCA). First, we put forward an improved decision tree in which the best split feature is selected according to the feature's purity gain, and we then convert the problem of root-cause analysis into building this improved decision tree and interpreting the tree model. To address the memory and efficiency problems associated with large-scale data, we parallelize the algorithm and distribute the tasks to multiple computers. The experiments show that DRCA is an effective, efficient, and scalable method.
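The distributed part of the split selection can be pictured as a map-reduce over data partitions: each worker computes local (feature value, class) counts, a reduce step merges them, and the feature with the largest purity gain is chosen. In the minimal sketch below, Gini impurity reduction stands in for the paper's purity-gain definition, and two in-memory partitions stand in for distributed workers; all names are illustrative.

```python
# Sketch of distributed split selection for the improved decision tree: map = local
# (feature value, class) counts per partition, reduce = merge counts, then pick the
# feature with the highest impurity reduction (a stand-in for "purity gain").
from collections import Counter, defaultdict

def local_counts(rows, labels, feature_idx):
    """Map step on one partition."""
    counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        counts[row[feature_idx]][y] += 1
    return counts

def merge_counts(partition_counts):
    """Reduce step: merge the per-partition counts."""
    merged = defaultdict(Counter)
    for counts in partition_counts:
        for value, cls_counts in counts.items():
            merged[value].update(cls_counts)
    return merged

def gini(counter):
    total = sum(counter.values())
    return 1.0 - sum((c / total) ** 2 for c in counter.values())

def purity_gain(merged):
    """Impurity reduction from splitting on this (categorical) feature."""
    overall = Counter()
    for cls_counts in merged.values():
        overall.update(cls_counts)
    n = sum(overall.values())
    weighted = sum(sum(c.values()) / n * gini(c) for c in merged.values())
    return gini(overall) - weighted

# Two toy partitions of (cell type, radio technology) rows with fault labels.
p1 = ([("rural", "LTE"), ("urban", "LTE")], ["drop", "ok"])
p2 = ([("urban", "3G"), ("rural", "3G")], ["drop", "drop"])
gains = {f: purity_gain(merge_counts([local_counts(*p1, f), local_counts(*p2, f)]))
         for f in range(2)}
best_feature = max(gains, key=gains.get)
```

In this layout only the count tables cross the network, not the raw records, which keeps per-node communication bounded as the data grows.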
ISBN:
(Print) 9783319219097; 9783319219080
The development and use of parallel computing systems make it necessary to study the parallelization resource of algorithms in search of the most rapid implementation. Representing an algorithm as a Q-determinant is one approach that can be applied here: such a representation makes it possible to obtain the most rapid implementation of the algorithm and to evaluate its performance complexity. Our work develops the software system QStudio, which presents an algorithm in the form of a Q-determinant using a flowchart, finds its most rapid implementation, and builds an execution plan. The obtained results are oriented toward an ideal model of a parallel computing system; however, they can serve as a basis for the automated execution of the most rapid algorithm implementations on real parallel computing systems.