An effective parallelization algorithm based on the compute unified device architecture (CUDA) is developed for DEM generalization, which is critical to multi-scale terrain analysis. It aims to efficiently retrieve the critical points for generating coarser-resolution DEMs that maximally maintain significant terrain features. CUDA is embedded into a multi-point algorithm to yield a parallel multi-point algorithm with enhanced computing efficiency. The outcomes are compared with the ANUDEM, compound, and maximum z-tolerance methods; the results demonstrate that the proposed algorithm reduces response time by up to 96% compared with the other methods. In terms of RMSE, it performs better than ANUDEM and needs only half as many points to keep the same RMSE. The mean slope and surface roughness are reduced by less than 1% in the tested cases, and the parallel algorithm provides better streamline matching. Given its high computing efficiency, the proposed algorithm can retrieve more critical points to meet the demands of higher precision.
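The abstract does not give the paper's exact multi-point significance criterion, but the reason CUDA fits this task is that every DEM cell can be scored independently. A minimal sketch of that mapping using Numba's CUDA backend follows; the 3x3 local-deviation score, the 5% retention rule, and all function names are illustrative assumptions, not the authors' algorithm.

```python
# Minimal sketch (assumed scoring rule, not the paper's multi-point criterion):
# each DEM cell is scored independently on the GPU, which is the property that
# makes critical-point retrieval parallelize well under CUDA.
import numpy as np
from numba import cuda

@cuda.jit
def significance_kernel(dem, score):
    r, c = cuda.grid(2)
    rows, cols = dem.shape
    if 1 <= r < rows - 1 and 1 <= c < cols - 1:
        s = 0.0
        for dr in range(-1, 2):
            for dc in range(-1, 2):
                s += dem[r + dr, c + dc]
        # deviation from the 3x3 local mean as a stand-in significance measure
        score[r, c] = abs(dem[r, c] - s / 9.0)

def retrieve_critical_points(dem, keep_fraction=0.05):
    """Return indices of the most significant cells (illustrative selection rule)."""
    d_dem = cuda.to_device(dem.astype(np.float32))
    d_score = cuda.to_device(np.zeros(dem.shape, dtype=np.float32))
    threads = (16, 16)
    blocks = ((dem.shape[0] + 15) // 16, (dem.shape[1] + 15) // 16)
    significance_kernel[blocks, threads](d_dem, d_score)
    score = d_score.copy_to_host()
    k = max(1, int(keep_fraction * score.size))
    flat = np.argpartition(score.ravel(), -k)[-k:]
    return np.unravel_index(flat, score.shape)
```

Because each cell's score depends only on a small, read-only neighborhood, the kernel needs no synchronization, which is why this kind of retrieval scales almost linearly with the number of GPU threads.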
This paper proposes an innovative simultaneous optimization approach for single- and multi-component mass exchanger network synthesis (MENS). A retrofitted stage-wise superstructure and a parallelized random walk algorithm with compulsive evolution (RWCE) are adopted. An iterative calculation method is designed to satisfy the requirements of multi-component mass transfer, with a relaxation on the outlet composition of the lean streams. The parametric analysis shows that the relaxation coefficient plays a major role in driving the convergence of the method. To improve the robustness of the established model, an adaptive relaxation coefficient strategy is implemented for multi-component MENS problems: when divergence occurs, the outlet concentration of the lean stream is adjusted automatically by a random relaxation coefficient. Finally, three industrial MENS examples are considered, whose total annual costs (TAC) are reduced by 7179, 2212, and 551 $·year⁻¹, respectively, with corresponding optimization times of 336, 125, and 145 s. The results indicate improvements in both economy and computation time, demonstrating that the parallelized RWCE yields a better TAC and higher optimization efficiency than previous results. Overall, the adaptive relaxation coefficient strategy enhances convergence for multi-component MENS problems.
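The full superstructure model and mass-balance equations are not reproduced in the abstract, so the sketch below only illustrates the adaptive relaxation idea it describes: a damped fixed-point update of the lean-stream outlet compositions whose relaxation coefficient is redrawn at random when the iteration starts to diverge. The function `mass_balance_update`, the coefficient bounds, and the stopping rule are assumptions for illustration.

```python
# Sketch of the adaptive relaxation strategy (assumed update form, not the paper's
# full multi-component MENS model): if the residual grows between iterations, the
# relaxation coefficient is redrawn at random to pull the iteration back toward
# convergence.
import numpy as np

def relaxed_iteration(mass_balance_update, y0, lam=0.5, tol=1e-8, max_iter=500, rng=None):
    rng = rng or np.random.default_rng()
    y = np.asarray(y0, dtype=float)
    prev_res = np.inf
    for _ in range(max_iter):
        y_new = mass_balance_update(y)        # raw outlet compositions for this pass
        res = np.linalg.norm(y_new - y)
        if res > prev_res:                    # divergence detected:
            lam = rng.uniform(0.05, 0.95)     # redraw the relaxation coefficient
        y = (1.0 - lam) * y + lam * y_new     # relaxed update of the lean outlets
        if res < tol:
            break
        prev_res = res
    return y, lam
```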
Whole-genome sequencing projects covering millions of subjects contain enormous numbers of genotypes, entailing a huge memory and computation burden. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analyses would be substantially sped up if built on GBC to access the genotypes of a large population. GBC's data structure and algorithms are valuable for accelerating large-scale genomic research.
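GBC's actual on-disk layout is not described here, so the toy sketch below only illustrates the general idea of addressable byte-encoded genotype blocks: biallelic genotypes packed at 2 bits each into fixed-length records, so any variant can be located by a simple offset computation without decompressing its neighbours. The encoding table and record layout are assumptions, not GBC's format.

```python
# Toy illustration of addressable byte-encoded genotype blocks (assumed layout,
# not GBC's actual format): 4 genotypes are packed per byte, and each variant
# occupies a fixed-length record, so variant i starts at byte offset i * record_len.
import numpy as np

CODES = {"0/0": 0, "0/1": 1, "1/1": 2, "./.": 3}   # 2 bits per genotype

def pack_variant(genotypes):
    """Pack the genotype strings of one variant into bytes (4 genotypes per byte)."""
    codes = np.array([CODES[g] for g in genotypes], dtype=np.uint8)
    codes = np.pad(codes, (0, (-len(codes)) % 4))
    packed = codes[0::4] | (codes[1::4] << 2) | (codes[2::4] << 4) | (codes[3::4] << 6)
    return packed.tobytes()

def unpack_variant(buf, n_samples):
    """Decode one fixed-length record back into genotype strings."""
    b = np.frombuffer(buf, dtype=np.uint8)
    codes = np.stack([(b >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).ravel()
    inv = {v: k for k, v in CODES.items()}
    return [inv[int(c)] for c in codes[:n_samples]]

row = pack_variant(["0/0", "0/1", "1/1", "./."] * 3)
assert unpack_variant(row, 12)[:4] == ["0/0", "0/1", "1/1", "./."]
```

Fixed-length records are what make such blocks "addressable": random access costs one seek and one small decode, independent of cohort size.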
Machine translation has developed rapidly, but it still has problems such as poor readability, failure to convey mood and context, and even languages that machines cannot recognize. To improve translation quality, this paper applies the SSCI method. It is found that the translation quality of hierarchical phrases improves significantly after using the parallel machine translation algorithm, by about 9% over the baseline, and the problem of context-free grammar is also addressed. The research also finds that the parallel algorithm can effectively reduce network memory occupation: content that originally occupied 10 characters needs only 8 characters after the parallel algorithm is applied, a 20% optimization. This means that the parallel algorithm for hierarchical phrase machine translation based on distributed network memory can play a very important role in machine translation.
Traditional visual-inertial simultaneous localization and mapping (SLAM) algorithms are usually designed for CPUs and cannot effectively exploit the parallel computing capability of GPUs if they are directly transplanted to an embedded board with a GPU module. However, the computing power of embedded devices is limited, and it is unreasonable for a visual-inertial SLAM algorithm to occupy most of the CPU resources while the GPU sits idle. In this article, a parallelization scheme for the VINS-Mono algorithm based on GPU parallel computing is proposed. Based on the compute unified device architecture (CUDA), the construction and solution of the incremental equation are parallelized in the nonlinear optimization stage of the algorithm, and the routines provided by cuSOLVER and cuBLAS are used to carry out the marginalization step. In addition, the detection and matching of image feature points in optical flow tracking are rewritten to parallelize that stage as well. After parallelization, the algorithm runs well on a heterogeneous computing platform composed of a CPU and a GPU and can fully exploit the GPU's parallel computing power. The proposed method was tested on an NVIDIA Jetson TX2 module and compared with the original VINS-Mono algorithm: the speeds of constructing and solving the incremental equation were found to be the same, while the optical flow tracking and marginalization of the proposed scheme were about 1.5-1.7 times and 1.9 times faster, respectively.
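As a rough illustration of the marginalization step mentioned above, the sketch below forms the Schur complement of a Hessian block on the GPU. The original scheme is C++ calling cuSOLVER and cuBLAS directly; CuPy is used here only as a stand-in because its dense solves and matrix products dispatch to those same libraries. The block partition and matrix sizes are synthetic.

```python
# Sketch of GPU-side marginalization via the Schur complement (illustrative sizes;
# CuPy dispatches the solve to cuSOLVER and the matrix products to cuBLAS).
import cupy as cp

def marginalize(H, b, m):
    """Schur-complement out the first m states of the incremental equation H dx = b."""
    Hmm, Hmr = H[:m, :m], H[:m, m:]
    Hrm, Hrr = H[m:, :m], H[m:, m:]
    bm, br = b[:m], b[m:]
    Hmm_inv_Hmr = cp.linalg.solve(Hmm, Hmr)   # dense solve on the GPU
    Hmm_inv_bm = cp.linalg.solve(Hmm, bm)
    H_marg = Hrr - Hrm @ Hmm_inv_Hmr          # GEMM on the GPU
    b_marg = br - Hrm @ Hmm_inv_bm
    return H_marg, b_marg

# Usage on a synthetic symmetric positive-definite system:
n, m = 200, 60
A = cp.random.rand(n, n, dtype=cp.float64)
H = A @ A.T + n * cp.eye(n)
b = cp.random.rand(n, dtype=cp.float64)
H_r, b_r = marginalize(H, b, m)
dx_remaining = cp.linalg.solve(H_r, b_r)      # reduced incremental equation
```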
ISBN:
(Print) 9781728165509
Tomography reconstruction is the process of quickly reconstructing the original image from the projections obtained by X-ray radiation. At present, the high-resolution detector of the Shanghai Synchrotron Radiation Facility (SSRF) can scan more than 4 GB of tomographic data every 1.5 seconds, and the transmission speed has increased to more than 100 GB·s⁻¹. With the upgrade of high-resolution detectors and the growth of data transmission volume, reconstruction computation in the cloud has become a bottleneck for improving the speed of tomography reconstruction, even when the fastest algorithm, Gridrec, is adopted. In this paper, we propose an improved serial Gridrec algorithm and a parallel Gridrec algorithm that improve the convolution kernel to optimize the speed of existing image reconstruction algorithms on low-cost GPUs for edge computing. On these GPUs, the multi-threaded tomography reconstruction algorithm not only guarantees high-quality results but also improves reconstruction speed over the original Gridrec algorithm by more than 11x, and over the classic FBP algorithm by more than 234x. Besides the significant speedup, our work appears to be the first parallel implementation of the Gridrec algorithm on a GPU for edge computing.
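For context, the stage of Gridrec that benefits most from GPU threads is the gridding convolution, where every Fourier-domain sample of a filtered projection is accumulated onto nearby Cartesian grid points. The sketch below shows that access pattern with Numba's CUDA backend; the Gaussian window, its half-width, and the real-valued accumulation are placeholders for the paper's improved convolution kernel, not its implementation.

```python
# Sketch of the gridding (convolution) step of a Gridrec-style reconstruction:
# one GPU thread per projection sample, atomic adds to resolve write conflicts.
# The Gaussian window below is a placeholder, not the paper's improved kernel.
import math
import numpy as np
from numba import cuda

HALF_WIDTH = 2       # half-width of the convolution window, in grid cells (assumed)
SIGMA = 0.8          # placeholder window parameter

@cuda.jit
def gridding_kernel(sample_u, sample_v, sample_val, grid_re, center):
    i = cuda.grid(1)
    if i < sample_u.shape[0]:
        u, v, val = sample_u[i], sample_v[i], sample_val[i]
        for du in range(-HALF_WIDTH, HALF_WIDTH + 1):
            for dv in range(-HALF_WIDTH, HALF_WIDTH + 1):
                gu = int(math.floor(u + 0.5)) + du + center
                gv = int(math.floor(v + 0.5)) + dv + center
                if 0 <= gu < grid_re.shape[0] and 0 <= gv < grid_re.shape[1]:
                    w = math.exp(-((gu - center - u) ** 2 + (gv - center - v) ** 2)
                                 / (2.0 * SIGMA ** 2))
                    cuda.atomic.add(grid_re, (gu, gv), w * val)

# Launch: one thread per sample of the filtered projections (synthetic data here).
n = 1 << 16
u = (np.random.rand(n) - 0.5) * 200
v = (np.random.rand(n) - 0.5) * 200
vals = np.random.rand(n)
grid = cuda.to_device(np.zeros((256, 256)))
gridding_kernel[(n + 255) // 256, 256](cuda.to_device(u), cuda.to_device(v),
                                       cuda.to_device(vals), grid, 128)
```

After the gridded Fourier plane is filled, Gridrec recovers the image with a 2-D inverse FFT; only the gridding step is sketched here.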
This letter presents an efficient parallel algorithm for solving the locally one-dimensional (LOD) finite-difference time-domain (FDTD) method in a cloud computing environment. As opposed to the existing LOD-FDTD parallelization scheme, the proposed method solves the implicit tridiagonal system in parallel by using the Sherman-Morrison formula to decompose the tridiagonal matrix into smaller matrices, which the parallel nodes in the cloud solve simultaneously. Numerical results show that the proposed method is more efficient in a cloud computing environment than the conventional parallelization scheme and exhibits better scalability.
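The key idea can be made concrete with a small worked example: removing the two entries that couple a tridiagonal matrix across a chosen split row leaves two independent tridiagonal blocks, and a single Sherman-Morrison correction recovers the exact solution of the original system. The sketch below demonstrates this on a toy system; it runs serially, but the two block solves marked in the code are exactly what the cloud nodes would perform simultaneously. Names and the diagonally dominant test matrix are illustrative.

```python
# Sketch of the Sherman-Morrison splitting: T x = d is rewritten as (A + u v^T) x = d,
# where A is T with the coupling entries at the split removed (two independent
# tridiagonal blocks), and the rank-one correction restores the exact answer.
import numpy as np

def thomas(a, b, c, d):
    """Thomas solver: a is the sub-diagonal, b the diagonal, c the super-diagonal."""
    n = len(b)
    cp_, dp_ = np.empty(n), np.empty(n)
    cp_[0], dp_[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp_[i - 1]
        cp_[i] = c[i] / m if i < n - 1 else 0.0
        dp_[i] = (d[i] - a[i] * dp_[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp_[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp_[i] - cp_[i] * x[i + 1]
    return x

def split_solve(a, b, c, d, k):
    """Solve T x = d by splitting at row k and applying the Sherman-Morrison formula."""
    n = len(b)
    u, v = np.zeros(n), np.zeros(n)
    u[k - 1] = u[k] = 1.0
    v[k - 1], v[k] = a[k], c[k - 1]
    b_mod = b.copy(); b_mod[k - 1] -= a[k]; b_mod[k] -= c[k - 1]
    a_mod = a.copy(); a_mod[k] = 0.0          # decouple the two blocks
    c_mod = c.copy(); c_mod[k - 1] = 0.0
    # The two block solves below are independent; this is the parallel part.
    y = np.concatenate([thomas(a_mod[:k], b_mod[:k], c_mod[:k], d[:k]),
                        thomas(a_mod[k:], b_mod[k:], c_mod[k:], d[k:])])
    z = np.concatenate([thomas(a_mod[:k], b_mod[:k], c_mod[:k], u[:k]),
                        thomas(a_mod[k:], b_mod[k:], c_mod[k:], u[k:])])
    return y - z * (v @ y) / (1.0 + v @ z)    # Sherman-Morrison correction

# Check against a dense solve on a diagonally dominant test system.
n, k = 10, 5
rng = np.random.default_rng(0)
a = rng.random(n); a[0] = 0.0
c = rng.random(n); c[-1] = 0.0
b = 4.0 + rng.random(n)
d = rng.random(n)
T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
assert np.allclose(split_solve(a, b, c, d, k), np.linalg.solve(T, d))
```

With p split points the same construction yields p + 1 independent blocks, and the Sherman-Morrison-Woodbury identity then requires only a small p x p correction, so the per-node work shrinks as nodes are added.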
ISBN:
(Print) 9783319635644; 9783319635637
In the field of wireless network optimization, with the growth of network size and the increasing complexity of network structure, traditional processing methods cannot effectively identify the causes of network faults from the ever-increasing volume of network data. In this paper, we propose a root-cause-analysis method based on distributed data mining (DRCA). First, we put forward an improved decision tree in which the best split feature is selected according to the feature's purity gain, and we then convert the problem of root-cause analysis into building this improved decision tree and interpreting the tree model. To address the memory and efficiency problems associated with large-scale data, we parallelize the algorithm and distribute the tasks to multiple computers. The experiments show that DRCA is an effective, efficient, and scalable method.
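The distributed part of the split selection can be pictured as a map-reduce over data partitions: each worker computes local (feature value, class) counts, a reduce step merges them, and the feature with the largest purity gain is chosen. In the minimal sketch below, Gini impurity reduction stands in for the paper's purity-gain definition, and two in-memory partitions stand in for distributed workers; all names are illustrative.

```python
# Sketch of distributed split selection for the improved decision tree: map = local
# (feature value, class) counts per partition, reduce = merge counts, then pick the
# feature with the highest impurity reduction (a stand-in for "purity gain").
from collections import Counter, defaultdict

def local_counts(rows, labels, feature_idx):
    """Map step on one partition."""
    counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        counts[row[feature_idx]][y] += 1
    return counts

def merge_counts(partition_counts):
    """Reduce step: merge the per-partition counts."""
    merged = defaultdict(Counter)
    for counts in partition_counts:
        for value, cls_counts in counts.items():
            merged[value].update(cls_counts)
    return merged

def gini(counter):
    total = sum(counter.values())
    return 1.0 - sum((c / total) ** 2 for c in counter.values())

def purity_gain(merged):
    """Impurity reduction from splitting on this (categorical) feature."""
    overall = Counter()
    for cls_counts in merged.values():
        overall.update(cls_counts)
    n = sum(overall.values())
    weighted = sum(sum(c.values()) / n * gini(c) for c in merged.values())
    return gini(overall) - weighted

# Two toy partitions of (cell type, radio technology) rows with fault labels.
p1 = ([("rural", "LTE"), ("urban", "LTE")], ["drop", "ok"])
p2 = ([("urban", "3G"), ("rural", "3G")], ["drop", "drop"])
gains = {f: purity_gain(merge_counts([local_counts(*p1, f), local_counts(*p2, f)]))
         for f in range(2)}
best_feature = max(gains, key=gains.get)
```

In this layout only the count tables cross the network, not the raw records, which keeps per-node communication bounded as the data grows.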
ISBN:
(Print) 9783319219097; 9783319219080
The development and use of parallel computing systems make it necessary to study the parallelization resource of algorithms in search of the most rapid implementation. Representing an algorithm as a Q-determinant is one approach that can be applied here: such a representation makes it possible to obtain the most rapid implementation of the algorithm and to evaluate its performance complexity. Our work develops the software system QStudio, which presents an algorithm in the form of a Q-determinant using a flowchart, finds its most rapid implementation, and builds an execution plan. The obtained results are oriented toward an ideal model of a parallel computing system; however, they can serve as a basis for the automated execution of the most rapid algorithm implementations on real parallel computing systems.