This paper proposes a real-time video encryption strategy based on a multi-round confusion-diffusion architecture and heterogeneous parallel computing. It leverages the powerful computing capacity of the Central Processing Unit (CPU) and the high parallel capability of the Graphics Processing Unit (GPU) to perform byte generation, confusion, and diffusion operations concurrently, thereby enhancing computational efficiency. Statistical and security analyses demonstrate that the proposed method exhibits exceptional statistical properties and provides resistance against different types of attacks. Encryption speed evaluation shows that it achieves latency-free 768x768 30 FPS video encryption on an Intel Xeon Gold 6226R and an NVIDIA GeForce RTX 3090, with an average encryption time of 25.12 ms, despite performing seven rounds of confusion and six rounds of diffusion on each frame. Additionally, the proposed strategy is adopted to implement a drone-oriented secure video communication system, achieving latency-free 256x256 29 FPS video encryption on an NVIDIA Jetson Xavier NX (NVIDIA Carmel ARM CPU and Volta GPU).
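The abstract gives no implementation details, so the following is only a minimal CUDA sketch of the division of labor it describes: keystream bytes generated on the host while a device kernel applies a round to the frame. The single XOR round, key schedule, and frame layout are placeholder assumptions, not the authors' cipher.

```cuda
// Minimal sketch: host-side byte generation plus one device-side diffusion pass
// on a single video frame. The mt19937 keystream and the single XOR round are
// placeholders for the paper's multi-round confusion-diffusion cipher.
#include <cuda_runtime.h>
#include <cstdint>
#include <random>
#include <vector>

__global__ void diffuse_xor(uint8_t* frame, const uint8_t* keystream, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) frame[i] ^= keystream[i];                  // one toy diffusion round
}

int main() {
    const size_t n = 768 * 768 * 3;                       // one RGB frame (assumed layout)
    std::vector<uint8_t> frame(n, 0x42), keystream(n);

    std::mt19937 rng(12345);                              // host-side "byte generation"
    for (auto& b : keystream) b = static_cast<uint8_t>(rng());

    uint8_t *d_frame, *d_keys;
    cudaMalloc((void**)&d_frame, n);
    cudaMalloc((void**)&d_keys, n);
    cudaMemcpy(d_frame, frame.data(), n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_keys, keystream.data(), n, cudaMemcpyHostToDevice);

    diffuse_xor<<<(unsigned)((n + 255) / 256), 256>>>(d_frame, d_keys, n);
    cudaMemcpy(frame.data(), d_frame, n, cudaMemcpyDeviceToHost);

    cudaFree(d_frame);
    cudaFree(d_keys);
    return 0;
}
```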
Parallel computing techniques have been introduced into digital image correlation (DIC) in recent years, leading to a surge in computation speed. Graphics processing unit (GPU)-based parallel computing has shown a striking effect in accelerating iterative subpixel DIC compared with CPU-based parallel computing. In this paper, the performance of the two kinds of parallel computing techniques is compared for the previously proposed path-independent DIC method, in which the initial guess for the inverse compositional Gauss-Newton (IC-GN) algorithm at each point of interest (POI) is estimated through the fast Fourier transform-based cross-correlation (FFT-CC) algorithm. Based on the performance evaluation, a heterogeneous parallel computing (HPC) model with a hybrid mode of parallelism is proposed to combine the computing power of the GPU and the multicore CPU. A trial-computation scheme is developed to optimize the configuration of the HPC model on a specific computer. The proposed HPC model shows excellent performance on a mid-range desktop computer for real-time subpixel DIC at a high resolution of more than 10,000 POIs per frame.
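As a rough illustration of the trial-computation idea used to configure the HPC model — time a small batch of POIs on each device, then split the remaining POIs in proportion to the measured throughput — a host-side sketch follows. process_pois_cpu and process_pois_gpu are hypothetical stand-ins for the FFT-CC + IC-GN pipeline, not the paper's code.

```cuda
// Host-side sketch of a trial-computation split between CPU and GPU paths.
#include <chrono>
#include <cstdio>

// Hypothetical placeholders: a real implementation would run the multicore
// IC-GN loop on the CPU and the FFT-CC + IC-GN CUDA kernels on the GPU.
static double sink = 0.0;
void process_pois_cpu(int first, int count) { for (int i = 0; i < count * 4000; ++i) sink += first + i; }
void process_pois_gpu(int first, int count) { for (int i = 0; i < count * 1000; ++i) sink += first + i; }

static double time_batch(void (*f)(int, int), int first, int count) {
    auto t0 = std::chrono::steady_clock::now();
    f(first, count);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

void process_frame(int total_pois) {
    const int trial = 256;                                   // small trial batch per device
    double t_cpu = time_batch(process_pois_cpu, 0, trial);
    double t_gpu = time_batch(process_pois_gpu, trial, trial);

    // Split the rest so both devices finish at roughly the same time:
    // GPU share = (1/t_gpu) / (1/t_cpu + 1/t_gpu) = t_cpu / (t_cpu + t_gpu).
    int remaining = total_pois - 2 * trial;
    int gpu_count = static_cast<int>(remaining * t_cpu / (t_cpu + t_gpu));

    process_pois_gpu(2 * trial, gpu_count);                  // could run asynchronously
    process_pois_cpu(2 * trial + gpu_count, remaining - gpu_count);
    std::printf("GPU handled %d of %d POIs\n", gpu_count + trial, total_pois);
}

int main() { process_frame(10000); return 0; }
```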
Hydrological model calibration has been a hot topic for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly as the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for the acceleration of hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that the heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.
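The calibration code itself is not given in the abstract; as a hedged sketch of the part that parallelizes most naturally with OpenMP — scoring every parameter set of the SCE-UA population concurrently — one might write the following, where xinanjiang_rmse is a hypothetical placeholder for running the Xinanjiang model against observed discharge.

```cuda
// Sketch of the embarrassingly parallel part of SCE-UA calibration: evaluating
// the objective function of many parameter sets at once with OpenMP. The real
// method described in the abstract also offloads work to the GPU with CUDA.
#include <omp.h>
#include <cmath>
#include <vector>

double xinanjiang_rmse(const std::vector<double>& params) {
    // Placeholder objective: distance from an arbitrary point in parameter space.
    double s = 0.0;
    for (double p : params) s += (p - 0.5) * (p - 0.5);
    return std::sqrt(s);
}

void evaluate_population(const std::vector<std::vector<double>>& population,
                         std::vector<double>& fitness) {
    fitness.resize(population.size());
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < static_cast<long>(population.size()); ++i)
        fitness[i] = xinanjiang_rmse(population[i]);
}

int main() {
    // Toy population: 1000 candidate sets of 15 parameters each.
    std::vector<std::vector<double>> population(1000, std::vector<double>(15, 0.3));
    std::vector<double> fitness;
    evaluate_population(population, fitness);
    return 0;
}
```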
ISBN (print): 9783319747187; 9783319747170
Many years have passed since the invention of the first microprocessor. Technological development in CPU construction has primarily been driven by increasing device performance, miniaturisation, and the reduction of manufacturing costs. The well-known Moore's Law, which speaks of doubling the number of transistors on a chip at regular intervals (going hand in hand with a reduction of manufacturing costs), has proved to work well over the years (the initially assumed period of eighteen months has been slightly extended to two years). Due to technological constraints, such a trend cannot be everlasting; a slowdown can already be observed. Limitations on the minimum size of the individual components (transistors) and on the total power draw of a system have forced a change in the direction of technological development. Instead of boosting the clock of a processor, it was decided to multiply the number of cores in a chip. Thanks to the clustering of processor cores in a single chip that utilise a fast shared cache memory, a considerable performance boost can still be observed.
Atmospheric aerosol particles have a significant impact on radiation, climate, and human health, with their size and shape being fundamental physical parameters for atmospheric change research. Due to the widespread effects and applications of aerosol particles, the direct measurement of aerosol size and shape has become crucial. Nevertheless, several challenges persist in aerosol measurement instruments, including limited resolution, complex operation, poor synchronization, and inaccurate inversion methods. Therefore, we developed a new scientific instrument and a corresponding intelligent image-interaction system, named the fast atmospheric aerosol size and shape imaging instrument (FASI). The instrument is designed for transmission imaging and contains a light source, imaging chamber, microscope objective, tube lens, extension tube, camera, etc. Before operation, the FASI calibrates the background field, pixel size, characteristic gray value (CGV), and depth of field (DOF) based on image processing. During intelligent interaction, the FASI extracts aerosol particles by image denoising and edge detection, and then uses our proposed defocus and duplicate particle detection algorithms for secondary screening of aerosols. Aerosol size and shape parameters are measured in parallel by the central processing unit (CPU) and the graphics processing unit (GPU) using heterogeneous computation. Polystyrene latex (PSL) calculations and quantitative experiments indicate that the FASI can accurately detect 0.5–20 μm aerosol particles. In particular, the FASI measures aerosol particles supplied by an aerosol generator, dryer, and neutralizer, demonstrating that the aerosol size distribution range of oil solutions (0.5–3.5 μm) is narrower than that of aqueous solutions (0.5–7.5 μm). For all samples, 92.12% of aerosols have an aspect ratio (AR) exceeding 1, and the shapes of these nonspherical aerosols vary greatly from one another. The evaluations of computational efficiency ...
Urban wind flow simulation, based on numerical methods, serves as a powerful tool for understanding the intricate interactions between urban structures and atmospheric conditions. The Lattice Boltzmann method (LBM) stands out as a popular choice for simulating urban wind flow. However, traditional LBM approaches face limitations in terms of scalability on large parallel computing systems and their ability to support high-resolution wind flow simulations across vast megacities spanning hundreds of square kilometers. In response to these challenges, we introduce THLB (Tianhe lattice Boltzmann), a purpose-built LBM simulator tailored for large-scale urban wind flow simulations. THLB streamlines the preprocessing of extensive simulation data through an innovative scheme that automatically identifies flow regions along irregular boundaries. Additionally, THLB integrates a novel processing pipeline and employs parallel optimization techniques, enhancing scalability and performance for large-scale LBM simulations. Our assessment of THLB involves conducting wind flow simulations within a megacity covering an area of 50 km × 40 km at an impressive one-meter simulation resolution, featuring 150,000 buildings. This simulation represents the most extensive urban wind flow analysis to date, comprising over two trillion simulation lattices. We gauge THLB's performance on the Tianhe new-generation supercomputer, harnessing more than 155 million heterogeneous cores. Our experimental results demonstrate exceptional performance and scalability, achieving a peak computation throughput of 24,553.43 Giga Lattice Updates Per Second (GLUPS), setting a new state-of-the-art benchmark for LBM simulations. Despite the inherent challenges of large-scale LBM simulations, our approach showcases robust scalability, delivering 90.48% and 69.91% weak and strong scaling efficiency, respectively.
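THLB's source is not part of this abstract; purely to illustrate the per-lattice work an LBM simulator parallelizes, a standard D2Q9 BGK collision kernel in CUDA is sketched below. Streaming, boundary handling, and THLB's irregular-boundary preprocessing are all omitted.

```cuda
// Standard D2Q9 BGK collision step, one thread per lattice node.
#include <cuda_runtime.h>
#include <vector>

__constant__ float w[9]  = {4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                            1.f/36, 1.f/36, 1.f/36, 1.f/36};
__constant__ int   cx[9] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
__constant__ int   cy[9] = {0, 0, 1, 0, -1, 1, 1, -1, -1};

// f is stored as 9 contiguous planes of n cells each (structure-of-arrays).
__global__ void bgk_collide(float* f, int n, float tau) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;

    float fi[9], rho = 0.f, ux = 0.f, uy = 0.f;
    for (int i = 0; i < 9; ++i) {
        fi[i] = f[i * n + idx];
        rho += fi[i];
        ux  += fi[i] * cx[i];
        uy  += fi[i] * cy[i];
    }
    ux /= rho;  uy /= rho;

    float usq = ux * ux + uy * uy;
    for (int i = 0; i < 9; ++i) {
        float cu  = cx[i] * ux + cy[i] * uy;
        float feq = w[i] * rho * (1.f + 3.f * cu + 4.5f * cu * cu - 1.5f * usq);
        f[i * n + idx] = fi[i] - (fi[i] - feq) / tau;      // BGK relaxation
    }
}

int main() {
    const int n = 256 * 256;                               // toy domain size
    float* f;
    cudaMalloc((void**)&f, 9 * n * sizeof(float));
    // Initialize to the rest-state equilibrium (rho = 1, u = 0).
    float w_h[9] = {4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/36, 1.f/36, 1.f/36, 1.f/36};
    std::vector<float> init(9 * n);
    for (int i = 0; i < 9; ++i)
        for (int j = 0; j < n; ++j) init[i * n + j] = w_h[i];
    cudaMemcpy(f, init.data(), 9 * n * sizeof(float), cudaMemcpyHostToDevice);
    bgk_collide<<<(n + 255) / 256, 256>>>(f, n, 0.6f);
    cudaDeviceSynchronize();
    cudaFree(f);
    return 0;
}
```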
In this work, a novel GPU-accelerated heterogeneous method for the automated multilevel substructuring method (HAMLS) is presented for handling large finite element models in structural dynamics. Different parallel modes based on nodes, subtrees, and eigenpairs are developed in the solution steps of AMLS to achieve a heterogeneous strategy. First, a new data management method is designed for the model transformation phase to eliminate the determinacy race in the parallel strategy over the separator tree. Considering the distribution characteristics of the nodes in the separator tree and the dependencies of node tasks, a load-balancing heterogeneous parallel strategy is designed to take full advantage of hosts and devices. By developing an adaptive batch-processing scheme for solving eigenvectors during the back transformation phase, the overhead of launching kernels, as well as the GPU memory requirements, can be reduced by several orders of magnitude. Several numerical examples are employed to validate the efficiency and practicality of the novel GPU-accelerated heterogeneous strategy. The results demonstrate that the computational efficiency of the novel strategy using one GPU can increase to 3.0x that of the original parallel AMLS method when 16 CPU threads are used.
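The adaptive batch-processing scheme is specific to the authors' code; as a generic, hedged sketch of the underlying idea — fusing many small, independent back-transformation products into a single GPU call instead of launching one kernel each — cuBLAS's strided batched GEMM can be used. All dimensions and data below are placeholders.

```cuda
// Sketch: batching many small, independent matrix products (e.g. per-substructure
// back transformations) into one strided-batched cuBLAS call.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    const int m = 64, k = 64, n = 16, batch = 512;         // assumed per-substructure sizes
    size_t szA = (size_t)m * k, szB = (size_t)k * n, szC = (size_t)m * n;

    std::vector<float> hA(szA * batch, 0.01f), hB(szB * batch, 0.02f);
    float *A, *B, *C;
    cudaMalloc((void**)&A, szA * batch * sizeof(float));
    cudaMalloc((void**)&B, szB * batch * sizeof(float));
    cudaMalloc((void**)&C, szC * batch * sizeof(float));
    cudaMemcpy(A, hA.data(), szA * batch * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(B, hB.data(), szB * batch * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t h;
    cublasCreate(&h);
    const float alpha = 1.f, beta = 0.f;
    // C_i = A_i * B_i for all i in one call; the strides select each matrix in the batch.
    cublasSgemmStridedBatched(h, CUBLAS_OP_N, CUBLAS_OP_N,
                              m, n, k, &alpha,
                              A, m, szA,
                              B, k, szB, &beta,
                              C, m, szC, batch);
    cublasDestroy(h);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```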
Many fields of scientific simulation, such as chemistry and condensed matter physics, are increasingly eschewing dense tensor contraction in favor of sparse tensor contraction. In this work, we focus on binary sparse tensor contraction (SpTC), which poses the challenges of index matching and accumulation. To address these difficulties, we present GSpTC, an efficient element-wise SpTC framework for CPU-GPU heterogeneous systems. GSpTC first introduces a fine-grained partitioning strategy based on element-wise tensor contraction. By analyzing and selecting appropriate dimension partitioning strategies, we can efficiently utilize the multi-threading parallelism on GPUs and optimize the overall performance of GSpTC. In particular, GSpTC leverages multi-threading parallelism on GPUs for the contraction and merging phases, which greatly accelerates the computation in sparse tensor contractions. Furthermore, GSpTC employs a parallel pipeline to hide the data transfer time between the host and the device, further enhancing its performance. As a result, GSpTC achieves an average performance improvement of 267% over the previous state-of-the-art framework Sparta.
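GSpTC's contraction and merging kernels depend on its sparse data structures; as a generic sketch of the pipeline idea the abstract mentions — overlapping host-to-device transfers of one chunk with computation on another — a double-buffered CUDA-streams loop looks roughly like this. The kernel is a trivial placeholder.

```cuda
// Generic double-buffered pipeline: while one stream computes on a chunk, the
// other copies the next chunk to the device, hiding transfer time.
#include <cuda_runtime.h>

__global__ void process_chunk(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;                  // placeholder work
}

int main() {
    const int chunk = 1 << 20, chunks = 8;
    float* h;                                              // pinned memory enables async copies
    cudaMallocHost((void**)&h, (size_t)chunk * chunks * sizeof(float));
    for (int i = 0; i < chunk * chunks; ++i) h[i] = 1.0f;

    float* d[2];
    cudaStream_t s[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc((void**)&d[b], chunk * sizeof(float));
        cudaStreamCreate(&s[b]);
    }

    for (int c = 0; c < chunks; ++c) {
        int b = c & 1;                                     // alternate buffers/streams
        cudaMemcpyAsync(d[b], h + (size_t)c * chunk, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, s[b]);
        process_chunk<<<(chunk + 255) / 256, 256, 0, s[b]>>>(d[b], chunk);
        cudaMemcpyAsync(h + (size_t)c * chunk, d[b], chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, s[b]);
    }
    cudaDeviceSynchronize();

    for (int b = 0; b < 2; ++b) { cudaFree(d[b]); cudaStreamDestroy(s[b]); }
    cudaFreeHost(h);
    return 0;
}
```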
We analyze the performance portability of the skeleton-based, single-source, multi-backend high-level programming framework SkePU across multiple CPU-GPU heterogeneous systems. Thereby, we provide a systematic characterization of the application efficiency of SkePU-generated code in comparison to equivalent hand-written code in lower-level parallel programming models such as OpenMP and CUDA. For this purpose, we contribute ports of the STREAM benchmark suite and of part of the NAS Parallel Benchmark suite to SkePU. We show that for STREAM and the EP benchmark, SkePU regularly scores efficiency values above 80%, and that, in particular on CPU systems, SkePU can outperform hand-written code.
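For context on what a hand-written baseline looks like, a CUDA version of STREAM's triad kernel (a = b + scalar * c) is only a few lines; this is a generic sketch, not the actual code ported to SkePU in the paper.

```cuda
// Hand-written CUDA version of the STREAM triad kernel, the kind of low-level
// baseline that skeleton-generated code is typically measured against.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void triad(double* a, const double* b, const double* c,
                      double scalar, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) a[i] = b[i] + scalar * c[i];
}

int main() {
    const size_t n = 1 << 24;
    double *a, *b, *c;
    cudaMalloc((void**)&a, n * sizeof(double));
    cudaMalloc((void**)&b, n * sizeof(double));
    cudaMalloc((void**)&c, n * sizeof(double));
    // (initialization of b and c omitted for brevity)
    triad<<<(unsigned)((n + 255) / 256), 256>>>(a, b, c, 3.0, n);
    cudaDeviceSynchronize();
    std::printf("triad done on %zu elements\n", n);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```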
The improvement of computational effectiveness is a vital issue in the field of large-scale finite element analysis. Performance is fundamentally determined by the efficiency of solving the sparse linear systems of equations arising in the implicit finite element method. This paper presents a direct linear solver based on heterogeneous hybrid parallel computing on CPUs and GPUs, which can efficiently utilize the computing resources of multiple devices to achieve a performance improvement. Initially, we partition the elimination tree into several subtrees to accomplish the task decomposition. Based on this, we build a dynamic programming model to balance the computational load across the various devices. Then, we develop a numerical factorization strategy for the CPUs that combines node parallelism and tree parallelism. In addition, efficient numerical factorization is achieved on the GPU through batch processing and by maximizing the overlap between computation and data transfers. Numerical experiments show that, compared with MKL PARDISO, the performance of numerical factorization can be improved by up to 10 times using CPU and dual-path GPU hybrid computation, and the simulation time can be reduced by one-third for the multi-condition analysis of a body-in-white model and by 20% for large-scale nonlinear finite element deformation analysis.
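The paper balances load with a dynamic programming model over the elimination tree; as a toy host-side illustration of the general idea — assigning subtrees to devices so that estimated factorization costs end up balanced — a greedy sketch with made-up cost numbers follows (the greedy rule is a stand-in, not the authors' method).

```cuda
// Toy load-balancing illustration: greedily assign elimination-tree subtrees
// (represented only by cost estimates) to the device with the least work so far.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical per-subtree factorization cost estimates (e.g. flop counts).
    std::vector<double> subtree_cost = {9.1, 7.4, 6.3, 3.2, 2.8, 1.5, 1.1, 0.6};
    std::sort(subtree_cost.rbegin(), subtree_cost.rend());  // largest first

    const int devices = 3;                                   // e.g. CPU plus two GPU paths
    std::vector<double> load(devices, 0.0);
    std::vector<std::vector<int>> plan(devices);

    for (int i = 0; i < (int)subtree_cost.size(); ++i) {
        int dev = (int)(std::min_element(load.begin(), load.end()) - load.begin());
        load[dev] += subtree_cost[i];
        plan[dev].push_back(i);
    }
    for (int d = 0; d < devices; ++d)
        std::printf("device %d: %zu subtrees, load %.1f\n", d, plan[d].size(), load[d]);
    return 0;
}
```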