In this work, an improved domain decomposition method is developed to address workload imbalance in the parallel computing of a four-dimensional lattice spring model (4D-LSM) for large-scale problems in rock engineering. A cubic domain decomposition scheme is adopted and optimized by a simulated annealing algorithm (SAA) to minimize the workload imbalance among subdomains. The improved domain decomposition method is implemented in the parallel computing of the 4D-LSM. Numerical results indicate that the proposed domain decomposition method further improves the workload balance among processors, which helps overcome the limits on computational scale when solving large-scale geotechnical problems and decreases the runtime of the parallel 4D-LSM by up to 40% compared to the original cubic decomposition method, demonstrating the practicability of the proposed method in parallel computing. Two types of target functions for the SAA are tested, and their influence on the performance of the parallel 4D-LSM is investigated. Finally, a computational model with one billion particles is realized for an actual engineering application of the 4D-LSM, and the result shows the advantages of parallel computing.
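Below is a minimal Python sketch of the kind of optimization the abstract describes: simulated annealing that shifts the cut planes of a cubic decomposition to reduce the spread of particle counts across subdomains. The cost definition, the perturbation step, and all function names are illustrative assumptions, not the authors' 4D-LSM implementation.

```python
import math
import random

import numpy as np


def subdomain_counts(points, x_cuts, y_cuts, z_cuts):
    """Count particles per cubic subdomain; cuts are sorted interior plane positions."""
    ix = np.searchsorted(x_cuts, points[:, 0])
    iy = np.searchsorted(y_cuts, points[:, 1])
    iz = np.searchsorted(z_cuts, points[:, 2])
    ny, nz = len(y_cuts) + 1, len(z_cuts) + 1
    flat = (ix * ny + iy) * nz + iz
    return np.bincount(flat, minlength=(len(x_cuts) + 1) * ny * nz)


def imbalance(counts):
    """Workload imbalance: heaviest subdomain over the mean load (1.0 is perfect balance)."""
    return counts.max() / counts.mean()


def anneal_cuts(points, x_cuts, y_cuts, z_cuts, steps=5000, t0=1.0, cooling=0.999):
    """Shift cut planes with simulated annealing to reduce the load imbalance."""
    cuts = [np.array(c, dtype=float) for c in (x_cuts, y_cuts, z_cuts)]
    cost = imbalance(subdomain_counts(points, *cuts))
    temp = t0
    for _ in range(steps):
        trial = [c.copy() for c in cuts]
        axis = random.randrange(3)
        k = random.randrange(len(trial[axis]))
        trial[axis][k] += random.gauss(0.0, 0.02)   # nudge one plane (coordinates in [0, 1])
        trial[axis].sort()                          # keep planes ordered
        trial_cost = imbalance(subdomain_counts(points, *trial))
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if trial_cost < cost or random.random() < math.exp((cost - trial_cost) / temp):
            cuts, cost = trial, trial_cost
        temp *= cooling
    return cuts, cost


# Example: 200,000 clustered particles split into a 4 x 4 x 4 grid of subdomains.
pts = np.random.default_rng(0).beta(2.0, 5.0, size=(200_000, 3))
cuts0 = [np.array([0.25, 0.5, 0.75])] * 3
print("before:", imbalance(subdomain_counts(pts, *cuts0)))
print("after :", anneal_cuts(pts, *cuts0, steps=3000)[1])
```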
Authors: He, Li; Hu, Zhefang; Yu, Yang
Affiliations: China Acad Art, Creat Design Mfg Collaborat Innovat Ctr, Hangzhou 310024, Zhejiang, Peoples R China; Zhejiang Univ, Sir Run Run Shaw Hosp, Dept Nutr, Sch Med, Hangzhou 310020, Zhejiang, Peoples R China; Dianzi Univ, Informat Engn Sch, Hangzhou 311305, Peoples R China
The goal of industrial product design is to bring to market products that meet consumer needs, which implies that the creative activity of product design is constrained by the market environment. The integration of technology, design, and the market is advancing rapidly, and the focus needs to shift from art and design thinking alone to a perspective that encompasses society as a whole. Content-based image retrieval technology analyzes an image according to its visual attributes and spatial relationships, such as color, texture, and shape, and builds an image feature-vector database through feature extraction so that images can be reconstructed. Multi-core processors are now ubiquitous, and parallel algorithms developed for multi-core CPUs are highly adaptable and can be used in most environments. Parallel computing on a multi-core CPU makes full use of processor resources and improves resource utilization. By improving and optimizing the algorithms, parallel processing is used to increase the speed of image processing: the sparse image-resolution algorithm and the aircraft recognition method use multi-core CPU parallel processing to increase processing speed and efficiency. The resulting full-featured image retrieval system removes search restrictions and improves search efficiency and the accuracy of search results. Finally, a test data set is used to evaluate the system and further optimize its performance. Combined with typical cases, the close relationship between human vision, hearing, touch, motion, and smell and product safety is analyzed, and the safety principles of industrial product design are summarized.
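As a rough illustration of the multi-core CPU parallelism described above, the Python sketch below extracts simple color-histogram features for a batch of images with a process pool and answers nearest-neighbour queries against the resulting feature database. The histogram feature, file names, and helper functions are assumptions for illustration, not the system's actual feature extraction.

```python
from multiprocessing import Pool

import numpy as np
from PIL import Image


def color_histogram(path, bins=8):
    """A coarse RGB color histogram as a simple content-based feature vector."""
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    vec = hist.flatten()
    return vec / vec.sum()


def build_feature_database(paths, workers=8):
    """Extract features for all images in parallel across CPU cores."""
    with Pool(processes=workers) as pool:
        return np.vstack(pool.map(color_histogram, paths))


def most_similar(db, query_vec, top_k=5):
    """Indices of the images whose feature vectors are closest to the query."""
    return np.argsort(np.linalg.norm(db - query_vec, axis=1))[:top_k]


if __name__ == "__main__":                         # required on Windows for multiprocessing
    image_paths = ["img_001.jpg", "img_002.jpg"]   # placeholder paths
    db = build_feature_database(image_paths, workers=4)
    print(most_similar(db, color_histogram(image_paths[0])))
```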
Emergencies in metro systems have become more frequent during rush hours, with significant consequences for metro planning, design, operation, and even passengers' daily travel. The motivation of this paper is to establish a hybrid metro simulation method with high efficiency and sufficient precision. To this end, a discrete-event simulation method based on a multi-agent model with parallel computing is proposed to estimate the effects of emergencies efficiently. Firstly, train motion algorithms are developed to compute the train speed profiles for normal operation and emergency operation, respectively. Moreover, three types of agents (passenger, station, and train agents) are defined for the rescheduling calculation, and six types of events are defined to discretize the emergency simulation process. Furthermore, a parallel computing method is proposed to accelerate the simulation process. Finally, a case study of the Yizhuang Line of the Beijing metro is conducted to verify the effectiveness of the proposed simulation methodology. The results confirm the effectiveness and practicality of the proposed simulation method and show how the locations and durations of emergencies affect train and passenger delays.
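The following Python sketch shows the discrete-event pattern the abstract relies on: a priority queue of timestamped events whose handlers schedule follow-up events. The handlers and the 30 s dwell time are placeholders, not the paper's six event types or its train motion algorithms.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Event:
    time: float
    seq: int                                    # tie-breaker for events at the same time
    handler: Callable = field(compare=False)
    payload: dict = field(compare=False, default_factory=dict)


class Simulator:
    """Minimal discrete-event engine: pop the earliest event, run its handler,
    and let handlers schedule follow-up events (arrivals, departures, delays, ...)."""

    def __init__(self):
        self.queue = []
        self.clock = 0.0
        self._seq = 0

    def schedule(self, delay, handler, **payload):
        self._seq += 1
        heapq.heappush(self.queue, Event(self.clock + delay, self._seq, handler, payload))

    def run(self, until=float("inf")):
        while self.queue and self.queue[0].time <= until:
            event = heapq.heappop(self.queue)
            self.clock = event.time
            event.handler(self, **event.payload)


def train_arrival(sim, station, train):
    print(f"t={sim.clock:6.1f}s  train {train} arrives at {station}")
    sim.schedule(30.0, train_departure, station=station, train=train)   # assumed dwell time


def train_departure(sim, station, train):
    print(f"t={sim.clock:6.1f}s  train {train} departs {station}")


sim = Simulator()
sim.schedule(0.0, train_arrival, station="S1", train="T1")
sim.run(until=3600.0)
```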
In this study, a parallel framework for the multiscale analysis of three-dimensional lattice structures is developed. The framework performs the multiscale analysis using the extended multiscale finite element method (EMsFEM) in the parallel environment provided by the Portable, Extensible Toolkit for Scientific Computation (PETSc), a high-performance parallel scientific computing library. To this end, several modifications of the original EMsFEM were developed and a number of PETSc routines were adopted. The efficiency and accuracy of the proposed parallel computing framework were first verified through numerical examples; the parallel speedup and parallel efficiency of the framework were then studied. The proposed parallel computing framework shows good efficiency and can be used for the analysis of large-scale lattice structures.
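For readers unfamiliar with how such a framework delegates the linear algebra to PETSc, here is a minimal petsc4py sketch that assembles a distributed sparse matrix and solves it with a Krylov solver under MPI. The toy 1-D Laplacian system and the solver choices are assumptions for illustration; they are not the EMsFEM coarse-scale system used in the paper.

```python
# A toy distributed solve with petsc4py; run with, e.g.:  mpiexec -n 4 python solve_demo.py
from petsc4py import PETSc

n = 1000                                     # global size of a toy 1-D Laplacian system
A = PETSc.Mat().createAIJ([n, n])            # sparse matrix distributed over the MPI ranks
A.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, False)  # skip careful preallocation here
A.setUp()

rstart, rend = A.getOwnershipRange()         # rows owned by this rank
for i in range(rstart, rend):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

b = A.createVecLeft()
b.set(1.0)                                   # uniform right-hand side
x = A.createVecRight()

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType("cg")                            # conjugate gradients
ksp.getPC().setType("jacobi")                # simple, fully parallel preconditioner
ksp.setFromOptions()                         # allow -ksp_type / -pc_type overrides at run time
ksp.solve(b, x)

PETSc.Sys.Print(f"iterations: {ksp.getIterationNumber()}, residual norm: {ksp.getResidualNorm():.2e}")
```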
Despite the success of parallel architectures and domain-specific accelerators in boosting the performance of emerging parallel workloads, contemporary computer organizations still face the bottleneck of data movement between processors and main memory. Processing-in-memory (PIM) architectures, especially designs that integrate compute logic near DRAM memory banks, are promising for addressing this bottleneck. However, such in-DRAM near-bank integration faces hardware and software design challenges in performance, area overhead, architectural complexity, and programmability. To address these challenges, this dissertation focuses on developing efficient hardware and software solutions for in-DRAM near-bank computing. First, this dissertation investigates the memory bandwidth bottleneck of contemporary hardware platforms through in-depth workload characterization, which motivates in-DRAM near-bank processing solutions. Second, this dissertation proposes multiple full-stack in-DRAM near-bank processing solutions targeting application scopes that range from application-specific to general-purpose computing. These solutions reveal a wide spectrum of trade-off points among hardware efficiency, architectural flexibility, and software complexity. On top of these solutions, this dissertation introduces an open-source simulation framework that supports architectural and software optimization studies of in-DRAM near-bank processing. Finally, this dissertation develops novel machine-learning-based compiler optimizations for partitioning workloads on a chiplet hardware platform that has a distributed compute-memory abstraction similar to in-DRAM near-bank architectures.
In recent years, researchers have made great efforts in computer vision tasks (e.g., object detection) with the widespread use of convolutional neural networks (CNNs). However, object detection algorithms based on CNNs suffer from high computational cost even on high-performance computers. In addition, with the spread of high-resolution video, deploying object detection algorithms becomes more and more difficult because of the large amount of data, let alone on portable platforms such as unmanned aerial vehicles (UAVs). In this paper, we study a lightweight network on a portable platform for outdoor tiny pedestrian detection. Concretely, we first build a training dataset manually, owing to the lack of tiny pedestrian samples in common datasets. We then propose a lightweight network and introduce parallel computing to take full advantage of the GPU. Finally, our method achieves real-time performance on Jetson TX2. Experimental results verify that the proposed model has promising performance in tiny pedestrian detection on portable GPU platforms.
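As an illustration of what "lightweight" typically means in this setting, the PyTorch sketch below stacks depthwise-separable convolution blocks, the standard trick for cutting FLOPs on embedded GPUs such as the Jetson TX2. The block sizes and the tiny backbone stub are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise + pointwise convolution: the building block lightweight backbones
    use to cut FLOPs relative to a full 3x3 convolution."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            # depthwise 3x3: one filter per input channel
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU6(inplace=True),
            # pointwise 1x1: mix channels
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU6(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


# A tiny backbone stub; on a Jetson-class GPU the model would be moved with
# .to("cuda") and fed batched frames to exploit the parallel hardware.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.ReLU6(inplace=True),
    DepthwiseSeparableConv(32, 64),
    DepthwiseSeparableConv(64, 128, stride=2),
)
print(backbone(torch.randn(1, 3, 512, 512)).shape)
```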
In a parallel computing scenario, the synchronization overhead needed to coordinate the execution on the parallel computing nodes can significantly impair overall execution performance. Typically, synchronization is achieved by adopting a global synchronization scheme involving all the nodes. In many application domains, though, a looser scheme, namely local synchronization, can be exploited, in which each node needs to synchronize only with a subset of the other nodes. In this work, we compare the performance of global and local synchronization using the efficiency, i.e., the ratio between the useful computing time and the total computing time including the synchronization overhead, as the key performance indicator. We present an analytical study of the asymptotic behavior of the efficiency as the number of nodes increases. As an original contribution, we prove, using the Max-Plus algebra, that there is a non-zero lower bound on the efficiency in the case of local synchronization, and we present a statistical procedure to estimate this bound. This outcome marks a significant advantage of local synchronization over global synchronization, for which the efficiency tends to zero as the number of nodes increases.
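The contrast between the two schemes can be illustrated with a small Monte Carlo experiment in Python: under a global barrier every step ends when the slowest of all N nodes finishes, whereas under local (ring) synchronization each node waits only for its two neighbours, a max-plus-style recursion. The exponential workload model below is an illustrative assumption, not the authors' setting.

```python
import numpy as np

rng = np.random.default_rng(0)


def efficiency(n_nodes, n_steps, local=True):
    """Monte Carlo estimate of efficiency = useful compute time / total elapsed time.

    Global scheme: every step ends when the slowest of all nodes finishes.
    Local scheme (ring): a node starts step k once it and its two neighbours
    have finished step k-1."""
    work = rng.exponential(1.0, size=(n_steps, n_nodes))   # per-step compute times
    finish = np.zeros(n_nodes)                             # finish times of the previous step
    for k in range(n_steps):
        if local:
            wait = np.maximum(finish, np.maximum(np.roll(finish, 1), np.roll(finish, -1)))
        else:
            wait = np.full(n_nodes, finish.max())
        finish = wait + work[k]
    useful = work.sum(axis=0)
    return float(np.mean(useful / finish))


for n in (4, 16, 64, 256):
    print(f"N={n:4d}  global={efficiency(n, 2000, local=False):.3f}  "
          f"local={efficiency(n, 2000, local=True):.3f}")
```

Running this shows the global efficiency falling as N grows while the local efficiency stays roughly constant, which mirrors the non-zero lower bound proved in the paper.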
Sample entropy is a widely used method for assessing the irregularity of physiological signals, but its high computational complexity prevents its application in time-sensitive scenarios. To improve the computational performance of sample entropy analysis for the continuous monitoring of clinical data, a fast algorithm based on OpenCL is proposed in this paper. OpenCL is an open standard supported by the majority of graphics processing units (GPUs) and operating systems. Based on this standard, a fast parallel algorithm, OpenCLSampEn, is proposed for sample entropy calculation. A series of 24-hour heartbeat recordings was used to verify the robustness of the algorithm. Experimental results show that OpenCLSampEn achieves substantial acceleration: with common parameters, it reduces the execution time to 1/75 of that of the baseline algorithm when the signal length exceeds 60,000. OpenCLSampEn is also robust across embedding dimensions, tolerance thresholds, scales, and operating systems. In addition, an R package implementing the algorithm is provided on GitHub. In summary, the proposed OpenCL-based fast algorithm significantly improves the computational performance of sample entropy and has broad utility as the quantity of continuous clinical and physiological signals continues to grow rapidly.
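For reference, the NumPy sketch below implements the plain definition of sample entropy, SampEn(m, r) = -ln(A/B); its quadratic cost in the signal length is precisely what OpenCLSampEn offloads to the GPU. This is a textbook-style reference implementation under assumed defaults (m = 2, r = 0.2·SD), not the authors' OpenCL kernel.

```python
import numpy as np


def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) = -ln(A / B): B counts pairs of length-m templates within
    tolerance r (Chebyshev distance), A counts the same for length m+1;
    self-matches are excluded. O(N^2) reference implementation."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()                      # common default tolerance

    def count_matches(length):
        n_templates = len(x) - m               # same template count for m and m+1
        templates = np.lib.stride_tricks.sliding_window_view(x, length)[:n_templates]
        count = 0
        for i in range(n_templates - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    B = count_matches(m)
    A = count_matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf


rr = np.random.default_rng(1).normal(size=5000)   # surrogate heartbeat interval series
print(sample_entropy(rr, m=2))
```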
Autonomous driving has gradually moved towards practical application in recent years, and providing reliable real-time environmental information to autonomous driving systems is particularly critical. At present, vehicle video surveillance systems based on multi-source video and target detection algorithms can effectively address this need. However, previous vehicle video surveillance systems often cannot balance surveillance quality against frame rate. Therefore, this article introduces a vehicle video surveillance system based on parallel computing and computer vision. First, multiple fisheye cameras are used to collect surround-view environmental information. Second, a low-light camera, an infrared thermal imager, and a millimeter-wave radar provide forward-view environmental information at night. Correspondingly, we design a surround-view image fusion algorithm and a forward-view image fusion algorithm based on parallel computing. Finally, a monocular camera and detection algorithms provide forward-view detection results. In summary, this vehicle video surveillance system will benefit the practical application of autonomous driving.
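A minimal Python/OpenCV sketch of the parallel capture-and-fuse structure is given below: each camera is read in its own worker, and the latest frames are blended into one composite. The simple averaging stand-in and the camera indices are assumptions; the system's actual surround-view and forward-view fusion algorithms involve fisheye undistortion, calibration, and stitching.

```python
import queue
import threading

import cv2
import numpy as np


def capture_worker(source, frames, stop):
    """Read frames from one camera in its own thread, keeping only the latest frame."""
    cap = cv2.VideoCapture(source)
    while not stop.is_set():
        ok, frame = cap.read()
        if not ok:
            break
        if frames.full():
            try:
                frames.get_nowait()          # drop the stale frame so fusion never blocks
            except queue.Empty:
                pass
        frames.put(frame)
    cap.release()


def fuse(frames, size=(640, 480)):
    """Placeholder fusion: average the resized frames into one composite image."""
    resized = [cv2.resize(f, size).astype(np.float32) for f in frames]
    return (sum(resized) / len(resized)).astype(np.uint8)


sources = [0, 1, 2, 3]                                   # four camera indices (assumed)
queues = [queue.Queue(maxsize=1) for _ in sources]
stop = threading.Event()
workers = [threading.Thread(target=capture_worker, args=(s, q, stop), daemon=True)
           for s, q in zip(sources, queues)]
for w in workers:
    w.start()

latest = [q.get() for q in queues]                       # blocks until each camera delivers a frame
cv2.imwrite("composite.jpg", fuse(latest))
stop.set()
```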
Sustainable construction practices rely on carefully selecting building materials and balancing environmental and economic considerations. This study examines the complex link between local climate, market dynamics, and building material selection. Market data analysis, parametric modeling, and brute-force optimization are used to provide insights into construction decision-making. Across 5540 simulations, a thorough assessment of the financial and energy performance of various materials for walls, roofs, windows, and floors is conducted. Incorporating Pareto ranking, parallel simulation, and sensitivity analysis, the comprehensive evaluation reveals the intricate trade-offs between cost, thermal properties, and energy savings. The findings highlight the potential for optimal external wall solutions to reduce U-values by up to 30% and achieve source energy savings of up to 25% across diverse climates. By emphasizing the importance of local context in material selection, this study shows how energy consumption patterns and transmission losses influence financial and energy performance, thereby advancing sustainable construction practices.
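As a sketch of the Pareto-ranking step applied to brute-force simulation output, the Python snippet below keeps the candidates that no other candidate beats on both cost and energy. The wall assemblies and their numbers are made up for illustration and are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cost: float      # e.g. construction cost per m^2
    energy: float    # e.g. annual source energy use per m^2


def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return (a.cost <= b.cost and a.energy <= b.energy
            and (a.cost < b.cost or a.energy < b.energy))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates (rank-1 set)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]


# Hypothetical wall assemblies from a brute-force sweep (values are made up).
options = [
    Candidate("brick + EPS",        cost=95.0,  energy=210.0),
    Candidate("brick + thin EPS",   cost=100.0, energy=250.0),   # dominated by "brick + EPS"
    Candidate("concrete + mineral", cost=80.0,  energy=260.0),
    Candidate("timber + cellulose", cost=110.0, energy=190.0),
    Candidate("concrete, no ins.",  cost=60.0,  energy=340.0),
]
for c in pareto_front(options):
    print(f"{c.name:22s} cost={c.cost:6.1f}  energy={c.energy:6.1f}")
```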