In this work, an improved domain decomposition method is developed to address workload imbalance in the parallel computing of a four-dimensional lattice spring model (4D-LSM) for large-scale problems in rock engineering. A cubic domain decomposition scheme is adopted and optimized by a simulated annealing algorithm (SAA) to minimize the workload imbalance among subdomains. The improved domain decomposition method is implemented in the parallel computing of the 4D-LSM. Numerical results indicate that the proposed domain decomposition method further improves the workload balance among processors, which helps overcome the limits on computational scale when solving large-scale geotechnical problems and decreases the runtime of the parallel 4D-LSM by up to 40% compared to the original cubic decomposition method, demonstrating the practicability of the proposed method in parallel computing. Two types of target functions for the SAA are tested, and their influence on the performance of the parallel 4D-LSM is investigated. Finally, a computational model with one billion particles is realized for an actual engineering application of the 4D-LSM, and the result shows the advantages of parallel computing.
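Below is a minimal Python sketch of the kind of optimization the abstract describes: simulated annealing that shifts the cut planes of a cubic decomposition to reduce the spread of particle counts across subdomains. The cost definition, the perturbation step, and all function names are illustrative assumptions, not the authors' 4D-LSM implementation.

```python
import math
import random

import numpy as np


def subdomain_counts(points, x_cuts, y_cuts, z_cuts):
    """Count particles per cubic subdomain; cuts are sorted interior plane positions."""
    ix = np.searchsorted(x_cuts, points[:, 0])
    iy = np.searchsorted(y_cuts, points[:, 1])
    iz = np.searchsorted(z_cuts, points[:, 2])
    ny, nz = len(y_cuts) + 1, len(z_cuts) + 1
    flat = (ix * ny + iy) * nz + iz
    return np.bincount(flat, minlength=(len(x_cuts) + 1) * ny * nz)


def imbalance(counts):
    """Workload imbalance: heaviest subdomain over the mean load (1.0 is perfect balance)."""
    return counts.max() / counts.mean()


def anneal_cuts(points, x_cuts, y_cuts, z_cuts, steps=5000, t0=1.0, cooling=0.999):
    """Shift cut planes with simulated annealing to reduce the load imbalance."""
    cuts = [np.array(c, dtype=float) for c in (x_cuts, y_cuts, z_cuts)]
    cost = imbalance(subdomain_counts(points, *cuts))
    temp = t0
    for _ in range(steps):
        trial = [c.copy() for c in cuts]
        axis = random.randrange(3)
        k = random.randrange(len(trial[axis]))
        trial[axis][k] += random.gauss(0.0, 0.02)   # nudge one plane (coordinates in [0, 1])
        trial[axis].sort()                          # keep planes ordered
        trial_cost = imbalance(subdomain_counts(points, *trial))
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if trial_cost < cost or random.random() < math.exp((cost - trial_cost) / temp):
            cuts, cost = trial, trial_cost
        temp *= cooling
    return cuts, cost


# Example: 200,000 clustered particles split into a 4 x 4 x 4 grid of subdomains.
pts = np.random.default_rng(0).beta(2.0, 5.0, size=(200_000, 3))
cuts0 = [np.array([0.25, 0.5, 0.75])] * 3
print("before:", imbalance(subdomain_counts(pts, *cuts0)))
print("after :", anneal_cuts(pts, *cuts0, steps=3000)[1])
```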
Authors: He, Li; Hu, Zhefang; Yu, Yang
Affiliations: China Acad Art, Creat Design Mfg Collaborat Innovat Ctr, Hangzhou 310024, Zhejiang, Peoples R China; Zhejiang Univ, Sir Run Run Shaw Hosp, Dept Nutr, Sch Med, Hangzhou 310020, Zhejiang, Peoples R China; Dianzi Univ, Informat Engn Sch, Hangzhou 311305, Peoples R China
The goal of industrial product design is to bring to market products that meet consumer needs, which implies that the creative activity of product design is constrained by the market environment. The integration of technology, design, and the market is advancing rapidly, and the focus needs to shift from art and design thinking alone to a perspective that encompasses society as a whole. Content-based image retrieval technology analyzes an image according to its visual attributes and spatial relationships, such as color, texture, and shape, and builds an image feature-vector database through feature extraction so that images can be reconstructed. Multi-core processors are now ubiquitous, and parallel algorithms developed for multi-core CPUs are highly adaptable and can be used in most environments. Parallel computing on a multi-core CPU makes full use of processor resources and improves resource utilization. By improving and optimizing the algorithms, parallel processing is used to increase the speed of image processing: the sparse image-resolution algorithm and the aircraft recognition method use multi-core CPU parallel processing to increase processing speed and efficiency. The resulting full-featured image retrieval system removes search restrictions and improves search efficiency and the accuracy of search results. Finally, a test data set is used to evaluate the system and further optimize its performance. Combined with typical cases, the close relationship between human vision, hearing, touch, motion, and smell and product safety is analyzed, and the safety principles of industrial product design are summarized.
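As a rough illustration of the multi-core CPU parallelism described above, the Python sketch below extracts simple color-histogram features for a batch of images with a process pool and answers nearest-neighbour queries against the resulting feature database. The histogram feature, file names, and helper functions are assumptions for illustration, not the system's actual feature extraction.

```python
from multiprocessing import Pool

import numpy as np
from PIL import Image


def color_histogram(path, bins=8):
    """A coarse RGB color histogram as a simple content-based feature vector."""
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    vec = hist.flatten()
    return vec / vec.sum()


def build_feature_database(paths, workers=8):
    """Extract features for all images in parallel across CPU cores."""
    with Pool(processes=workers) as pool:
        return np.vstack(pool.map(color_histogram, paths))


def most_similar(db, query_vec, top_k=5):
    """Indices of the images whose feature vectors are closest to the query."""
    return np.argsort(np.linalg.norm(db - query_vec, axis=1))[:top_k]


if __name__ == "__main__":                         # required on Windows for multiprocessing
    image_paths = ["img_001.jpg", "img_002.jpg"]   # placeholder paths
    db = build_feature_database(image_paths, workers=4)
    print(most_similar(db, color_histogram(image_paths[0])))
```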
Emergencies in metro systems have become more frequent during rush hours, with significant consequences for metro planning, design, operation, and even passengers' daily travel. The motivation of this paper is to establish a hybrid metro simulation method with high efficiency and sufficient precision. To this end, a discrete-event simulation method based on a multi-agent model with parallel computing is proposed to estimate the effects of emergencies efficiently. Firstly, train motion algorithms are developed to compute the train speed profiles for normal operation and emergency operation, respectively. Moreover, three types of agents (passenger, station, and train agents) are defined for the rescheduling calculation, and six types of events are defined to discretize the emergency simulation process. Furthermore, a parallel computing method is proposed to accelerate the simulation process. Finally, a case study of the Yizhuang Line of the Beijing metro is conducted to verify the effectiveness of the proposed simulation methodology. The results confirm the effectiveness and practicality of the proposed simulation method and show how the locations and durations of emergencies affect train and passenger delays.
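The following Python sketch shows the discrete-event pattern the abstract relies on: a priority queue of timestamped events whose handlers schedule follow-up events. The handlers and the 30 s dwell time are placeholders, not the paper's six event types or its train motion algorithms.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Event:
    time: float
    seq: int                                    # tie-breaker for events at the same time
    handler: Callable = field(compare=False)
    payload: dict = field(compare=False, default_factory=dict)


class Simulator:
    """Minimal discrete-event engine: pop the earliest event, run its handler,
    and let handlers schedule follow-up events (arrivals, departures, delays, ...)."""

    def __init__(self):
        self.queue = []
        self.clock = 0.0
        self._seq = 0

    def schedule(self, delay, handler, **payload):
        self._seq += 1
        heapq.heappush(self.queue, Event(self.clock + delay, self._seq, handler, payload))

    def run(self, until=float("inf")):
        while self.queue and self.queue[0].time <= until:
            event = heapq.heappop(self.queue)
            self.clock = event.time
            event.handler(self, **event.payload)


def train_arrival(sim, station, train):
    print(f"t={sim.clock:6.1f}s  train {train} arrives at {station}")
    sim.schedule(30.0, train_departure, station=station, train=train)   # assumed dwell time


def train_departure(sim, station, train):
    print(f"t={sim.clock:6.1f}s  train {train} departs {station}")


sim = Simulator()
sim.schedule(0.0, train_arrival, station="S1", train="T1")
sim.run(until=3600.0)
```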
In this study, a parallel framework for the multiscale analysis of three-dimensional lattice structures is developed. The framework performs the multiscale analysis using the extended multiscale finite element method (EMsFEM) in the parallel environment provided by the Portable, Extensible Toolkit for Scientific Computation (PETSc), a high-performance parallel scientific computing library. To this end, several modifications of the original EMsFEM were developed and a number of PETSc routines were adopted. The efficiency and accuracy of the proposed parallel computing framework were first verified through numerical examples; the parallel speedup and parallel efficiency of the framework were then studied. The proposed parallel computing framework shows good efficiency and can be used for the analysis of large-scale lattice structures.
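For readers unfamiliar with how such a framework delegates the linear algebra to PETSc, here is a minimal petsc4py sketch that assembles a distributed sparse matrix and solves it with a Krylov solver under MPI. The toy 1-D Laplacian system and the solver choices are assumptions for illustration; they are not the EMsFEM coarse-scale system used in the paper.

```python
# A toy distributed solve with petsc4py; run with, e.g.:  mpiexec -n 4 python solve_demo.py
from petsc4py import PETSc

n = 1000                                     # global size of a toy 1-D Laplacian system
A = PETSc.Mat().createAIJ([n, n])            # sparse matrix distributed over the MPI ranks
A.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, False)  # skip careful preallocation here
A.setUp()

rstart, rend = A.getOwnershipRange()         # rows owned by this rank
for i in range(rstart, rend):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

b = A.createVecLeft()
b.set(1.0)                                   # uniform right-hand side
x = A.createVecRight()

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType("cg")                            # conjugate gradients
ksp.getPC().setType("jacobi")                # simple, fully parallel preconditioner
ksp.setFromOptions()                         # allow -ksp_type / -pc_type overrides at run time
ksp.solve(b, x)

PETSc.Sys.Print(f"iterations: {ksp.getIterationNumber()}, residual norm: {ksp.getResidualNorm():.2e}")
```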
Despite the success of parallel architectures and domain-specific accelerators in boosting the performance of emerging parallel workloads, contemporary computer organizations still face the bottleneck of data movement between processors and main memory. Processing-in-memory (PIM) architectures, especially designs that integrate compute logic near DRAM memory banks, are promising for addressing this bottleneck. However, such in-DRAM near-bank integration faces hardware and software design challenges in performance, area overhead, architectural complexity, and programmability. To address these challenges, this dissertation focuses on developing efficient hardware and software solutions for in-DRAM near-bank computing. First, this dissertation investigates the memory bandwidth bottleneck of contemporary hardware platforms through in-depth workload characterization, which motivates in-DRAM near-bank processing solutions. Second, this dissertation proposes multiple full-stack in-DRAM near-bank processing solutions targeting application scopes that range from application-specific to general-purpose computing. These solutions reveal a wide spectrum of trade-off points among hardware efficiency, architectural flexibility, and software complexity. On top of these solutions, this dissertation introduces an open-source simulation framework that supports architectural and software optimization studies of in-DRAM near-bank processing. Finally, this dissertation develops novel machine-learning-based compiler optimizations for partitioning workloads on a chiplet hardware platform that has a distributed compute-memory abstraction similar to in-DRAM near-bank architectures.
In recent years, researchers have made great efforts in computer vision tasks (e.g., object detection) with the widespread use of convolutional neural networks (CNNs). However, object detection algorithms based on CNNs suffer from high computational cost even on high-performance computers. In addition, with the spread of high-resolution video, deploying object detection algorithms becomes more and more difficult because of the large amount of data, let alone on portable platforms such as unmanned aerial vehicles (UAVs). In this paper, we study a lightweight network on a portable platform for outdoor tiny pedestrian detection. Concretely, we first build a training dataset manually, owing to the lack of tiny pedestrian samples in common datasets. We then propose a lightweight network and introduce parallel computing to take full advantage of the GPU. Finally, our method achieves real-time performance on Jetson TX2. Experimental results verify that the proposed model has promising performance in tiny pedestrian detection on portable GPU platforms.
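As an illustration of what "lightweight" typically means in this setting, the PyTorch sketch below stacks depthwise-separable convolution blocks, the standard trick for cutting FLOPs on embedded GPUs such as the Jetson TX2. The block sizes and the tiny backbone stub are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise + pointwise convolution: the building block lightweight backbones
    use to cut FLOPs relative to a full 3x3 convolution."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            # depthwise 3x3: one filter per input channel
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU6(inplace=True),
            # pointwise 1x1: mix channels
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU6(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


# A tiny backbone stub; on a Jetson-class GPU the model would be moved with
# .to("cuda") and fed batched frames to exploit the parallel hardware.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.ReLU6(inplace=True),
    DepthwiseSeparableConv(32, 64),
    DepthwiseSeparableConv(64, 128, stride=2),
)
print(backbone(torch.randn(1, 3, 512, 512)).shape)
```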
In a parallel computing scenario, the synchronization overhead needed to coordinate the execution on the parallel computing nodes can significantly impair overall execution performance. Typically, synchronization is achieved by adopting a global synchronization scheme involving all the nodes. In many application domains, though, a looser scheme, namely local synchronization, can be exploited, in which each node needs to synchronize only with a subset of the other nodes. In this work, we compare the performance of global and local synchronization using the efficiency, i.e., the ratio between the useful computing time and the total computing time including the synchronization overhead, as the key performance indicator. We present an analytical study of the asymptotic behavior of the efficiency as the number of nodes increases. As an original contribution, we prove, using the Max-Plus algebra, that there is a non-zero lower bound on the efficiency in the case of local synchronization, and we present a statistical procedure to estimate this bound. This outcome marks a significant advantage of local synchronization over global synchronization, for which the efficiency tends to zero as the number of nodes increases.
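The contrast between the two schemes can be illustrated with a small Monte Carlo experiment in Python: under a global barrier every step ends when the slowest of all N nodes finishes, whereas under local (ring) synchronization each node waits only for its two neighbours, a max-plus-style recursion. The exponential workload model below is an illustrative assumption, not the authors' setting.

```python
import numpy as np

rng = np.random.default_rng(0)


def efficiency(n_nodes, n_steps, local=True):
    """Monte Carlo estimate of efficiency = useful compute time / total elapsed time.

    Global scheme: every step ends when the slowest of all nodes finishes.
    Local scheme (ring): a node starts step k once it and its two neighbours
    have finished step k-1."""
    work = rng.exponential(1.0, size=(n_steps, n_nodes))   # per-step compute times
    finish = np.zeros(n_nodes)                             # finish times of the previous step
    for k in range(n_steps):
        if local:
            wait = np.maximum(finish, np.maximum(np.roll(finish, 1), np.roll(finish, -1)))
        else:
            wait = np.full(n_nodes, finish.max())
        finish = wait + work[k]
    useful = work.sum(axis=0)
    return float(np.mean(useful / finish))


for n in (4, 16, 64, 256):
    print(f"N={n:4d}  global={efficiency(n, 2000, local=False):.3f}  "
          f"local={efficiency(n, 2000, local=True):.3f}")
```

Running this shows the global efficiency falling as N grows while the local efficiency stays roughly constant, which mirrors the non-zero lower bound proved in the paper.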
Sample entropy is a widely used method for assessing the irregularity of physiological signals, but its high computational complexity prevents its application in time-sensitive scenarios. To improve the computational performance of sample entropy analysis for the continuous monitoring of clinical data, a fast algorithm based on OpenCL is proposed in this paper. OpenCL is an open standard supported by the majority of graphics processing units (GPUs) and operating systems. Based on this standard, a fast parallel algorithm, OpenCLSampEn, is proposed for sample entropy calculation. A series of 24-hour heartbeat recordings was used to verify the robustness of the algorithm. Experimental results show that OpenCLSampEn achieves substantial acceleration: with common parameters, it reduces the execution time to 1/75 of that of the baseline algorithm when the signal length exceeds 60,000. OpenCLSampEn is also robust across embedding dimensions, tolerance thresholds, scales, and operating systems. In addition, an R package implementing the algorithm is provided on GitHub. In summary, the proposed OpenCL-based fast algorithm significantly improves the computational performance of sample entropy and has broad utility as the quantity of continuous clinical and physiological signals continues to grow rapidly.
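For reference, the NumPy sketch below implements the plain definition of sample entropy, SampEn(m, r) = -ln(A/B); its quadratic cost in the signal length is precisely what OpenCLSampEn offloads to the GPU. This is a textbook-style reference implementation under assumed defaults (m = 2, r = 0.2·SD), not the authors' OpenCL kernel.

```python
import numpy as np


def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) = -ln(A / B): B counts pairs of length-m templates within
    tolerance r (Chebyshev distance), A counts the same for length m+1;
    self-matches are excluded. O(N^2) reference implementation."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()                      # common default tolerance

    def count_matches(length):
        n_templates = len(x) - m               # same template count for m and m+1
        templates = np.lib.stride_tricks.sliding_window_view(x, length)[:n_templates]
        count = 0
        for i in range(n_templates - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    B = count_matches(m)
    A = count_matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf


rr = np.random.default_rng(1).normal(size=5000)   # surrogate heartbeat interval series
print(sample_entropy(rr, m=2))
```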
Autonomous driving has gradually moved towards practical application in recent years, and providing reliable real-time environmental information to autonomous driving systems is particularly critical. At present, vehicle video surveillance systems based on multi-source video and target detection algorithms can effectively address this need. However, previous vehicle video surveillance systems often cannot balance surveillance quality against frame rate. Therefore, this article introduces a vehicle video surveillance system based on parallel computing and computer vision. First, multiple fisheye cameras are used to collect surround-view environmental information. Second, a low-light camera, an infrared thermal imager, and a millimeter-wave radar provide forward-view environmental information at night. Correspondingly, we design a surround-view image fusion algorithm and a forward-view image fusion algorithm based on parallel computing. Finally, a monocular camera and detection algorithms provide forward-view detection results. In summary, this vehicle video surveillance system will benefit the practical application of autonomous driving.
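A minimal Python/OpenCV sketch of the parallel capture-and-fuse structure is given below: each camera is read in its own worker, and the latest frames are blended into one composite. The simple averaging stand-in and the camera indices are assumptions; the system's actual surround-view and forward-view fusion algorithms involve fisheye undistortion, calibration, and stitching.

```python
import queue
import threading

import cv2
import numpy as np


def capture_worker(source, frames, stop):
    """Read frames from one camera in its own thread, keeping only the latest frame."""
    cap = cv2.VideoCapture(source)
    while not stop.is_set():
        ok, frame = cap.read()
        if not ok:
            break
        if frames.full():
            try:
                frames.get_nowait()          # drop the stale frame so fusion never blocks
            except queue.Empty:
                pass
        frames.put(frame)
    cap.release()


def fuse(frames, size=(640, 480)):
    """Placeholder fusion: average the resized frames into one composite image."""
    resized = [cv2.resize(f, size).astype(np.float32) for f in frames]
    return (sum(resized) / len(resized)).astype(np.uint8)


sources = [0, 1, 2, 3]                                   # four camera indices (assumed)
queues = [queue.Queue(maxsize=1) for _ in sources]
stop = threading.Event()
workers = [threading.Thread(target=capture_worker, args=(s, q, stop), daemon=True)
           for s, q in zip(sources, queues)]
for w in workers:
    w.start()

latest = [q.get() for q in queues]                       # blocks until each camera delivers a frame
cv2.imwrite("composite.jpg", fuse(latest))
stop.set()
```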
Sustainable construction practices rely on carefully selecting building materials and balancing environmental and economic considerations. This study examines the complex link between local climate, market dynamics, and building material selection. Market data analysis, parametric modeling, and brute-force optimization are used to provide insights into construction decision-making. Across 5540 simulations, a thorough assessment of the financial and energy performance of various materials for walls, roofs, windows, and floors is conducted. Incorporating Pareto ranking, parallel simulation, and sensitivity analysis, the comprehensive evaluation reveals the intricate trade-offs between cost, thermal properties, and energy savings. The findings highlight the potential for optimal external wall solutions to reduce U-values by up to 30% and achieve source energy savings of up to 25% across diverse climates. By emphasizing the importance of local context in material selection, this study shows how energy consumption patterns and transmission losses influence financial and energy performance, thereby advancing sustainable construction practices.
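As a sketch of the Pareto-ranking step applied to brute-force simulation output, the Python snippet below keeps the candidates that no other candidate beats on both cost and energy. The wall assemblies and their numbers are made up for illustration and are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cost: float      # e.g. construction cost per m^2
    energy: float    # e.g. annual source energy use per m^2


def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return (a.cost <= b.cost and a.energy <= b.energy
            and (a.cost < b.cost or a.energy < b.energy))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates (rank-1 set)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]


# Hypothetical wall assemblies from a brute-force sweep (values are made up).
options = [
    Candidate("brick + EPS",        cost=95.0,  energy=210.0),
    Candidate("brick + thin EPS",   cost=100.0, energy=250.0),   # dominated by "brick + EPS"
    Candidate("concrete + mineral", cost=80.0,  energy=260.0),
    Candidate("timber + cellulose", cost=110.0, energy=190.0),
    Candidate("concrete, no ins.",  cost=60.0,  energy=340.0),
]
for c in pareto_front(options):
    print(f"{c.name:22s} cost={c.cost:6.1f}  energy={c.energy:6.1f}")
```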