A sub-pixel digital image correlation (DIC) method with a path-independent displacement tracking strategy has been implemented on the NVIDIA compute unified device architecture (CUDA) for graphics processing unit (GPU) devices. Powered by parallel computing technology, this parallel DIC (paDIC) method, which combines an inverse compositional Gauss-Newton (IC-GN) algorithm for sub-pixel registration with a fast Fourier transform-based cross-correlation (FFT-CC) algorithm for integer-pixel initial guess estimation, achieves far higher computational efficiency than a DIC method running purely on a CPU. In experiments using simulated and real speckle images, paDIC reaches computation speeds of 1.66 × 10^5 POI/s (points of interest per second) and 1.13 × 10^5 POI/s respectively, 57-76 times faster than its sequential counterpart, without sacrificing accuracy or precision. To the best of our knowledge, this is the fastest computation speed of a sub-pixel DIC method reported to date.
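As a rough illustration of the FFT-CC step described above, the following NumPy sketch estimates the integer-pixel displacement between a reference subset and a target subset by locating the cross-correlation peak in Fourier space; the function name and the zero-mean normalization are my own choices, not details taken from the paper.

```python
import numpy as np

def fft_cc_integer_shift(ref_subset, tgt_subset):
    """Estimate the integer-pixel displacement of tgt_subset relative
    to ref_subset by locating the cross-correlation peak."""
    f = np.fft.fft2(ref_subset - ref_subset.mean())
    g = np.fft.fft2(tgt_subset - tgt_subset.mean())
    cc = np.fft.ifft2(f * np.conj(g)).real   # cross-correlation surface
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    # Wrap indices so the shift is centered around zero.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, cc.shape))
```

A sub-pixel registration step such as IC-GN would then start from this integer-pixel guess; in the paper both stages run per point of interest on the GPU.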
With the development of the smart grid and the electricity market, the uncertainty of power flow is greatly aggravated, posing a great challenge to traditional expansion methods for distribution systems in satisfying future demands. In this paper, a data-driven multi-state distribution system expansion planning (DSEP) model is explored. Innovatively, the amplitudes and profiles of uncertain factors in distribution systems are considered separately. Based on massive historical measurement data, kernel density estimation and adaptive clustering are used to aggregate the typical amplitudes and profiles of time-varying variables, respectively, without prior knowledge. Consolidating all the uncertain factors, a multi-state model is established that extends DSEP into a deterministic initial planning stage and a probabilistic re-planning stage. The objective is to minimize the overall planning cost, which accounts for the initial planning costs and the expected costs of adapting the initial plans to other future states. In this way, the flexibility of DSEP is greatly enhanced and the extra investment caused by frequent plan adjustments is reduced. To avoid the rapid growth of CPU time due to the multi-state model, an integrated differential evolution and cross-entropy algorithm implemented on a three-hierarchy parallel platform is proposed. The feasibility of the proposed data-driven multi-state DSEP model and the parallel integrated solution method is demonstrated by numerical studies on a realistic 61-bus distribution system.
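As a hedged sketch of the amplitude/profile separation described above, the snippet below applies Gaussian kernel density estimation to daily peak loads and k-means to peak-normalized daily profiles; k-means merely stands in for the paper's adaptive clustering, and all names, shapes, and parameter counts are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans

def typical_states(history, n_profiles=4, n_amplitudes=3):
    """history: (days, 24) array of hourly measurements, peaks > 0 assumed."""
    peaks = history.max(axis=1)                  # daily amplitude values
    kde = gaussian_kde(peaks)                    # density over amplitudes
    grid = np.linspace(peaks.min(), peaks.max(), 200)
    density = kde(grid)
    amp_levels = grid[np.argsort(density)[-n_amplitudes:]]  # most probable levels
    profiles = history / peaks[:, None]          # shape-only daily profiles
    km = KMeans(n_clusters=n_profiles, n_init=10).fit(profiles)
    return np.sort(amp_levels), km.cluster_centers_
```

Combining each typical amplitude with each typical profile yields the discrete states over which the multi-state planning model can then be evaluated.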
Connected component labeling is a frequently used image processing task in many applications. Moreover, in recent years, the use of 3D image data has become widespread, for instance in 3D X-ray computed tomography and magnetic resonance imaging. However, because ordinary labeling algorithms use a large amount of memory and 3D images are generally large, labeling 3D image data can cause memory shortages. Furthermore, labeling a large image is time-consuming. In this paper, we propose a new memory-efficient connected component labeling algorithm for 3D images that uses parallel computing to accelerate the labeling process. In addition, we use a span matrix and a compressed label matrix to reduce memory usage, and an equivalence-chain approach to speed up the calculation. The algorithm offers two options, one favoring processing speed and one favoring memory savings. In experiments on real examples, the proposed algorithm with the speed-oriented option was faster and used less memory than the conventional label equivalence method. With the memory-efficient option, memory usage was further reduced to between one-eighth and one-thirteenth of that used by the label equivalence method while maintaining the same performance.
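To make the span idea concrete, here is a minimal 2D run-length labeling sketch with union-find label merging; the paper's 3D, parallel, compressed-label version goes well beyond this, so treat it only as an illustration of why labeling spans instead of individual voxels saves memory.

```python
def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]          # path halving
        x = parent[x]
    return x

def label_spans(img):
    """img: 2D boolean array. Returns spans as [row, start, end, label]."""
    spans, parent, prev = [], [], []
    for r, row in enumerate(img):
        cur, c = [], 0
        while c < len(row):
            if row[c]:
                s = c
                while c < len(row) and row[c]:
                    c += 1                     # run of foreground pixels
                lbl = len(parent); parent.append(lbl)
                for ps, pe, pl in prev:        # merge overlapping spans
                    if ps < c and pe > s:      # 4-connectivity overlap
                        ra, rb = find(parent, lbl), find(parent, pl)
                        if ra != rb:
                            parent[ra] = rb
                cur.append((s, c, lbl))
                spans.append([r, s, c, lbl])
            else:
                c += 1
        prev = cur
    for sp in spans:
        sp[3] = find(parent, sp[3])            # resolve final labels
    return spans
```

Because only one record is stored per run rather than per pixel, memory scales with the number of spans, which is the property the 3D algorithm exploits.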
In data-intensive parallel computing clusters, it is important to provide deadline-guaranteed service to jobs while minimizing resource usage (e.g., network bandwidth and energy). Under the current computing framework (which first allocates data and then schedules jobs), in a busy cluster with many jobs it is difficult to achieve high data locality (and hence low bandwidth consumption), deadline guarantees, and high energy savings simultaneously. We model the problem of simultaneously achieving these three objectives using integer programming. Given the NP-hardness of the problem, we propose a heuristic Cooperative job Scheduling and data Allocation method (CSA). CSA reverses the order of data allocation and job scheduling in the current computing framework. Scheduling jobs first enables CSA to proactively consolidate tasks that request more common data onto the same server when conducting deadline-aware scheduling, and to consolidate tasks onto as few servers as possible to maximize energy savings. This lets the subsequent data allocation step place a data block on the server that hosts most of that block's requester tasks, thus maximally enhancing data locality. To trade off data locality against energy savings with specified weights, CSA uses a cooperative recursive refinement process that repeatedly adjusts the job schedule and the data allocation schedule. We further propose two enhancement algorithms (a minimum k-cut data reallocation algorithm and a bipartite-based task reassignment algorithm) to further improve CSA's performance through additional data reallocation and task reassignment, respectively. Trace-driven experiments in simulation and on a real cluster show that CSA outperforms other schedulers in supplying deadline-guaranteed and resource-efficient service, and that each enhancement algorithm is effective in improving CSA.
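The toy sketch below captures only the ordering CSA relies on: tasks are scheduled first (greedily consolidating tasks that share requested data), and each data block is then placed on the server hosting most of its requesters. The greedy rule, the capacity model, and all names are assumptions of mine, not the paper's algorithm.

```python
from collections import Counter, defaultdict

def schedule_then_allocate(tasks, capacity):
    """tasks: {task_id: set(block_ids)}; capacity: max tasks per server."""
    servers, placement = defaultdict(list), {}
    # Job scheduling first: put each task on the server whose current
    # tasks share the most requested blocks with it.
    for t in sorted(tasks, key=lambda t: -len(tasks[t])):
        best, best_overlap = None, -1
        for s, members in servers.items():
            if len(members) >= capacity:
                continue
            overlap = sum(len(tasks[t] & tasks[m]) for m in members)
            if overlap > best_overlap:
                best, best_overlap = s, overlap
        if best is None:
            best = len(servers)                # open a new server
        servers[best].append(t)
        placement[t] = best
    # Data allocation second: each block goes where most requesters live.
    block_home = {}
    for b in {b for req in tasks.values() for b in req}:
        votes = Counter(placement[t] for t in tasks if b in tasks[t])
        block_home[b] = votes.most_common(1)[0][0]
    return placement, block_home
```

In the paper this placement would additionally respect deadlines and be refined recursively against the energy-saving objective.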
Entity linking is a central concern of automatic knowledge question answering and knowledge base population. Traditional collective entity linking approaches consider only one of entity context or the semantic relations between entities, and therefore tend to perform poorly on Web documents; the efficiency of collective entity linking also needs to be improved. This paper proposes a collective entity linking algorithm based on a topic model and a graph. The topic model represents mentions and candidate entities by their topic distributions, making full use of document context. Entity semantic relations are represented by document similarities computed through the topic model, and parallel computing is used to reduce the long running time caused by topic model construction. An entity graph is constructed according to the relations between entities in the knowledge graph, and Hypertext-Induced Topic Search (HITS) exploits this graph to compute hub and authority values for the candidate entities; the authority value is the basis for entity linking. Experimental results on an open-domain corpus (NLPCC2014) demonstrate the validity of the proposed method: it improves F1-measure by 5.2% over AGDISTIS on the NLPCC2014 corpus.
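As a small illustration of the HITS step mentioned above, this sketch iterates hub and authority scores over an entity-graph adjacency matrix; the normalization scheme, iteration count, and toy graph are arbitrary choices of mine.

```python
import numpy as np

def hits(adj, iters=50, eps=1e-12):
    """adj[i, j] = 1 if entity i links to entity j in the entity graph.
    Returns (hub, authority) score vectors."""
    hub = np.ones(adj.shape[0])
    auth = hub.copy()
    for _ in range(iters):
        auth = adj.T @ hub                      # authorities gather hub votes
        auth /= max(np.linalg.norm(auth), eps)
        hub = adj @ auth                        # hubs point at good authorities
        hub /= max(np.linalg.norm(hub), eps)
    return hub, auth

# Toy 4-candidate graph: candidates 0-2 all point to candidate 3.
adj = np.array([[0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]], dtype=float)
hub, auth = hits(adj)
best_candidate = int(np.argmax(auth))           # linking uses authority value
```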
In optical networks, ensuring high quality of transmission (QoT) is essential to prevent degradation of optical signals, especially when the signal strength falls below a specified threshold. While machine learning (ML) is widely used for QoT prediction, predicting QoT accurately for large-scale optical links presents challenges: traditional serial methods often result in high latency and reduced processing efficiency of optical channels. To solve this problem, this paper proposes a Dask-based P-FEDformer approach. A FEDformer-based predictor is first constructed, and QoT prediction for multiple channels is then carried out under the Dask parallel architecture. To improve prediction accuracy, a wavelet decomposition technique is employed. Simulation results demonstrate the method's effectiveness in handling large amounts of data, with a 60% improvement in time efficiency over serial execution while maintaining accurate QoT prediction.
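The per-channel fan-out under Dask can be sketched as below; `predict_channel` is a trivial persistence stand-in for the paper's FEDformer predictor with wavelet decomposition, and all names are illustrative.

```python
import numpy as np
import dask

def predict_channel(series):
    # Stand-in predictor: persistence forecast (repeat the last value).
    # The paper uses a FEDformer model with wavelet decomposition here.
    return series[-1]

def parallel_qot(channels):
    """channels: list of per-channel QoT time series (1D arrays)."""
    tasks = [dask.delayed(predict_channel)(c) for c in channels]
    return dask.compute(*tasks)                 # one prediction per channel

# Example: 1000 channels, 96 samples each.
preds = parallel_qot([np.random.rand(96) for _ in range(1000)])
```

Because the channels are independent, Dask can distribute them across workers, which is where the reported gain over serial execution comes from.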
Due to the sustained and rapid growth of big data and the demand for higher-accuracy solutions to application problems, the completion time of fixed-time big data tasks executing on the original parallel computing systems keeps growing. To meet a fixed completion-time requirement, the original parallel computing systems need to be scaled accordingly. This paper therefore studies an iso-time scaling method to guide the scaling of parallel computing systems. First, models of big data parallel tasks and parallel computing systems are built, and an algorithm is designed to calculate the completion time of big data parallel tasks. Second, based on the actual situation of most current computing centers, we put forward some reasonable hypotheses, make full use of backup computational nodes, and optimize the cost of scaling the parallel computing systems. A vertical scaling algorithm is then designed to upgrade computational nodes, and a horizontal scaling algorithm is designed to add computational nodes. The two scaling algorithms are compared in terms of time complexity, degree of parallelism, and utilization of the scaled system. Finally, simulation experiments show that our method keeps the completion time within the fixed limit as growing data-parallel tasks execute on the scaled systems, and that it achieves lower scaling costs than traditional methods.
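A minimal sketch of the iso-time idea: estimate completion time from node speeds and add backup nodes until a fixed deadline is met. The perfectly divisible workload model and all names are simplifying assumptions; the paper's algorithms additionally handle vertical scaling (node upgrades) and cost optimization.

```python
def completion_time(workload, node_speeds):
    """Perfectly divisible workload split in proportion to node speed."""
    return workload / sum(node_speeds)

def horizontal_scale(workload, node_speeds, new_node_speed, deadline):
    """Add identical backup nodes until the deadline is met."""
    assert new_node_speed > 0
    speeds = list(node_speeds)
    while completion_time(workload, speeds) > deadline:
        speeds.append(new_node_speed)           # bring a backup node online
    return speeds

# e.g. the workload grows from 1000 to 1800 units; keep T <= 10.
print(horizontal_scale(1800, [40, 40, 20], 20, deadline=10))
```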
Scalability is an important performance metric of parallel computing, but traditional scalability metrics each reflect the scalability of parallel computing from only one perspective, which makes it difficult to measure overall performance. This paper studies scalability metrics systematically. From the many performance parameters of parallel computing, a group of key parameters is chosen and normalized, and the area of the resulting Kiviat graph is used to characterize the overall performance of parallel computing. On this basis, a novel iso-area-of-performance scalability metric for parallel computing is proposed, and its relationship to the traditional metrics is analyzed. Finally, the new metric is applied to analyze the scalability of Cannon's matrix multiplication algorithm under the LogP model. The proposed metric is useful for improving parallel computing architectures and for tuning parallel algorithm design.
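The Kiviat-area idea can be illustrated directly: with k normalized parameters plotted on equally spaced radial axes, the enclosed polygon area is a single scalar summarizing overall performance, and iso-area scaling asks how system and problem size must grow to keep that area constant. The parameter choice in the example is mine, not the paper's.

```python
import math

def kiviat_area(values):
    """Area of the Kiviat (radar) polygon for normalized parameters
    in [0, 1], one per equally spaced axis (needs >= 3 axes)."""
    n = len(values)
    wedge = math.sin(2 * math.pi / n) / 2.0     # area factor per sector
    return wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))

# e.g. normalized efficiency, speedup, utilization, inverse overhead
print(kiviat_area([0.9, 0.8, 0.85, 0.7]))
```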
Purpose: The purpose of this study is to develop a new parallel metaheuristic algorithm for solving unconstrained continuous optimization problems. Design/methodology/approach: The proposed method brings several metaheuristic algorithms together to form a coalition under the Weighted Superposition Attraction-Repulsion Algorithm (WSAR) in a parallel computing environment. It runs different single-solution-based metaheuristic algorithms in parallel and employs WSAR (a recently proposed swarm-intelligence-based optimizer) as the controller. Findings: The proposed approach is tested on the latest well-known unconstrained continuous optimization problems (CEC2020), and the results are compared with those of other optimization algorithms; the comparison demonstrates the efficiency of the proposed method. Originality/value: This study combines different metaheuristic algorithms so that their diverse characteristics yield satisfactory performance on optimization problems, while parallel execution shortens the run time. Thanks to its problem-independent structure, the proposed approach can be applied to any type of optimization problem.
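Below is a much-simplified sketch of the coalition structure: several single-solution searches run in parallel and a controller keeps the best incumbent. In the paper the members are different metaheuristics and the controller is WSAR; here identical hill climbers with different seeds and a best-pick rule stand in for both.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def sphere(x):
    return float(np.sum(x * x))                 # toy objective to minimize

def hill_climb(seed, dim=10, iters=2000, step=0.1):
    """One coalition member: a simple single-solution local search."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, dim)
    for _ in range(iters):
        cand = x + rng.normal(0.0, step, dim)
        if sphere(cand) < sphere(x):
            x = cand
    return x

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        sols = list(pool.map(hill_climb, range(4)))  # members in parallel
    best = min(sols, key=sphere)                     # controller: keep best
    print(sphere(best))
```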
Evacuation simulation has the potential to be used as part of a decision support system during large-scale incidents to provide advice to incident commanders. To be viable in such applications, the simulation must run many times faster than real time. Parallel processing reduces run times for very large computational simulations by distributing the workload among a number of processors. This paper presents the development of a parallel version of the rule-based evacuation simulation software buildingEXODUS using domain decomposition. Four case studies (CS) were tested on a cluster of 10 Intel Core 2 Duo (dual core) 3.16 GHz CPUs. CS-1 involved an idealised large geometry with 20 exits, intended to illustrate the peak computational speedup of the parallel implementation, with a population of 100,000 agents; the peak computational speedup (PCS) was 14.6 and the peak real-time speedup (PRTS) was 4.0. CS-2 was a long area with a single exit and a population of 100,000 agents; the PCS was 13.2 and the PRTS was 17.2. CS-3 was a 50-storey high-rise building with a population of 8,000/16,000 agents; the PCS was 2.48/4.49 and the PRTS was 17.9/12.9. CS-4 was a large realistic urban area with 60,000/120,000 agents; the PCS was 5.3/6.89 and the PRTS was 5.31/3.0. This level of computational performance opens evacuation simulation to a range of innovative application areas such as real-time incident support, dynamic signage in smart buildings, and virtual training environments.
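As a toy illustration of domain decomposition, the snippet below partitions agents into spatial strips, advances each strip in a separate process, and collects the agents that cross a strip boundary for hand-off to the neighbour. Real evacuation geometry, behaviour rules, and load balancing are of course far richer than this one-dimensional drift model; every name here is illustrative.

```python
import numpy as np
from multiprocessing import Pool

def step_strip(args):
    """Advance every agent in one strip one cell toward the exit at x = 0."""
    positions, x_lo, x_hi = args
    moved = positions - 1                       # 1D drift toward the exit
    stay = moved[(moved >= x_lo) & (moved < x_hi)]   # still inside this strip
    emigrants = moved[moved < x_lo]             # crossed the left boundary
    return stay, emigrants                      # emigrants are handed off

if __name__ == "__main__":
    agents = np.random.randint(0, 400, size=100_000)
    bounds = [(0, 100), (100, 200), (200, 300), (300, 400)]
    strips = [(agents[(agents >= lo) & (agents < hi)], lo, hi)
              for lo, hi in bounds]
    with Pool(len(strips)) as pool:
        results = pool.map(step_strip, strips)  # one process per subdomain
    # A real simulation would merge each strip's emigrants into its
    # neighbour before the next time step.
```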