ISBN (Print): 9798331540913; 9798331540906
The efficiency and performance of neural network (NN) controllers present a significant challenge in the rapidly evolving landscape of real-time closed-loop control systems, such as those used in solar inverters. This paper introduces a novel approach that enhances training efficiency by combining adaptive dropout with parallel computing techniques, utilizing the Levenberg-Marquardt (LM) algorithm and Forward Accumulation Through Time (FATT). Unlike traditional dropout methods that apply a fixed dropout rate uniformly across all neurons, Adaptive Dropout dynamically adjusts the dropout rate based on each neuron's calculated importance and its stage in the training process. This protects the more critical neurons while regularizing less significant ones, thereby improving convergence speed and enhancing the generalization of the neural network controller. To further accelerate training, the Adaptive Dropout method is integrated into a parallel computing architecture that employs multiple cores to compute Dynamic Programming (DP) costs and Jacobian matrices for different trajectories simultaneously. This approach not only harnesses the computational power of modern multi-core systems but also ensures efficient processing across all trajectories. The experimental results demonstrate that adaptive dropout with parallel computing improves training efficiency and overall performance over both the no-dropout and weight-dropout control schemes.
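As a concrete illustration of the idea, the sketch below computes per-neuron adaptive dropout rates in plain NumPy. The abstract does not specify the paper's importance measure, so mean absolute outgoing weight is assumed here, and the linear annealing schedule is likewise an assumption; the LM/FATT training loop itself is not reproduced.

```python
import numpy as np

def adaptive_dropout_mask(W_out, epoch, n_epochs,
                          p_min=0.05, p_max=0.5, rng=None):
    """Per-neuron dropout mask for one hidden layer. Neuron importance is
    approximated by the mean absolute outgoing weight (an assumption);
    more important neurons receive a lower drop probability, and all
    rates are annealed linearly toward zero as training progresses."""
    rng = rng or np.random.default_rng()
    importance = np.abs(W_out).mean(axis=1)            # one score per neuron
    span = importance.max() - importance.min() + 1e-12
    norm = (importance - importance.min()) / span
    p_drop = p_max - (p_max - p_min) * norm            # important => small p
    p_drop *= 1.0 - epoch / n_epochs                   # assumed schedule
    keep = rng.random(p_drop.shape) >= p_drop
    return keep / np.maximum(1.0 - p_drop, 1e-12)      # inverted-dropout scale
```

The returned mask is multiplied into the layer's activations, so protected neurons are rarely dropped while weak ones are regularized aggressively early in training.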
ISBN (Print): 9798400707056
High-precision static analysis can effectively detect Null Pointer Dereference (NPD) vulnerabilities in C, but its performance overhead is significant. In recent years, researchers have attempted to improve the efficiency of static analysis by leveraging multicore resources. However, due to complex dependencies in the analysis process, parallelizing static value-flow NPD analysis for large-scale software still faces significant challenges: it is difficult to balance detection efficiency and accuracy, which limits its applicability. This paper presents PANDA, the first parallel high-precision static value-flow NPD detector for C. The core idea of PANDA is to use dependency analysis to preserve high precision while decoupling the strong dependencies between static value-flow analysis steps. This transforms the traditionally hard-to-parallelize NPD analysis into two parallelizable algorithms: function summarization and combined query-based vulnerability analysis. PANDA introduces a task-level parallel framework, enhanced with a dynamic scheduling method, to schedule these two key steps in parallel, significantly improving the performance and scalability of memory-vulnerability detection. Implemented within the LLVM framework (version 15.0.7), PANDA demonstrates a significant advantage in balancing accuracy and efficiency compared to popular open-source detection tools. In precision-targeted benchmark tests, PANDA keeps the false positive rate within 3.17% and the false negative rate within 5.16%; in historical CVE detection tests, its recall far exceeds that of the comparison open-source tools. In performance evaluations, PANDA achieves up to an 11.23-fold speedup over its serial version on a 16-node server, exhibiting outstanding scalability.
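The scheduling idea behind the function-summarization phase can be sketched as a bottom-up, task-parallel traversal of the call graph. The code below is a hypothetical reconstruction, not PANDA's implementation: `call_graph` maps each function to its callees, `summarize` stands in for the actual value-flow analysis, and an acyclic call graph (recursion already collapsed into strongly connected components) is assumed.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def parallel_summarize(call_graph, summarize, workers=16):
    """Bottom-up task-parallel function summarization: a function becomes
    a ready task once all of its callees are summarized, so independent
    subtrees of the call graph run concurrently."""
    pending = {f: set(callees) for f, callees in call_graph.items()}
    summaries, running = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or running:
            for f in [g for g, deps in pending.items() if not deps]:
                del pending[f]
                running[pool.submit(summarize, f, summaries)] = f
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                f = running.pop(fut)
                summaries[f] = fut.result()
                for deps in pending.values():
                    deps.discard(f)        # dynamically unlock callers
    return summaries
```

The dynamic unlocking loop mirrors the paper's dynamic scheduling: ready work is discovered as completions arrive rather than in fixed waves.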
ISBN (Print): 9783030941413; 9783030941406
This paper considers a model for classifying high school students by digital traces obtained from the VKontakte social network. The classification is based on the membership of social network users in communities, the number of which is on the order of hundreds of thousands, so big data arises in the analysis process. The problem of working with big data is solved by parallelizing computations. The classification model was developed with the aim of recovering information from the digital traces of social network users. On the basis of the trained model, users of the VKontakte social network were identified by place of residence (village or city of the Altai Territory) and age (grade 9 or 11) among teenagers whose digital traces carry incomplete information on grade and place of study. The best prediction accuracy of the trained model was on the order of 0.9. In the future, it is planned to extend the classification model by including social network users of other age groups in the data sample, and to develop a managerial decision-support system for the university's admissions campaign.
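A minimal sketch of the classification setup, assuming a bag-of-communities representation: each user becomes a sparse binary vector over community IDs, and a standard linear classifier is trained on it. The actual model and feature pipeline are not specified in the abstract; `user_communities` and `labels` are hypothetical inputs.

```python
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression

def train_trace_classifier(user_communities, labels, n_communities):
    """Train a classifier on bag-of-communities features. Each user is a
    sparse binary row over community IDs, so the matrix stays tractable
    even with hundreds of thousands of columns. `user_communities` is a
    list of community-ID lists, one per user (a hypothetical format)."""
    rows, cols = [], []
    for i, communities in enumerate(user_communities):
        rows.extend([i] * len(communities))
        cols.extend(communities)
    X = csr_matrix(([1.0] * len(rows), (rows, cols)),
                   shape=(len(user_communities), n_communities))
    # The cost of fitting over this huge, sparse feature space is what
    # motivates the paper's parallelized computation.
    return LogisticRegression(max_iter=1000).fit(X, labels)
```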
ISBN (Print): 9798350373141; 9798350373158
Parallel computing environments, especially those using multicore processors, often suffer significant performance degradation due to false sharing, which occurs when threads on different cores inadvertently contend for the same cache line. To tackle this challenge, this paper presents ParaShareDetect, a novel approach that combines a dynamic instrumentation mechanism with sophisticated runtime analysis to precisely identify instances of false sharing. Integrated with the LLVM framework, ParaShareDetect has been evaluated thoroughly, proving its effectiveness in accurately detecting false-sharing scenarios across various benchmarks, including those from the widely used Parsec and Phoenix benchmark suites. Moreover, ParaShareDetect has identified false-sharing issues that other leading-edge tools missed, as illustrated by its findings in the bodytrack benchmark from Parsec. The evaluation further shows that the methodology imposes an average performance overhead of approximately 3.86x the original execution time, which is acceptable in pre-production testing phases focused on software optimization. These results underscore the potential of ParaShareDetect to significantly improve the performance of multicore applications by addressing false sharing while maintaining a manageable level of overhead.
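The core of the runtime analysis can be illustrated with a toy classifier over sampled memory accesses: group writes by cache line and flag lines written by several threads at disjoint byte offsets. This is a sketch of the general detection idea, not ParaShareDetect's algorithm, and the trace format is hypothetical.

```python
from collections import defaultdict

CACHE_LINE = 64  # bytes; typical x86 cache-line size

def false_sharing_suspects(accesses):
    """Flag cache lines written by several threads at pairwise-disjoint
    byte offsets; true sharing would show overlapping offsets instead.
    `accesses` is a hypothetical trace of (thread_id, address, is_write)
    samples such as dynamic instrumentation would emit."""
    lines = defaultdict(lambda: defaultdict(set))
    for tid, addr, is_write in accesses:
        if is_write:
            lines[addr // CACHE_LINE][tid].add(addr % CACHE_LINE)
    suspects = []
    for line, by_thread in lines.items():
        offs = list(by_thread.values())
        # pairwise-disjoint offsets: union is as large as the sum of parts
        if len(offs) > 1 and sum(map(len, offs)) == len(set.union(*offs)):
            suspects.append(line * CACHE_LINE)
    return suspects
```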
ISBN (Print): 9798350372977; 9798350372984
The wireless sensor network is an active and significant research area nowadays, with applications in almost every sector and environment. Wireless sensor networks face several major challenges, such as energy consumption, battery lifetime, attacks, and data transmission. A wireless sensor network generally produces non-Euclidean sensing data and metadata structures, which are very complex to work with, especially when measuring anomalies and disruption in a network. In this paper, we introduce parallel computing to resolve the heterogeneity of the sensing data and metadata. Parallel computing is applied implicitly to extract only the paramount data from the large volume of data in order to detect routing-layer attacks. A convolutional neural network (CNN) is used as the machine-learning model, and we enhance its kernel to optimize the performance of the conventional CNN model for detecting network-layer attacks in wireless sensor networks. The proposed method demonstrates better results in detecting anomalies and attacks than existing methods and techniques.
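One plausible reading of the implicit parallelism described here is data-parallel feature reduction ahead of the CNN. The sketch below shows how each core can distill raw heterogeneous records into fixed-length vectors; the `salient_features` reduction is hypothetical, since the abstract does not specify which paramount data are kept or how the kernel is enhanced.

```python
import numpy as np
from multiprocessing import Pool

def salient_features(record):
    """Reduce one raw sensor record to a fixed-length numeric vector.
    This stand-in extraction (simple summary statistics) is hypothetical."""
    x = np.asarray(record, dtype=float)
    return np.array([x.mean(), x.std(), x.min(), x.max(),
                     np.abs(np.diff(x)).mean()])

def parallel_extract(records, workers=8):
    """Implicit data parallelism: each core reduces a share of the raw,
    heterogeneous sensing data before the CNN-based detector sees it."""
    with Pool(workers) as pool:
        return np.stack(pool.map(salient_features, records))
```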
ISBN (Print): 9798350390230; 9798350390223
Kalman filtering is an algorithm widely used for state estimation of dynamic systems, with applications including navigation, tracking, and control systems. However, the computational complexity of Kalman filtering makes it inefficient when processing large-scale data. This paper studies a parallel processing method for Kalman filtering, improving the execution efficiency of the algorithm by dividing the filtering process into blocks and processing them in parallel. Experiments on simulated data consisting of sinusoidal signals plus Gaussian white noise verify the significant computing-time advantages of the parallel method, and the influence of block processing on the filtering results is discussed.
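A minimal sketch of the block-parallel scheme, assuming a scalar random-walk state model: the measurement sequence is split into chunks, each chunk is filtered independently in its own process, and the results are concatenated. Seeding each block from its first sample is an assumption, and it is exactly where the boundary effects the paper discusses arise.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def kalman_block(args):
    """Scalar Kalman filter over one block of measurements; q and r are
    the process and measurement noise variances."""
    z, x0, p0, q, r = args
    x, p, out = x0, p0, np.empty_like(z)
    for i, zi in enumerate(z):
        p += q                  # predict
        k = p / (p + r)         # Kalman gain
        x += k * (zi - x)       # update
        p *= 1.0 - k
        out[i] = x
    return out

def parallel_kalman(z, n_blocks=4, q=1e-4, r=0.25):
    """Split the sequence into chunks and filter them independently.
    Each block is re-seeded from its own first sample, so the filter
    re-converges near block boundaries (the distortion discussed above)."""
    blocks = np.array_split(z, n_blocks)
    args = [(b, b[0], 1.0, q, r) for b in blocks]
    with ProcessPoolExecutor(n_blocks) as pool:
        return np.concatenate(list(pool.map(kalman_block, args)))

# e.g. z = np.sin(np.linspace(0, 8 * np.pi, 4000)) + np.random.normal(0, 0.5, 4000)
```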
ISBN (Print): 9798350364613; 9798350364606
As high-performance computing technologies advance, the significance of parallel programming in various domains is becoming increasingly evident, since it allows us to harness the power of heterogeneous computing and solve complex problems more efficiently. However, for students to master this type of computation and apply it in different contexts, they must understand how measuring and optimizing parallel code affects its performance. This paper presents an approach to enhancing students' comprehension of parallel performance metrics through an interactive exercise that complements lectures on parallel performance and improves assessment.
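The abstract does not list which metrics the exercise covers, but a typical set is speedup, efficiency, and the experimentally determined serial fraction (the Karp-Flatt metric); a minimal calculator for these follows.

```python
def parallel_metrics(t_serial, t_parallel, p):
    """Speedup S = T1 / Tp, efficiency E = S / p, and the Karp-Flatt
    experimentally determined serial fraction f = (p / S - 1) / (p - 1)."""
    s = t_serial / t_parallel
    e = s / p
    f = (p / s - 1) / (p - 1) if p > 1 else 0.0
    return s, e, f

# A kernel taking 10 s serially and 1.6 s on 8 cores gives speedup 6.25,
# efficiency ~0.78, and serial fraction ~0.04, numbers students can
# compare against Amdahl's-law predictions.
print(parallel_metrics(10.0, 1.6, 8))
```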
The fuzzy integral is an excellent information-fusion tool in data mining. It has clear advantages in combining features and has been applied successfully to classification problems. However, as the number of features increases, the time and space complexity of the fuzzy integral grow exponentially, which limits its development. This article proposes a high-efficiency fuzzy integral, the Parallel and Sparse Frame Based Fuzzy Integral (PSFI), to reduce the time and space complexity of fuzzy integral calculation. PSFI is based on the distributed parallel computing framework Spark combined with the concept of sparse storage. To address the efficiency problems of the Python language, Cython programming technology is also introduced, and the algorithm is packaged into a library to realize a more efficient PSFI. The experiments verify the impact of the number of parallel nodes on the performance of the algorithm, test the performance of PSFI in classification, and apply PSFI to regression problems and imbalanced big-data classification. The results show that PSFI reduces the variable storage requirements of datasets with many features by thousands of times as computing resources increase. Furthermore, PSFI is shown to achieve higher prediction accuracy than the classic fuzzy integral running on a single processor.
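The sparse-storage idea can be illustrated with a single-machine discrete Choquet integral in which the fuzzy measure is kept as a dictionary over only the informative subsets. This is a sketch of the concept, not PSFI's Spark/Cython implementation; the additive fallback for missing subsets is an assumption.

```python
import numpy as np

def choquet_integral(x, mu, default=None):
    """Discrete Choquet integral of feature values x with respect to a
    fuzzy measure mu stored sparsely as {frozenset(indices): value}.
    Subsets missing from mu fall back to `default` (here a cardinality-
    based additive approximation), so only informative coalitions need
    to be stored explicitly."""
    default = default or (lambda s: len(s) / len(x))
    order = np.argsort(x)                       # ascending feature values
    total, prev = 0.0, 0.0
    for i, idx in enumerate(order):
        active = frozenset(order[i:].tolist())  # indices still in play
        m = mu.get(active, default(active))
        total += (x[idx] - prev) * m
        prev = x[idx]
    return total
```

With an exponential number of subsets in principle, storing only the coalitions that deviate from the default is what keeps memory bounded; PSFI distributes the same computation across Spark nodes.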
ISBN (Print): 9789819708000; 9789819708017
Reconfigurable architectures have great potential in computation-intensive and memory-intensive applications due to their flexible configuration. Aiming at the low computing efficiency caused by the mismatch between data of different granularities and the underlying hardware structure in applications such as communication baseband signal processing, a parallel computing method supporting multi-bit data is proposed, and a dynamic granularity configuration structure using this method is designed based on reconfigurable array processors. The structure divides the computation granularity into 8, 16, and 32 bits and realizes four functions: data-combination, data-splitting, parallel-addition, and parallel-multiplication. These features increase the parallelism and flexibility of the array structure. The experimental results show that the speedup ratio reaches 1.5 within a certain error range, the running time is reduced by about 20%, and the code complexity is also significantly reduced. In addition, FPGA synthesis shows that the maximum operating frequency of the dynamic configuration circuit is 133.5 MHz, enabling dynamic configuration of different data granularities during computation and parallel computing of multi-bit data.
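A software model of the granularity modes, assuming little-endian lane ordering within a 32-bit word: data-combination packs narrow lanes into one word, data-splitting is its inverse, and parallel-addition adds lane-wise without carries crossing lane boundaries. The hardware realizes this with configurable datapath wiring rather than the bit manipulation shown here.

```python
def pack_lanes(vals, width):
    """Data-combination: pack 32 // width unsigned values of `width` bits
    (8, 16, or 32) into one 32-bit word, lane 0 in the low bits."""
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(vals[: 32 // width]):
        word |= (int(v) & mask) << (i * width)
    return word

def split_lanes(word, width):
    """Data-splitting: the inverse of pack_lanes."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(32 // width)]

def parallel_add(a, b, width):
    """Parallel-addition: add corresponding lanes of two packed words,
    keeping carries from crossing lane boundaries (results wrap
    within each lane)."""
    mask = (1 << width) - 1
    out = 0
    for i in range(32 // width):
        s = ((a >> (i * width)) & mask) + ((b >> (i * width)) & mask)
        out |= (s & mask) << (i * width)
    return out

# pack_lanes([1, 2, 3, 4], 8) == 0x04030201
# split_lanes(parallel_add(pack_lanes([250, 2], 16),
#                          pack_lanes([10, 3], 16), 16), 16) == [260, 5]
```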
During the past decades, high-performance computing (HPC) has been widely used in various industries. In particular, the exponential growth of the GPU (graphics processing unit) is a key technology that has helped promote the development of artificial intelligence in real-world use cases. When GPUs are used to accelerate parallel applications, programmability, resource management, and scheduling are non-trivial jobs for obtaining optimized performance. Therefore, how to effectively exploit GPU resources and improve program performance has been a hot research topic recently. Benchmarks do not always provide a good picture of the performance and details of parallel applications, and the variety of hardware devices and constantly updated parallel programs make performance analysis and modeling even more difficult. This dissertation makes four main contributions. First, we conduct a study on a GPU analytical performance model that aims to estimate a suitable number of threads per block for performance improvement. Second, a novel method to alleviate a limitation of the GPU is proposed, offering a new way to optimize GPU performance at the block-scheduling level. Third, we propose two parallel computing abstract models, namely computational and programming models, that represent various computing paradigms based on Flynn's taxonomy and simplify workload-distribution characteristics; this framework provides a general way to create an analytical performance model. Finally, we validate the proposed abstract models and demonstrate their usefulness with real-world AI (artificial intelligence) applications on a distributed GPU system. The analytical performance model for a CNN (convolutional neural network) application analyzes performance characteristics on multiple GPUs, enabling users to evaluate their techniques before running applications on targeted machines.
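A first-order occupancy model of the kind such an analytical model builds on can be sketched as follows. The per-SM limits are illustrative defaults rather than any specific GPU, and the dissertation's actual model is not reproduced here.

```python
def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              max_threads_sm=2048, max_blocks_sm=32,
              regs_sm=65536, smem_sm=49152):
    """First-order GPU occupancy estimate: resident blocks per SM are
    limited by thread slots, registers, shared memory, and block slots.
    The per-SM limits here are illustrative defaults, not a specific GPU."""
    by_threads = max_threads_sm // threads_per_block
    by_regs = regs_sm // max(regs_per_thread * threads_per_block, 1)
    by_smem = smem_sm // max(smem_per_block, 1)
    blocks = min(by_threads, by_regs, by_smem, max_blocks_sm)
    return blocks * threads_per_block / max_threads_sm

# Sweep candidate block sizes, as an analytical model would, to suggest
# a threads-per-block value before running on the target machine:
best = max(range(64, 1025, 32),
           key=lambda t: occupancy(t, regs_per_thread=40, smem_per_block=0))
```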