In this paper, we propose a parallel implementation of the number-theoretic transform (NTT) on GPU clusters. The butterfly operation of the NTT can be performed using modular addition, subtraction, and multiplica...
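The butterfly mentioned above can be illustrated with a minimal single-threaded sketch. It assumes the NTT-friendly prime 998244353 = 119 * 2^23 + 1 with primitive root 3 and uses the transform for polynomial multiplication; these choices, and the names `butterfly`, `ntt` and `multiply`, are illustrative and not taken from the paper. Within each stage every butterfly reads and writes a disjoint pair of elements, which is the independence a GPU implementation can exploit; the distribution across a cluster is not modeled here.

```python
MOD = 998244353  # assumed NTT-friendly prime: 119 * 2**23 + 1
G = 3            # primitive root modulo MOD


def butterfly(u, v, w, mod=MOD):
    """One Cooley-Tukey butterfly: a modular multiply, add and subtract."""
    t = v * w % mod
    return (u + t) % mod, (u - t) % mod


def ntt(a, invert=False, mod=MOD):
    """In-place iterative NTT; len(a) must be a power of two dividing 2**23."""
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(G, (mod - 1) // length, mod)  # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, mod - 2, mod)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):  # butterflies in a stage are independent of each other
                a[start + k], a[start + k + half] = butterfly(
                    a[start + k], a[start + k + half], w, mod)
                w = w * w_len % mod
        length <<= 1
    if invert:
        n_inv = pow(n, mod - 2, mod)
        for i in range(n):
            a[i] = a[i] * n_inv % mod
    return a


def multiply(p, q):
    """Multiply two polynomials (coefficient lists) via forward and inverse NTT."""
    size = 1
    while size < len(p) + len(q) - 1:
        size <<= 1
    fa = ntt(list(p) + [0] * (size - len(p)))
    fb = ntt(list(q) + [0] * (size - len(q)))
    prod = [x * y % MOD for x, y in zip(fa, fb)]
    return ntt(prod, invert=True)[: len(p) + len(q) - 1]


print(multiply([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```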
ISBN: (print) 9798331518509; 9798331518493
This study proposes integrating genetic algorithms (GAs) into control systems to enhance autonomy, particularly for unmanned aerial vehicle (UAV) operations. Traditional control systems, which rely on expert knowledge and complex mathematical calculations, limit autonomy. In contrast, GAs offer robust global search capabilities, helping to avoid local optima and enhancing computational efficiency through parallel processing. A modified Nonlinear Auto-Regressive eXogenous (NARX) model with feedback regulation ensures system stability and accurate tracking of target values, allowing the system to learn the dynamic relationships essential for control under complex nonlinear conditions. We introduce a new GA-NARX-based autonomous UAV control system designed for exploration in unfamiliar environments. Our enhanced system features a self-optimizing control mechanism that enables global optimization for peak performance. This control system minimizes human-machine interaction by leveraging GAs' predictive abilities to anticipate future states, while significantly improving control precision. Overall, the design of this autonomous control system aims to optimize coordination and control strategies for UAV swarms, offering innovative solutions for efficient flight patterns.
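To make the GA-plus-NARX idea concrete, here is a minimal sketch in which a real-coded GA searches for the parameters of a hypothetical one-step NARX-style model, y(t) = a*y(t-1) + b*tanh(u(t-1)), by minimizing prediction error on synthetic data. The abstract does not describe the paper's modified NARX model, feedback regulation, or GA configuration, so the model form, the synthetic input, the fitness function (mean squared error), and the operators (tournament selection, blend crossover, Gaussian mutation, elitism) and their settings are all assumptions.

```python
import math
import random


def narx_predict(params, y_prev, u_prev):
    """Hypothetical one-step NARX-style model: y(t) = a*y(t-1) + b*tanh(u(t-1))."""
    a, b = params
    return a * y_prev + b * math.tanh(u_prev)


def make_data(a=0.6, b=0.8, steps=100):
    """Synthetic input/output trace from a known system (assumed for illustration)."""
    u = [math.sin(0.3 * t) for t in range(steps)]
    y = [0.0]
    for t in range(1, steps):
        y.append(narx_predict((a, b), y[t - 1], u[t - 1]))
    return u, y


def mse(params, u, y):
    """Fitness: mean squared one-step prediction error (lower is better)."""
    err = [(narx_predict(params, y[t - 1], u[t - 1]) - y[t]) ** 2 for t in range(1, len(y))]
    return sum(err) / len(err)


def genetic_fit(u, y, pop_size=40, generations=80, bounds=(-2.0, 2.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi), rng.uniform(lo, hi)] for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [mse(p, u, y) for p in pop]  # evaluations are independent: parallelizable
        ranked = sorted(range(pop_size), key=fitness.__getitem__)
        next_pop = [pop[ranked[0]], pop[ranked[1]]]  # elitism: keep the two best

        def tournament():
            return pop[min(rng.sample(range(pop_size), 3), key=fitness.__getitem__)]

        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            child = []
            for g1, g2 in zip(p1, p2):  # blend crossover plus Gaussian mutation
                alpha = rng.random()
                g = alpha * g1 + (1 - alpha) * g2 + rng.gauss(0, 0.05)
                child.append(min(hi, max(lo, g)))  # clip to the search bounds
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=lambda p: mse(p, u, y))


u, y = make_data()
best = genetic_fit(u, y)
print("estimated (a, b):", best)  # should be close to the true (0.6, 0.8)
```

Because fitness evaluations within a generation do not depend on each other, this loop is the natural place for the parallel processing the abstract mentions.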
Recent advancements in medical image segmentation have been significantly propelled by deep learning methodologies, markedly enhancing precision and computational efficacy. This progress is particularly pronounced in ...
Web usage mining is a critical stage in analyzing user behavior that involves mining frequently visited pages. Mining usage patterns from a large weblog file with existing frequent-pattern mining techniques is inefficient in ...
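As a point of reference for "mining frequently visited pages", the sketch below shows generic two-level Apriori-style support counting over user sessions. It is not the paper's method (the abstract is truncated before describing it); the session format, the `min_support` threshold counted in sessions, and the function name `frequent_pages` are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_pages(sessions, min_support):
    """Return frequent single pages and page pairs (Apriori-style, two levels).

    sessions: iterable of page sequences, one per user session (assumed to be
    already extracted from the weblog). min_support: minimum number of sessions.
    """
    sessions = [set(s) for s in sessions]  # count each page once per session
    single = Counter(page for s in sessions for page in s)
    frequent1 = {p: c for p, c in single.items() if c >= min_support}
    # Apriori pruning: a pair can only be frequent if both of its pages are frequent
    pairs = Counter(
        pair
        for s in sessions
        for pair in combinations(sorted(s.intersection(frequent1)), 2)
    )
    frequent2 = {pr: c for pr, c in pairs.items() if c >= min_support}
    return frequent1, frequent2


sessions = [
    ["/home", "/news", "/sports"],
    ["/home", "/news"],
    ["/home", "/about"],
    ["/news", "/sports"],
]
f1, f2 = frequent_pages(sessions, min_support=2)
print(f2)  # {('/home', '/news'): 2, ('/news', '/sports'): 2}
```

Each full pass over the sessions is what becomes expensive on a big weblog file, which is the inefficiency the abstract points to.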
ISBN: (print) 9798350341379
The Modular Unified Space Technology Avionics for Next Generation (MUSTANG) is a small integrated avionics system including Command and Data Handling (C&DH), Power System Electronics (PSE), Attitude Control System Interfaces (ACS), and Propulsion Electronics. The MUSTANG avionics architecture is built upon many years of knowledge capture and lessons learned at the Goddard Space Flight Center. Motivated by modularity and by keeping board redesign costs to a minimum, MUSTANG offers flexible features through a backplane-less design that lets the user choose the options (cards) needed for their system. It incorporates a distributed power system that provides secondary power to all its subcomponents, reducing the number of primary services an avionics system needs. MUSTANG can be integrated into one system or divided into several smaller components. MUSTANG supports redundancy and cross-strapping for a more robust and reliable avionics system. A variant of MUSTANG for instrument electronics, called iMUSTANG, allows the user to select the functionality applicable to the instrument electronics. MUSTANG is not meant to replace the avionics of all spacecraft. Its relatively compact size imposes limitations, but the MUSTANG design has proven broadly applicable to many spacecraft and instrument bus avionics architectures.
ISBN: (print) 9783031396977; 9783031396984
Parallel-in-time algorithms provide an additional layer of concurrency for the numerical integration of models based on time-dependent differential equations. Methods like Parareal, which parallelize across multiple time steps, rely on a computationally cheap, coarse integrator to propagate information forward in time, while a parallelizable, expensive fine propagator provides accuracy. Typically, the coarse method is a numerical integrator using lower resolution, reduced order, or a simplified model. Our paper proposes to use a physics-informed neural network (PINN) instead. We demonstrate for the Black-Scholes equation, a partial differential equation from computational finance, that Parareal with a PINN coarse propagator provides better speedup than with a numerical coarse propagator. Training and evaluating a neural network are both tasks whose computing patterns are well suited to GPUs. By contrast, mesh-based algorithms, with their low computational intensity, struggle to perform well on GPUs. We show that moving the coarse propagator PINN to a GPU while running the numerical fine propagator on the CPU further improves Parareal's single-node performance. This suggests that integrating machine learning techniques into parallel-in-time integration methods, and exploiting their different computing patterns, might offer a way to better utilize heterogeneous architectures.
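The coarse/fine interplay described above can be sketched with the standard Parareal update U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n]). In this minimal sketch both propagators are explicit Euler with different step counts, applied to the toy problem y' = -y; the paper's PINN coarse propagator and the Black-Scholes equation are not reproduced, so the coarse Euler step is a plainly swapped-in stand-in, and the slice count and step counts are assumptions. The fine solves are computed serially here but are independent across time slices, which is the part Parareal runs in parallel.

```python
import math


def euler(f, y, t0, t1, steps):
    """Explicit Euler from t0 to t1 with a fixed number of steps."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t_end, n_slices, iterations, coarse_steps=1, fine_steps=100):
    """Return the Parareal approximations U_0..U_N at the time-slice boundaries."""
    ts = [t_end * n / n_slices for n in range(n_slices + 1)]

    def coarse(y, n):  # cheap propagator G (stand-in for the paper's PINN)
        return euler(f, y, ts[n], ts[n + 1], coarse_steps)

    def fine(y, n):  # expensive propagator F
        return euler(f, y, ts[n], ts[n + 1], fine_steps)

    u = [y0]
    for n in range(n_slices):  # initial guess: one serial coarse sweep
        u.append(coarse(u[n], n))
    for _ in range(iterations):
        f_vals = [fine(u[n], n) for n in range(n_slices)]  # independent per slice: the parallel part
        g_old = [coarse(u[n], n) for n in range(n_slices)]
        new = [y0]
        for n in range(n_slices):  # serial correction: G(new) + F(old) - G(old)
            new.append(coarse(new[n], n) + f_vals[n] - g_old[n])
        u = new
    return u


def serial_fine(f, y0, t_end, n_slices, fine_steps=100):
    """Reference: the fine propagator applied slice by slice, serially."""
    ts = [t_end * n / n_slices for n in range(n_slices + 1)]
    out = [y0]
    for n in range(n_slices):
        out.append(euler(f, out[n], ts[n], ts[n + 1], fine_steps))
    return out


def decay(t, y):
    return -y  # y' = -y with exact solution exp(-t)


exact = math.exp(-2.0)
for k in range(4):
    print(k, abs(parareal(decay, 1.0, 2.0, 10, k)[-1] - exact))
```

After k iterations the first k slice values coincide (up to rounding) with the serial fine solution, so speedup depends on converging in far fewer iterations than slices, which is why a cheap but accurate coarse propagator matters.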
ISBN: (print) 9783031488023; 9783031488030
The continuous demand for higher computational performance and the stagnating development of general-purpose processors have led to a surge of interest in highly specialized and efficient hardware. Combined with the rising popularity of parameterizable hardware, this creates a new opportunity to optimize these architectures for particular workloads, largely driven by the RISC-V Instruction Set Architecture (ISA). This work presents an application-specific optimization methodology for general-purpose processors, enabling the development of architectures that are faster and more efficient for their designated workloads. Driven by insights from the Cache-Aware Roofline Model (CARM), the methodology guides the configuration of the processor's memory and computational subsystems. We apply this methodology to two applications, demonstrating up to a 2.67x performance increase and a 1.34x improvement in energy efficiency.
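The Cache-Aware Roofline Model bounds attainable performance by a shared compute roof and one bandwidth roof per memory level (L1, L2, ..., DRAM), with arithmetic intensity measured from the core's point of view. The sketch below only computes those bounds; the hardware numbers are hypothetical, and how the paper turns the bounds into concrete configuration choices is not specified in the abstract.

```python
def attainable_gflops(ai, peak_gflops, bandwidths_gbs):
    """Per-level roofs: min(peak compute, AI x bandwidth of that memory level).

    ai: arithmetic intensity in flops per byte of core-to-memory-hierarchy traffic.
    bandwidths_gbs: mapping from memory level name to sustained bandwidth (GB/s).
    """
    return {level: min(peak_gflops, ai * bw) for level, bw in bandwidths_gbs.items()}


def ridge_point(peak_gflops, bw_gbs):
    """Arithmetic intensity (flops/byte) at which a level stops being the bottleneck."""
    return peak_gflops / bw_gbs


# Hypothetical core: 16 GFLOP/s peak; the L1, L2 and DRAM bandwidths are made up.
roofs = {"L1": 128.0, "L2": 64.0, "DRAM": 8.0}
print(attainable_gflops(0.5, 16.0, roofs))  # {'L1': 16.0, 'L2': 16.0, 'DRAM': 4.0}
print(ridge_point(16.0, roofs["DRAM"]))      # 2.0
```

In this made-up example, a kernel at 0.5 flops/byte whose data streams from DRAM is memory-bound at 4 of 16 GFLOP/s. One plausible reading of "CARM-driven configuration" is that such a result favors spending area on the memory subsystem rather than on wider compute units.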
ISBN: (print) 9783031388637
In this paper, we consider a fully implicit Stokes solver implementation targeting both GPU and multithreaded CPU architectures. The solver is aimed at the semistructured meshes that often emerge during permeability calculations in geology. It consists of four main parts: geometry and topology analysis, linear system construction, linear system solution, and postprocessing. A modified version of the AMGCL library, developed by the authors in earlier research, is used to solve the linear system. Previous experiments showed that the GPU architecture can deliver extremely high performance for such problems, especially when the whole stack is implemented on the GPU. However, the limited GPU memory significantly reduces the mesh sizes that can be handled. For some applications, computation time is less important than mesh size. Therefore, it is convenient to have both GPU (for example, CUDA) and multithreaded CPU versions of the same code. Porting the code directly is time-consuming and error-prone. Several automatic approaches are available: the OpenACC standard, the DVM-system, SYCL, and others. Often, however, these approaches still demand careful programming to deliver maximum performance on a specific architecture. Some problems (in our case, the analysis of connected components) require entirely different optimal algorithms on different architectures. Furthermore, native libraries sometimes deliver the best performance and are preferable for specific parts of the solution. For these reasons, we used another approach, based on C++ language features such as template programming. The two main components of our approach are array classes and ‘for each’ algorithms. Arrays can be used on both CPU and CUDA architectures and internally substitute the memory layout that best fits the current architecture (an ‘array of structures’ or a ‘structure of arrays’). ‘For each’ algorithms generate kernels or parallel loops that implement parallel processing for ind...
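The array-plus-‘for each’ idea can be sketched, by analogy, in Python: the user code only calls `get`, `set` and `for_each`, while the container picks an array-of-structures or structure-of-arrays layout from the backend. The authors implement this with C++ templates; the class `LayoutArray`, the helpers `make_array`, `for_each` and `scale_velocity`, and the mapping "cuda" to SoA, anything else to AoS, are hypothetical illustrations, and the serial `for_each` only stands in for a generated kernel or parallel loop.

```python
class LayoutArray:
    """Array of records whose internal layout depends on the target backend.

    layout='aos': one list per record -> data[i][field]
    layout='soa': one list per field  -> data[field][i]
    """

    def __init__(self, n, fields, layout):
        if layout not in ("aos", "soa"):
            raise ValueError(f"unknown layout {layout!r}")
        self.n, self.layout = n, layout
        self._col = {name: k for k, name in enumerate(fields)}
        if layout == "aos":
            self.data = [[0.0] * len(fields) for _ in range(n)]
        else:
            self.data = [[0.0] * n for _ in fields]

    def get(self, i, field):
        k = self._col[field]
        return self.data[i][k] if self.layout == "aos" else self.data[k][i]

    def set(self, i, field, value):
        k = self._col[field]
        if self.layout == "aos":
            self.data[i][k] = value
        else:
            self.data[k][i] = value


def make_array(n, fields, backend):
    # Assumed mapping for illustration: SoA on "cuda" (coalesced loads), AoS otherwise.
    return LayoutArray(n, fields, "soa" if backend == "cuda" else "aos")


def for_each(n, body):
    """Stand-in for a generated CUDA kernel or parallel CPU loop over 0..n-1."""
    for i in range(n):
        body(i)


# Backend-agnostic user code: the same loop body works with either layout.
def scale_velocity(arr, factor):
    def body(i):
        for name in ("vx", "vy", "vz"):
            arr.set(i, name, arr.get(i, name) * factor)
    for_each(arr.n, body)
```

The point of the design is that `scale_velocity` never mentions the layout, so switching backends changes memory access patterns without touching the numerical code.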
Memory dependence prediction is essential in superscalar out-of-order processors as it improves instruction-level parallelism efficiency by predicting dependencies between memory access instructions. However, existing...
This paper presents PPQSort (Pattern parallel Quicksort), a new parallel quicksort algorithm that provides high performance and ease of use. PPQSort uses C++ threads for parallelization, achieving efficient sorting wi...
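For readers unfamiliar with thread-parallel quicksort, the sketch below shows the general pattern: partition, then sort the two disjoint halves concurrently for the top few recursion levels and sequentially below that. It is a generic illustration, not PPQSort's algorithm (the abstract is truncated before the details); the Hoare partition, the `depth` cutoff, and all names are assumptions. CPython's global interpreter lock means these threads give no real speedup; the sketch only shows the task structure that C++ threads would exploit.

```python
import threading


def _partition(a, lo, hi):
    """Hoare partition of a[lo..hi] around the middle element; returns split index."""
    pivot = a[(lo + hi) // 2]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]


def _quicksort(a, lo, hi):
    """Sequential quicksort; recurse on the smaller side to bound stack depth."""
    while lo < hi:
        p = _partition(a, lo, hi)
        if p - lo < hi - p:
            _quicksort(a, lo, p)
            lo = p + 1
        else:
            _quicksort(a, p + 1, hi)
            hi = p


def parallel_quicksort(a, depth=2):
    """Sort list a in place; the top `depth` recursion levels spawn threads."""
    def sort(lo, hi, d):
        if lo >= hi:
            return
        if d == 0:
            _quicksort(a, lo, hi)
            return
        p = _partition(a, lo, hi)
        # The halves [lo, p] and [p + 1, hi] are disjoint, so they can run concurrently.
        left = threading.Thread(target=sort, args=(lo, p, d - 1))
        left.start()
        sort(p + 1, hi, d - 1)
        left.join()

    sort(0, len(a) - 1, depth)
    return a


print(parallel_quicksort([5, 3, 9, 1, 3, 7]))  # [1, 3, 3, 5, 7, 9]
```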