Stochastic policy-based deep reinforcement learning (DRL) has been widely applied, but it demands a large amount of stochastic exploration to learn the environment in the initial training stage. When the agent is exposed to a more complex environment, the approach is not only inefficient, but its performance may also suffer from high variance. This paper develops a framework that accelerates training and reduces variance by introducing a stochastic switching network, which allows the agent to choose between heuristic actions and actions output by proximal policy optimization (PPO). Instead of starting from random actions, the agent is guided by the heuristic actions so that its navigation capability can be rapidly improved. The vanilla policy gradient (VPG) algorithm is further used to train the switching network, which can be trained jointly with the baseline. In an experimental comparison with the baseline PPO in a customized maze environment built with the OpenAI Gym toolkit, our method executes the navigation task considerably more efficiently thanks to the guidance of the heuristic actions.
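The switching idea in this abstract can be sketched in a few lines. The abstract does not describe the network architecture or the training details, so everything below is an assumption made for illustration: the switch is a single logistic unit, `SwitchGate`, `select_action`, and the learning rate are hypothetical names and values, and the update is a plain REINFORCE step that uses one scalar episode return with no baseline.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SwitchGate:
    """Hypothetical logistic switch: P(use heuristic | state) = sigmoid(w . s + b)."""

    def __init__(self, state_dim, lr=0.1):
        self.w = [0.0] * state_dim
        self.b = 0.0
        self.lr = lr

    def prob_heuristic(self, state):
        z = sum(wi * si for wi, si in zip(self.w, state)) + self.b
        return sigmoid(z)

    def sample(self, state, rng):
        # 1 = take the heuristic action, 0 = take the learned policy's action.
        return 1 if rng.random() < self.prob_heuristic(state) else 0

    def reinforce_update(self, episode, episode_return):
        """Vanilla policy gradient step over one episode.

        episode is a list of (state, gate_choice) pairs. For a Bernoulli
        choice with p = sigmoid(z), d/dz log P(choice) = choice - p.
        """
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for state, choice in episode:
            g = choice - self.prob_heuristic(state)
            for i, si in enumerate(state):
                grad_w[i] += episode_return * g * si
            grad_b += episode_return * g
        # Gradient ascent on the expected return.
        self.w = [wi + self.lr * gi for wi, gi in zip(self.w, grad_w)]
        self.b += self.lr * grad_b


def select_action(gate, state, heuristic_action, policy_action, rng):
    """Pick between the heuristic and the learned policy for one step."""
    choice = gate.sample(state, rng)
    action = heuristic_action(state) if choice == 1 else policy_action(state)
    return action, choice
```

In a full training loop, `policy_action` would be the PPO actor and would be updated by its own PPO loss on the same trajectories; that part is omitted here.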
Most management systems used in common enclosed spaces offer only basic functions such as booking and payment, and lack epidemic prevention and traceability functions. During the COVID-19 pandemic, enclosed venues without an epidemic prevention system clearly increase the risk of cross-infection among customers and become black boxes that amplify the severity of an outbreak. In this paper, we develop an intelligent recommendation system designed to coordinate the operation of enclosed spaces and disperse new customers to boxes with a low infection risk. An adaptive genetic algorithm is proposed to optimally allocate personnel to boxes, which avoids contact between customers during off-peak hours and minimizes cross-contact during peak hours. This preserves the user experience while keeping crowd density in these enclosed spaces highly dispersed when an epidemic occurs. It greatly reduces the risk of exposure to infection and is of great significance in preventing the spread of the epidemic.
For the containment control problem of autonomous surface vehicles with external disturbances, a novel non-singular fixed-time control scheme is developed, where the multi-ship system consists of real leaders and foll...
Numerous studies have demonstrated that numerous animal species are capable of goal-directed navigation using environmental information for dead reckoning. The stable magnetic field of the earth provides important inf...
To address the difficulty of detecting the weak sinusoidal response signal when the frequency response method is used to detect transformer winding deformation faults online, this paper first proposes a method for detecting the amplitude of the response signal using a Duffing oscillator. Second, to eliminate the influence of the phase shift of the target signal on amplitude detection in practice, a method is proposed that first detects the signal phase using an array of Duffing oscillators and then detects the amplitude. Finally, the method is applied to two sets of measured response signals from online detection. The results show that the maximum error between the detected signal amplitude and the standard signal is 5.69%, which meets the requirements of the power industry standard and demonstrates that the method is feasible for winding deformation detection.
This paper studies the linear-quadratic-Gaussian (LQG) problem for sampled-data systems whose stochastic sampling interval obeys a certain probability distribution. An optimal estimator of the system state is given by the standard Kalman filter, and the Vandermonde matrix and the Kronecker product are used to calculate the mathematical expectations induced by stochastic sampling when designing the LQG controller. Moreover, it is proved that the controller ensures the system is exponentially mean-square stable. Finally, simulation results are given to verify the effectiveness and practicability of the proposed controller design method.
In this paper, a distributed MPC algorithm is proposed for linear discrete-time systems with bounded communication noise and random communication failures. Based on the characteristics of time-varying directed graphs, the convergence of the algorithm is analyzed in the presence of communication noise and random communication failures. The algorithm uses a distributed projected dual subgradient method to solve the dual problem and replaces the doubly stochastic condition of the traditional algorithm with a row-stochastic condition. The exponential stability of the system is guaranteed. Finally, an example is given to illustrate the performance of the algorithm.
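The difference between the two weight conditions mentioned in this abstract is easy to check numerically. The following is a minimal sketch, not taken from the paper: the helper names and the example matrix `W` are hypothetical. A row-stochastic matrix only needs each agent to normalize the weights it assigns to the messages it actually receives, which is why it is the less demanding condition on directed graphs with unreliable links.

```python
def is_row_stochastic(W, tol=1e-9):
    """True if all entries are nonnegative and every row sums to 1."""
    return all(
        all(x >= -tol for x in row) and abs(sum(row) - 1.0) <= tol
        for row in W
    )


def is_doubly_stochastic(W, tol=1e-9):
    """True if W is row-stochastic and every column also sums to 1."""
    columns = zip(*W)
    return is_row_stochastic(W, tol) and all(
        abs(sum(col) - 1.0) <= tol for col in columns
    )


# Hypothetical weights for a 3-agent directed graph: each agent averages
# its own value with at most one in-neighbor.
W = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]

print(is_row_stochastic(W))     # rows sum to 1
print(is_doubly_stochastic(W))  # columns sum to 0.5, 1.0, 1.5
```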
This paper establishes a non-Fourier heat conduction model to describe the heat transfer process of monocrystalline silicon under an unstable thermal field and thermal shock in the Czochralski method. A novel differential equation solver, the Physics-Informed Neural Network (PINN) algorithm, was used. Compared with the finite element method (FEM), this method has advantages such as requiring no grid and being easily [...]. To deal with the imbalance among the constraint conditions and speed up convergence, we propose a novel method called Self-Adaptive Weight Physics-Informed Neural Networks (SWPINN). A comparison of the experimental results of SWPINN and COMSOL verifies the effectiveness of SWPINN. By modifying the parameter of the non-Fourier heat conduction model, this paper obtains the temperature distribution under different heat relaxation times. A comparison between SWPINN and PINN shows that the proposed method converges faster and achieves higher accuracy.
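The abstract does not state the governing equation. A commonly used non-Fourier model is the Cattaneo-Vernotte equation, written out below as an assumption rather than as the paper's model, for constant properties and no heat source. Here \tau is the heat relaxation time, k the thermal conductivity, \rho the density, and c the specific heat. The second expression is a generic weighted PINN loss with residual, boundary, and initial-condition terms. The weights \lambda are hypothetical symbols, and the rule by which SWPINN adapts them is not given in the abstract.

```latex
% Assumed Cattaneo-Vernotte constitutive law combined with energy conservation
\mathbf{q} + \tau \frac{\partial \mathbf{q}}{\partial t} = -k \nabla T,
\qquad
\rho c \frac{\partial T}{\partial t} = -\nabla \cdot \mathbf{q}
\;\Longrightarrow\;
\tau \frac{\partial^2 T}{\partial t^2} + \frac{\partial T}{\partial t}
  = \alpha \nabla^2 T,
\qquad \alpha = \frac{k}{\rho c}.

% Generic weighted PINN loss (hypothetical notation)
\mathcal{L} = \lambda_r \mathcal{L}_r + \lambda_b \mathcal{L}_b + \lambda_0 \mathcal{L}_0
```

Setting \tau = 0 recovers the classical Fourier heat equation, so varying \tau is one way to study different heat relaxation times.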
The parasitic inductance of the transmitting coil in a transient electromagnetic transmitter causes serious problems such as an overlong falling edge, overshoot, and oscillation of the emission current. To solve these problems, this paper proposes a passive constant-voltage clamping circuit for high-speed shutoff based on a TVS (transient voltage suppressor) and a switch. In this design, a resistor, the TVS, and the switch are connected in parallel and then connected into the main transmitting bridge circuit. During the early stage of the shutoff, the TVS forms a high clamping voltage and thus achieves high-speed shutoff of the emission current. In the later stage of the current decline, the energy of the load inductance is released through the resistor to prevent overshoot and oscillation of the emission current. The operating procedure and principle of the circuit and the influence of the parameters of the core devices on circuit performance are analyzed, and the effectiveness of the design is verified by simulation and experiment. The results show that the circuit effectively reduces the shutoff time of the emission current while restraining overshoot and oscillation, and hence improves the waveform quality of the emission current.
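The benefit of constant-voltage clamping can be illustrated with an idealized calculation. This is a sketch under strong assumptions, not the paper's circuit model: the coil is a pure inductance, the clamp holds a perfectly constant voltage, and coil resistance and supply voltage are ignored. Under a constant clamp the current falls linearly (L di/dt = -V_clamp), whereas through a resistor alone it decays exponentially. The function names and component values are hypothetical.

```python
import math


def clamp_turnoff_time(inductance_h, current_a, clamp_voltage_v):
    """Time for the current to ramp linearly to zero under an ideal clamp.

    Assumes L * di/dt = -V_clamp, so t_off = L * I0 / V_clamp.
    """
    return inductance_h * current_a / clamp_voltage_v


def resistive_decay_time(inductance_h, resistance_ohm, fraction=0.01):
    """Time for i(t) = I0 * exp(-R * t / L) to fall to `fraction` of I0."""
    return (inductance_h / resistance_ohm) * math.log(1.0 / fraction)


# Hypothetical values: 1 mH coil, 10 A emission current,
# 500 V clamp voltage, 5 ohm discharge resistor.
t_clamp = clamp_turnoff_time(1e-3, 10.0, 500.0)
t_res = resistive_decay_time(1e-3, 5.0)

print(f"clamped shutoff: {t_clamp * 1e6:.1f} us")
print(f"resistive decay to 1%: {t_res * 1e6:.1f} us")
```

With these illustrative numbers the clamped ramp is much shorter than the resistive decay. This is consistent with the abstract's division of roles: the TVS handles the fast early stage and the resistor dissipates the remaining energy.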