The performance of the energy management strategy (EMS) is essential for a plug-in hybrid electric bus (PHEB) to operate efficiently. A proximal policy optimization (PPO)-based multi-objective EMS that considers the battery thermal characteristics is therefore proposed for the PHEB, aiming to improve the vehicle's energy-saving performance while keeping the battery state of charge (SOC) and temperature within rational ranges. Since these three objectives conflict with one another, an optimal tradeoff among them is realized by intelligently adjusting the weights during the training process. Compared with the original PPO-based EMSs that ignore battery thermal dynamics, simulation results demonstrate the effectiveness of the proposed strategies in battery thermal management. The results indicate that the proposed strategies achieve the lowest energy consumption, the fastest computing speed, and the lowest battery temperature among the compared RL-based EMSs. Taking dynamic programming (DP) as the benchmark, the PPO-based EMSs achieve similar fuel economy with outstanding computational efficiency. Furthermore, the adaptability and robustness of the proposed methods are confirmed on the UDDS and WVUSUB cycles and on a real driving cycle.

Optimization-based EMSs, which include global optimization-based and instantaneous optimization-based approaches, aim to reduce vehicle fuel consumption by minimizing a cost function. Global optimization strategies mainly include dynamic programming (DP), Pontryagin's minimum principle (PMP), and convex optimization (CV) based EMSs. These strategies can obtain optimal control results and adapt well to different driving cycles. Nevertheless, they require the driving cycle and road information in advance, and their computational complexity is high. Owing to these shortcomings, they are difficult to apply as real-time energy management controllers. Instantaneous optimization strategies include model predictive control (MPC) and the equivalent consumption minimization strategy (ECMS). Although MPC and ECMS offer strong real-time performance, their control effect depends, respectively, on the prediction accuracy of future driving conditions and on the chosen fuel-to-electricity equivalence factor.

Although DQN realizes the transformation from a discrete to a continuous control state, its action space is still discrete. To further handle continuous action spaces, the Deep Deterministic Policy Gradient (DDPG) algorithm has been introduced for the energy management of HEVs. In one study, DDPG was used to solve the optimal energy distribution problem in a mixed discrete-continuous action space, considering the terrain information of the driving routes. In another, a model-free DDPG within the Actor–Critic framework was adopted, taking the traffic information and the number of passengers into account; the results showed that the optimization performance of the proposed strategy was close to that of DP. In a further work, the DDPG algorithm was combined with the optimal brake-specific fuel consumption curves and the charge-discharge characteristics of the power battery, and the proposed method achieved better fuel economy and robustness than rule-interposing deep Q-learning (DQL).

However, the DDPG algorithm relies on many hyper-parameters to explore the environment, resulting in slow convergence and unstable training. Given these inherent problems, more advanced RL algorithms have been introduced in the energy management field. In one work, an optimal EMS based on the Soft Actor–Critic (SAC) algorithm was designed for electric vehicles with hybrid energy systems to minimize power consumption; compared with DQN and rule-based methods, the proposed strategy showed advantages in control effect and convergence speed. Although the SAC algorithm trains quickly and explores well, it requires reward scaling, which affects the Q value; it therefore depends heavily on the reward function and is not well suited to multi-objective optimization problems. In another work, a rule-based controller was embedded in the Twin Delayed Deep Deterministic Policy Gradient (TD3) loop to eliminate unreasonable torque distributions, and the convergence speed and robustness of the improved algorithm were superior to those of the plain DRL-based EMS. Since the TD3 algorithm adds noise to the actions output by the Actor network, however, it easily generates a large number of boundary actions during exploration, so strong parameter-tuning skills are required when adopting it. Proximal Policy Optimization (PPO) algorithms, including PPO-Clip and PPO-Penalty, use the Actor–Critic framework to realize continuous control-state input and continuous action-space exploration, avoiding the influence of discretization error on the optimization results.
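The PPO-Clip variant mentioned above works by clipping the importance-sampling ratio between the new and old policies. As a minimal sketch (not the paper's implementation), the per-sample clipped surrogate objective can be written as follows; the function name and the default clip range of 0.2 are illustrative:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective for a single (state, action) sample.

    ratio:     pi_new(a|s) / pi_old(a|s), the importance-sampling ratio
    advantage: estimated advantage A(s, a)
    eps:       clip range (0.2 is a commonly used default)
    """
    # Constrain the ratio to [1 - eps, 1 + eps].
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # Taking the minimum removes any incentive to push the ratio far
    # outside the clip range, which stabilizes policy updates.
    return min(ratio * advantage, clipped * advantage)
```

For example, with a positive advantage of 1.0 and a ratio of 1.5, the objective is capped at 1.2 rather than 1.5, so the gradient stops encouraging further ratio growth; in training, the negative mean of this quantity over a batch would be minimized.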
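The multi-objective EMS balances fuel consumption, SOC sustenance, and battery temperature through weighted terms in the reward. One minimal way such a scalarized reward could look is sketched below; all weights, the SOC reference, and the temperature limit are illustrative placeholders, not values from the paper, and the paper's intelligent weight adjustment during training is not shown:

```python
def multi_objective_reward(fuel_rate, soc, battery_temp,
                           w_fuel=1.0, w_soc=0.5, w_temp=0.5,
                           soc_ref=0.5, temp_max=35.0):
    """Weighted scalar reward combining three conflicting objectives.

    All weights and limits here are hypothetical placeholders.
    """
    r_fuel = -fuel_rate                          # penalize fuel use
    r_soc = -abs(soc - soc_ref)                  # keep SOC near a reference
    r_temp = -max(0.0, battery_temp - temp_max)  # penalize overheating only
    return w_fuel * r_fuel + w_soc * r_soc + w_temp * r_temp
```

Because the three terms pull in different directions (saving fuel may deplete SOC or heat the battery), changing the weights shifts the operating point along the tradeoff surface, which is what tuning them during training exploits.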