TY - GEN
T1 - CuriousRL
T2 - 2024 International Joint Conference on Neural Networks, IJCNN 2024
AU - Bohara, Sushil
AU - Hanif, Muhammad Abdullah
AU - Shafique, Muhammad
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Though Proximal Policy Optimization (PPO) has emerged as a dominant algorithm for quadruped locomotion due to its stability and ease of implementation, its learning efficiency is limited by the algorithm's weak exploration ability. We combine PPO with the Intrinsic Curiosity Module (ICM) to form CuriousRL, which enhances the exploration aspect of PPO, making quadruped locomotion autonomous and adaptive in dynamic environments. ICM provides intrinsic rewards to the robot in addition to the external environmental rewards used by PPO, fostering exploration. We use CuriousRL to teach quadruped robots to walk autonomously. We simulate the experiments in Isaac Gym using ANYmal quadrupeds and measure performance in dynamic test environments with obstacles and uneven terrains using various environment sensor data, including positions, velocities, forces, and torques in the legs and joints. We demonstrate that CuriousRL performs better in terms of exploring effective policies, avoiding risk-averse stationary policy adaptation, and enhancing sample efficiency.
AB - Though Proximal Policy Optimization (PPO) has emerged as a dominant algorithm for quadruped locomotion due to its stability and ease of implementation, its learning efficiency is limited by the algorithm's weak exploration ability. We combine PPO with the Intrinsic Curiosity Module (ICM) to form CuriousRL, which enhances the exploration aspect of PPO, making quadruped locomotion autonomous and adaptive in dynamic environments. ICM provides intrinsic rewards to the robot in addition to the external environmental rewards used by PPO, fostering exploration. We use CuriousRL to teach quadruped robots to walk autonomously. We simulate the experiments in Isaac Gym using ANYmal quadrupeds and measure performance in dynamic test environments with obstacles and uneven terrains using various environment sensor data, including positions, velocities, forces, and torques in the legs and joints. We demonstrate that CuriousRL performs better in terms of exploring effective policies, avoiding risk-averse stationary policy adaptation, and enhancing sample efficiency.
KW - CuriousRL
KW - Exploration
KW - Intrinsic Rewards
KW - Proximal Policy Optimization
KW - Quadrupeds
KW - Reinforcement Learning
UR - http://www.scopus.com/inward/record.url?scp=85205012863&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85205012863&partnerID=8YFLogxK
U2 - 10.1109/IJCNN60899.2024.10650715
DO - 10.1109/IJCNN60899.2024.10650715
M3 - Conference contribution
AN - SCOPUS:85205012863
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 June 2024 through 5 July 2024
ER -