TY - GEN
T1 - Optimizing Personalized Robot Actions with Ranking of Trajectories
AU - Huang, Hao
AU - Liu, Yiyun
AU - Yuan, Shuaihang
AU - Wen, Congcong
AU - Hao, Yu
AU - Fang, Yi
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - Intelligent robots designed for real-world human interaction need to adapt to the diverse preferences of individuals. Preference-based Reinforcement Learning (PbRL) offers a promising way to teach robots personalized behaviors by learning through interaction with humans, eliminating the need for intricate, manually crafted reward functions. However, current PbRL approaches are hampered by sub-optimal feedback efficiency and limited exploration of the state and reward spaces, resulting in subpar performance on complex interactive tasks. To enhance the effectiveness of PbRL, we integrate prior task knowledge into the PbRL framework and develop a reward model based on ranking multiple robot trajectories. The learned reward is then used to refine the robot’s policy, aligning it with human preferences. To validate our method, we demonstrate its versatility across different human-robot assistive tasks. The experimental results show that our approach offers an effective and broadly applicable solution for personalized human-robot interaction.
KW - Assistive Gym
KW - Human-robot interaction
KW - Multiple trajectory ranking
KW - Preference-based reinforcement learning (PbRL)
UR - http://www.scopus.com/inward/record.url?scp=85211952043&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85211952043&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-78110-0_1
DO - 10.1007/978-3-031-78110-0_1
M3 - Conference contribution
AN - SCOPUS:85211952043
SN - 9783031781094
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 1
EP - 16
BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
A2 - Antonacopoulos, Apostolos
A2 - Chaudhuri, Subhasis
A2 - Chellappa, Rama
A2 - Liu, Cheng-Lin
A2 - Bhattacharya, Saumik
A2 - Pal, Umapada
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Pattern Recognition, ICPR 2024
Y2 - 1 December 2024 through 5 December 2024
ER -
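
Editorial note: the abstract's core technical step is learning a reward model from a human ranking over multiple trajectories, rather than from pairwise comparisons alone. This record does not include the paper's formulation, so the following is a minimal illustrative sketch assuming a Plackett-Luce listwise ranking loss, a standard choice for learning from full rankings. All names, shapes, and the network architecture here are assumptions for illustration, not the authors' code.

    # Hypothetical sketch (not from the paper): a reward model trained with a
    # Plackett-Luce listwise ranking loss over K human-ranked robot trajectories.
    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Maps a (state, action) pair to a scalar reward; summing over a
        trajectory gives its predicted return."""
        def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def trajectory_return(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
            # obs: (T, obs_dim), act: (T, act_dim) -> scalar return estimate
            return self.net(torch.cat([obs, act], dim=-1)).sum()

    def plackett_luce_loss(returns_ranked: torch.Tensor) -> torch.Tensor:
        """Negative log-likelihood of a full ranking under the Plackett-Luce
        model. returns_ranked: (K,) predicted returns, ordered from the most
        preferred trajectory (index 0) to the least preferred (index K-1)."""
        loss = returns_ranked.new_zeros(())
        for i in range(len(returns_ranked) - 1):
            # log P(trajectory i is preferred among the remaining set {i..K-1})
            loss = loss - (returns_ranked[i] - torch.logsumexp(returns_ranked[i:], dim=0))
        return loss

    # Usage: score each trajectory, order the scores by the human's ranking,
    # and minimize the loss so higher-ranked trajectories get higher returns.
    model = RewardModel(obs_dim=10, act_dim=4)
    trajs = [(torch.randn(50, 10), torch.randn(50, 4)) for _ in range(4)]  # dummy data
    returns = torch.stack([model.trajectory_return(o, a) for o, a in trajs])
    loss = plackett_luce_loss(returns)  # trajs assumed already in preference order
    loss.backward()

The learned reward can then drive any standard policy-optimization step, which corresponds to the abstract's description of using the acquired reward to refine the robot's policy.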