TY - GEN
T1 - Scaling, Control and Generalization in Reinforcement Learning Level Generators
AU - Earle, Sam
AU - Jiang, Zehua
AU - Togelius, Julian
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Procedural Content Generation via Reinforcement Learning (PCGRL) has been introduced as a means by which controllable designer agents can be trained based only on a set of computable metrics acting as a proxy for the level's quality and key characteristics. While PCGRL offers a unique set of affordances for game designers, it is constrained by the compute-intensive process of training RL agents, and has so far been limited to generating relatively small levels. To address this issue of scale, we implement several PCGRL environments in Jax so that all aspects of learning and simulation happen in parallel on the GPU, resulting in faster environment simulation; removing the CPU-GPU information-transfer bottleneck during RL training; and ultimately yielding significantly improved training speed. We replicate several key results from prior works in this new framework, letting models train for much longer than previously studied and evaluating their behavior after 1 billion timesteps. Aiming for greater control for human designers, we introduce randomized level sizes and frozen 'pinpoints' of pivotal game tiles as further ways of countering overfitting. To test the generalization ability of learned generators, we evaluate models on large, out-of-distribution map sizes, and find that models with partial observations learn more robust design strategies.
AB - Procedural Content Generation via Reinforcement Learning (PCGRL) has been introduced as a means by which controllable designer agents can be trained based only on a set of computable metrics acting as a proxy for the level's quality and key characteristics. While PCGRL offers a unique set of affordances for game designers, it is constrained by the compute-intensive process of training RL agents, and has so far been limited to generating relatively small levels. To address this issue of scale, we implement several PCGRL environments in Jax so that all aspects of learning and simulation happen in parallel on the GPU, resulting in faster environment simulation; removing the CPU-GPU information-transfer bottleneck during RL training; and ultimately yielding significantly improved training speed. We replicate several key results from prior works in this new framework, letting models train for much longer than previously studied and evaluating their behavior after 1 billion timesteps. Aiming for greater control for human designers, we introduce randomized level sizes and frozen 'pinpoints' of pivotal game tiles as further ways of countering overfitting. To test the generalization ability of learned generators, we evaluate models on large, out-of-distribution map sizes, and find that models with partial observations learn more robust design strategies.
KW - procedural content generation
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85203550055&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203550055&partnerID=8YFLogxK
U2 - 10.1109/CoG60054.2024.10645598
DO - 10.1109/CoG60054.2024.10645598
M3 - Conference contribution
AN - SCOPUS:85203550055
T3 - IEEE Conference on Computational Intelligence and Games, CIG
BT - Proceedings of the 2024 IEEE Conference on Games, CoG 2024
PB - IEEE Computer Society
T2 - 6th Annual IEEE Conference on Games, CoG 2024
Y2 - 5 August 2024 through 8 August 2024
ER -