TY - GEN
T1 - Composable planning with attributes
AU - Zhang, Amy
AU - Lerer, Adam
AU - Sukhbaatar, Sainbayar
AU - Fergus, Rob
AU - Szlam, Arthur
N1 - Publisher Copyright: © Copyright 2018 by the author(s). All rights reserved.
PY - 2018
Y1 - 2018
N2 - The tasks that an agent will need to solve are often not known during training. However, if the agent knows which properties of the environment are important, then after learning how its actions affect those properties, it may be able to use this knowledge to solve complex tasks without training specifically for them. Towards this end, we consider a setup in which an environment is augmented with a set of user-defined attributes that parameterize the features of interest. We propose a method that learns a policy for transitioning between "nearby" sets of attributes and maintains a graph of possible transitions. Given a task at test time that can be expressed in terms of a target set of attributes, and a current state, our model infers the attributes of the current state, searches over paths through attribute space to get a high-level plan, and then uses its low-level policy to execute the plan. We show in 3D block stacking, grid-world games, and StarCraft® that our model is able to generalize to longer, more complex tasks at test time by composing simpler learned policies.
UR - http://www.scopus.com/inward/record.url?scp=85057257139&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057257139&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057257139
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 9292
EP - 9307
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Krause, Andreas
A2 - Dy, Jennifer
PB - International Machine Learning Society (IMLS)
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -