In this work, we consider an actuator-redundant system, i.e., a system with more actuators than effective control inputs, and draw connections between control allocation, actuator selection, and learning. In such systems, the actuator commands can be chosen to meet a given control objective while leaving spare degrees of freedom that can be used to minimize the overall actuation energy. We show that this energy can be further reduced by optimally selecting the actuators themselves, which we perform in two scenarios: first, when the control objective is not known beforehand; and second, when the control objective is a stabilizing state-feedback controller. To relax the requirement for knowledge of the system's plant matrix, we develop a novel learning mechanism based on policy iteration, which computes the anti-stabilizing solution of an associated algebraic Riccati equation from trajectory data. Simulations demonstrate the effectiveness of our approach.