The complex interconnections among critical infrastructure sectors make the resulting system of systems (SoS) vulnerable to failures and highlight the importance of robustness and resilience. To this end, we first establish holistic probabilistic networks to model the interdependencies between infrastructure components. To capture the underlying failure and recovery dynamics, we further propose a Markov decision process (MDP) model in which the response policy determines the long-term performance. To address the challenge of high dimensionality, we exploit the sparsity of the network interconnections and solve an approximate linear program by variable elimination, which leads to a distributed control policy under mild assumptions. Finally, we use a case study of interdependent power and subway systems to corroborate the results and show that optimal resilience resource planning and allocation can reduce the failure probability and mitigate the impact of failures caused by natural or man-made disasters.