This paper addresses the control of a multirotor UAV carrying a cable-suspended load. To transport the load safely, the swinging motion induced by the strongly coupled dynamics must be minimized. Using the Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning algorithm, a policy neural network is trained in a model-free manner to navigate the vehicle to the desired waypoints while simultaneously compensating for the load oscillations. The learned policy network is incorporated into the cascaded control architecture of the autopilot, where it replaces the conventional PID position controller and therefore communicates directly with the inner attitude controller. The performance of the proposed policy is demonstrated through a comparative simulation and experimental study on an octorotor UAV.
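For readers unfamiliar with TD3, the following minimal sketch illustrates the critic target at the core of the algorithm: target policy smoothing with clipped noise, followed by the minimum over two target critics. It is an illustration of the generic TD3 update and not the implementation used in this work. The function name `td3_target`, the callables `target_actor`, `target_q1`, `target_q2`, and the hyperparameter values (`gamma`, `noise_std`, `noise_clip`, `act_limit`) are assumptions chosen for the example.

```python
import numpy as np

def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Compute the TD3 critic target for a batch of transitions.

    All callables and default hyperparameters are hypothetical placeholders,
    not values taken from the paper.
    """
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    next_action = target_actor(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -act_limit, act_limit)

    # Clipped double-Q learning: take the smaller of the two target critic estimates.
    q_min = np.minimum(target_q1(next_state, next_action),
                       target_q2(next_state, next_action))

    # Bootstrapped target; terminal transitions (done = 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min


# Toy usage with stand-in functions instead of trained networks.
rng = np.random.default_rng(0)
states = np.zeros((2, 4))
y = td3_target(
    reward=np.array([1.0, 0.5]),
    done=np.array([0.0, 1.0]),
    next_state=states,
    target_actor=lambda s: np.zeros((len(s), 2)),
    target_q1=lambda s, a: np.ones(len(s)),
    target_q2=lambda s, a: 2.0 * np.ones(len(s)),
    rng=rng,
)
```

In the toy usage the two stand-in critics return constant values, so the target depends only on the smaller estimate and on the terminal flag. In an actual training loop both critics would be regressed toward this target, and the actor would be updated less frequently than the critics.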