The so-called direct visual SLAM methods have shown a great potential in estimating a semidense or fully dense reconstruction of the scene, in contrast to the sparse reconstructions of the traditional feature-based algorithms. In this paper, we propose for the first time a direct, tightly-coupled formulation for the combination of visual and inertial data. Our algorithm runs in real-time on a standard CPU. The processing is split in three threads. The first thread runs at frame rate and estimates the camera motion by a joint non-linear optimization from visual and inertial data given a semidense map. The second one creates a semidense map of high-gradient areas only for camera tracking purposes. Finally, the third thread estimates a fully dense reconstruction of the scene at a lower frame rate. We have evaluated our algorithm in several real sequences with ground truth trajectory data, showing a state-of-the-art performance.