Sensor fusion offers a robust approach to ecologically valid obstacle identification in a comprehensive electronic travel aid (ETA) for blind and visually impaired users. We propose a fusion framework that combines a stereoscopic camera system and a 16-element infrared sensor with a multi-scale convolutional neural network. Although object detection and identification can be paired with depth information from the stereo cameras, our experiments show that stereo depth estimates can be unreliable for collision hazards with certain material surfaces, such as low-texture or highly reflective objects. This inconsistency can be remedied by supplementing the stereo signal with a more reliable depth estimate from the infrared sensing modality. The sensing redundancy of this multi-modal strategy may enhance a visually impaired user's situational awareness, permitting more efficient and safer obstacle negotiation.
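The fallback strategy described above can be sketched as follows. This is a minimal illustration, not the deployed implementation: the function name `fuse_depth`, the 4x4 arrangement of the 16 infrared elements, and the use of a per-pixel stereo confidence map with a fixed threshold are all assumptions made for the sake of the example.

```python
import numpy as np

def fuse_depth(stereo_depth, stereo_conf, ir_depth, conf_threshold=0.5):
    """Fuse a dense stereo depth map with a coarse 16-element IR grid.

    stereo_depth : (H, W) depth in metres from the stereo pair
    stereo_conf  : (H, W) per-pixel stereo matching confidence in [0, 1]
    ir_depth     : (4, 4) depth in metres, one value per IR element
                   (hypothetical 4x4 layout of the 16 elements)

    Wherever stereo confidence falls below the threshold (e.g. on
    low-texture or reflective surfaces), fall back to the IR reading
    covering that image region.
    """
    h, w = stereo_depth.shape
    # Upsample the 4x4 IR grid to image resolution (nearest neighbour),
    # mapping each pixel to the IR element whose field of view covers it.
    rows = np.arange(h) * 4 // h
    cols = np.arange(w) * 4 // w
    ir_full = ir_depth[rows[:, None], cols[None, :]]
    # Keep stereo depth where it is trustworthy; substitute IR elsewhere.
    return np.where(stereo_conf >= conf_threshold, stereo_depth, ir_full)
```

In practice the threshold (or a weighted blend in place of the hard switch) would be tuned per platform; the sketch only shows the redundancy principle of letting one modality cover the other's failure cases.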