With the emergence of RGB-D cameras (e.g., Kinect), the sensing capability of artificial intelligence systems has increased dramatically, and consequently a wide range of depth image-based human-machine interaction applications have been proposed. In the design industry, a 3D model contains abundant information that is required for manufacture. Since depth images can be conveniently acquired, a retrieval system that returns 3D models based on depth image queries can assist or improve the traditional product design process. In this work, we address the depth image-based 3D model retrieval problem. By extending a neural network to a neural network pair with identical output layers for objects of the same category, unified domain-invariant representations can be learned from the low-level, mismatched depth image features and 3D model features. A unique advantage of the framework is that correspondence information between depth images and 3D models is not required, so it can easily be generalized to large-scale databases. To evaluate the effectiveness of our approach, depth images (with Kinect-type noise) from the NYU Depth V2 dataset are used as queries to retrieve 3D models of the same categories from the SHREC 2014 dataset. Experimental results suggest that our approach outperforms state-of-the-art methods, as well as the paradigm that directly uses the original representations of depth images and 3D models for retrieval.
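The core idea of the abstract — two domain-specific encoders whose outputs feed a shared output layer, so that depth images and 3D models of the same category land in one domain-invariant space — can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: all dimensions, weight initializations, and function names are assumed for the example, and the training step (category-label supervision on the shared output layer, with no cross-domain correspondences) is omitted.

```python
import numpy as np

# Assumed (illustrative) dimensions: depth image features and 3D model
# features live in different, mismatched low-level feature spaces.
D_DEPTH, D_MODEL, D_EMBED, N_CLASSES = 128, 256, 64, 10

rng = np.random.default_rng(0)

# One encoder per domain maps its features into a shared embedding space.
W_depth = rng.normal(scale=0.1, size=(D_DEPTH, D_EMBED))
W_model = rng.normal(scale=0.1, size=(D_MODEL, D_EMBED))

# The output (classification) layer is identical for both branches, so
# objects of the same category are pulled toward the same region of the
# embedding space regardless of domain -- no paired depth-image/3D-model
# correspondences are needed, only category labels.
W_out = rng.normal(scale=0.1, size=(D_EMBED, N_CLASSES))

def embed(x, W_enc):
    """Domain-specific encoder followed by a nonlinearity."""
    return np.tanh(x @ W_enc)

def class_scores(x, W_enc):
    """Shared output layer applied on top of the domain embedding."""
    return embed(x, W_enc) @ W_out

depth_feat = rng.normal(size=(5, D_DEPTH))   # 5 depth-image queries
model_feat = rng.normal(size=(7, D_MODEL))   # 7 candidate 3D models

# Both domains land in the same 64-d space, so retrieval can rank
# candidates by a simple similarity (here, cosine) between embeddings.
z_depth = embed(depth_feat, W_depth)
z_model = embed(model_feat, W_model)
sim = (z_depth @ z_model.T) / (
    np.linalg.norm(z_depth, axis=1, keepdims=True)
    * np.linalg.norm(z_model, axis=1)
)
print(sim.shape)  # one similarity row per query, one column per 3D model
```

After training, retrieval reduces to nearest-neighbor search in the shared embedding space, which is why the approach scales to large databases.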