TY - GEN
T1 - Siamese CNN-BiLSTM architecture for 3D shape representation learning
AU - Dai, Guoxian
AU - Xie, Jin
AU - Fang, Yi
N1 - Publisher Copyright:
© 2018 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2018
Y1 - 2018
AB - Learning a 3D shape representation from a collection of its rendered 2D images has been extensively studied. However, existing view-based techniques have not yet fully exploited the information shared among all the projected views. In this paper, we propose a siamese CNN-BiLSTM network for 3D shape representation learning, employing a recurrent neural network to efficiently capture features across different views. The proposed method minimizes a discriminative loss function to learn a deep nonlinear transformation that maps 3D shapes from the original space into a nonlinear feature space. In the transformed space, the distance between 3D shapes with the same label is minimized, while the distance between shapes with different labels is pushed beyond a large margin. Specifically, each 3D shape is first projected into a group of 2D images from different views. A convolutional neural network (CNN) then extracts features from each view image, and a bidirectional long short-term memory (BiLSTM) network aggregates information across the views. Finally, the whole CNN-BiLSTM network is trained in a siamese structure with a contrastive loss function. The proposed method is evaluated on two benchmarks, ModelNet40 and SHREC 2014, demonstrating superiority over state-of-the-art methods.
UR - http://www.scopus.com/inward/record.url?scp=85055687213&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055687213&partnerID=8YFLogxK
DO - 10.24963/ijcai.2018/93
M3 - Conference contribution
AN - SCOPUS:85055687213
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 670
EP - 676
BT - Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
A2 - Lang, Jérôme
PB - International Joint Conferences on Artificial Intelligence
T2 - 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
Y2 - 13 July 2018 through 19 July 2018
ER -
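
As a concrete illustration of the pipeline described in the abstract above, here is a minimal sketch of one siamese CNN-BiLSTM branch with a contrastive loss, assuming PyTorch. The CNN backbone, hidden sizes, view count, image size, and margin are illustrative assumptions, not the authors' published implementation.

# Minimal sketch (not the authors' code): per-view CNN features, a BiLSTM
# aggregating across views, and a contrastive loss over siamese pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNBiLSTMBranch(nn.Module):
    """Embeds one 3D shape given a sequence of its rendered 2D views."""
    def __init__(self, feat_dim=256, hidden_dim=128):
        super().__init__()
        # Small per-view CNN; the paper would use a deeper backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Bidirectional LSTM aggregates features across the view sequence.
        self.bilstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                              bidirectional=True)

    def forward(self, views):                    # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        feats = self.cnn(views.flatten(0, 1))    # (B*V, feat_dim)
        feats = feats.view(b, v, -1)             # (B, V, feat_dim)
        out, _ = self.bilstm(feats)              # (B, V, 2*hidden_dim)
        return out.mean(dim=1)                   # pool views -> shape embedding

def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Pull same-label pairs together; push different-label pairs
    apart to at least `margin` (standard contrastive loss form)."""
    d = F.pairwise_distance(z1, z2)
    return torch.mean(same_label * d.pow(2) +
                      (1 - same_label) * F.relu(margin - d).pow(2))

# Siamese usage: the same branch (shared weights) embeds both shapes.
branch = CNNBiLSTMBranch()
x1 = torch.randn(4, 12, 3, 64, 64)   # 4 shapes x 12 rendered views each
x2 = torch.randn(4, 12, 3, 64, 64)
y = torch.tensor([1., 0., 1., 0.])   # 1 = same class, 0 = different
loss = contrastive_loss(branch(x1), branch(x2), y)
loss.backward()

Because the two inputs of a pair pass through the same branch with shared weights, minimizing this loss realizes the large-margin behavior the abstract describes: same-label embeddings are drawn together while different-label embeddings are separated by at least the margin.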