TY - GEN
T1 - Weakly Supervised Scene Segmentation Using Efficient Transformer
AU - Huang, Hao
AU - Yuan, Shuaihang
AU - Wen, Cong Cong
AU - Hao, Yu
AU - Fang, Yi
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Current methods for large-scale point cloud scene semantic segmentation rely on manually annotated dense point-wise labels, which are costly, labor-intensive, and prone to errors. Consequently, gathering point cloud scenes with billions of labeled points is impractical in real-world scenarios. In this paper, we introduce a novel weak supervision approach to semantically segment large-scale indoor scenes, requiring only 1‰ of the points to be labeled. Specifically, we develop an efficient point neighbor Transformer to capture the geometry of local point cloud patches. To address the quadratic complexity of self-attention computation in Transformers, particularly for large-scale point clouds, we propose approximating the self-attention matrix using low-rank and sparse decomposition. Building on the point neighbor Transformer as a foundational block, we design a Low-rank Sparse Transformer Network (LST-Net) for weakly supervised large-scale point cloud scene semantic segmentation. Experimental results on two commonly used indoor point cloud scene segmentation benchmarks demonstrate that our model achieves performance comparable to that of both weakly supervised and fully supervised methods. Our code can be found at https://github.com/hhuang-code/LST-Net.
UR - http://www.scopus.com/inward/record.url?scp=85216502085&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85216502085&partnerID=8YFLogxK
U2 - 10.1109/IROS58592.2024.10802479
DO - 10.1109/IROS58592.2024.10802479
M3 - Conference contribution
AN - SCOPUS:85216502085
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 9784
EP - 9790
BT - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
Y2 - 14 October 2024 through 18 October 2024
ER -
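
Note: the abstract above describes approximating the self-attention matrix with a low-rank plus sparse decomposition to avoid quadratic cost on large point clouds. The sketch below is a minimal NumPy illustration of that general idea only, assuming landmark-based low-rank attention combined with k-nearest-neighbor sparse attention; the function and parameter names (lowrank_sparse_attention, num_landmarks, knn) are illustrative assumptions and are not taken from the paper or the LST-Net repository. For clarity the sketch still forms dense N x N distances and logits; an efficient implementation would restrict these to local neighborhoods via a spatial index.

import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def lowrank_sparse_attention(q, k, v, num_landmarks=16, knn=8):
    """Illustrative only: q, k, v are (N, d) features of one point-cloud patch."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)

    # Low-rank term: route attention through a small set of landmark points,
    # giving an attention map of rank at most num_landmarks.
    idx = np.linspace(0, n - 1, num_landmarks).astype(int)
    a1 = softmax(q @ k[idx].T * scale)            # (N, r) queries -> landmarks
    a2 = softmax(q[idx] @ k.T * scale)            # (r, N) landmarks -> keys
    low_rank_out = a1 @ (a2 @ v)                  # (N, d)

    # Sparse term: exact attention restricted to each point's knn nearest keys.
    d2 = ((q[:, None, :] - k[None, :, :]) ** 2).sum(-1)        # (N, N) squared distances
    nbr = np.argsort(d2, axis=1)[:, :knn]                      # (N, knn) neighbor indices
    logits = np.take_along_axis(q @ k.T * scale, nbr, axis=1)  # (N, knn)
    w = softmax(logits)
    sparse_out = np.einsum('nk,nkd->nd', w, v[nbr])            # weighted neighbor values

    return low_rank_out + sparse_out

# Usage: self-attention over a random 1024-point patch (q = k = v).
rng = np.random.default_rng(0)
feats = rng.standard_normal((1024, 32))
print(lowrank_sparse_attention(feats, feats, feats).shape)  # (1024, 32)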