TY - GEN
T1 - Spatial Visibility and Temporal Dynamics
T2 - 16th ACM Multimedia Systems Conference, MMSys 2025
AU - Li, Chen
AU - Zong, Tongyu
AU - Hu, Yueyu
AU - Wang, Yao
AU - Liu, Yong
N1 - Publisher Copyright:
© 2025 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/3/31
Y1 - 2025/3/31
N2 - Field-of-View (FoV) adaptive streaming significantly reduces the bandwidth requirement of immersive point cloud video (PCV) by transmitting only the visible points inside a viewer's FoV. Traditional approaches often focus on trajectory-based 6-degree-of-freedom (6DoF) FoV prediction; the predicted FoV is then used to calculate point visibility. Such approaches do not explicitly consider the video content's impact on viewer attention, and the conversion from FoV to point visibility is often error-prone and time-consuming. We reformulate the PCV FoV prediction problem from the cell-visibility perspective, enabling precise decisions about transmitting 3D data at the cell level based on the predicted visibility distribution. We develop a novel spatial-visibility and object-aware graph model (CellSight) that leverages historical 3D visibility data and incorporates spatial perception, occlusion between points, and neighboring-cell correlation to predict future cell visibility. We focus on multi-second-ahead prediction to enable long pre-fetching buffers in on-demand streaming, which is critical for robustness to network bandwidth fluctuations. CellSight significantly improves long-term cell visibility prediction, reducing the Mean Squared Error (MSE) loss by up to 50% compared to state-of-the-art models when predicting 2 to 5 seconds ahead, while maintaining real-time performance (more than 30 fps) for point cloud videos with over 1 million points.
KW - 6 degree of freedom (DOF)
KW - GNN
KW - field of view (FOV) prediction
KW - immersive video streaming
KW - point cloud video
KW - virtual reality (VR)
UR - http://www.scopus.com/inward/record.url?scp=105005027745&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105005027745&partnerID=8YFLogxK
U2 - 10.1145/3712676.3714435
DO - 10.1145/3712676.3714435
M3 - Conference contribution
AN - SCOPUS:105005027745
T3 - MMSys 2025 - Proceedings of the 16th ACM Multimedia Systems Conference
SP - 24
EP - 34
BT - MMSys 2025 - Proceedings of the 16th ACM Multimedia Systems Conference
PB - Association for Computing Machinery, Inc
Y2 - 31 March 2025 through 3 April 2025
ER -