Abstract
This study investigates the inability of two popular data-splitting techniques, the train/test split and k-fold cross-validation, which are used to create training and validation data sets, to achieve sufficient generality for supervised deep learning (DL) methods. This failure is mainly caused by their limited ability to create new data. In contrast, the bootstrap is a computer-based statistical resampling method that has been used efficiently to estimate the distribution of a sample estimator and to assess a model without knowledge of the population. This paper couples cross-validation and the bootstrap to combine their respective advantages as data generation strategies and to achieve better generalization of a DL model. The paper contributes by: (i) developing an algorithm for better selection of training and validation data sets, (ii) exploring the potential of the bootstrap for drawing statistical inference on the necessary performance metrics (e.g., mean squared error), and (iii) introducing a method that can assess and improve the efficiency of a DL model. The proposed method is applied to semantic segmentation and is demonstrated with a DL-based classification algorithm, PointNet, on aerial laser scanning point cloud data.
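The general idea of coupling k-fold cross-validation with the bootstrap can be illustrated with a minimal sketch. This is not the algorithm developed in the paper: the function names (`kfold_indices`, `bootstrap_cv_mse`), the parameters `k`, `n_boot`, and `seed`, and the use of an ordinary least-squares model in place of PointNet are all assumptions made for illustration. The sketch resamples each training pool with replacement, scores the fitted model on the held-out fold, and summarizes the resulting mean squared errors with a percentile interval.

```python
import numpy as np


def kfold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)


def bootstrap_cv_mse(X, y, k=5, n_boot=200, seed=0):
    """Estimate the distribution of validation MSE (illustrative only).

    For every fold, the remaining folds form a training pool that is
    resampled with replacement (bootstrap). A least-squares model stands
    in for the DL model; the held-out fold is never resampled, so it
    stays unseen during fitting.
    """
    rng = np.random.default_rng(seed)
    folds = kfold_indices(len(y), k, rng)
    scores = []
    for i, val_idx in enumerate(folds):
        pool = np.concatenate([f for j, f in enumerate(folds) if j != i])
        A_val = np.column_stack([X[val_idx], np.ones(val_idx.size)])
        for _ in range(n_boot // k):
            # Bootstrap replicate of the training pool.
            train_idx = rng.choice(pool, size=pool.size, replace=True)
            A_train = np.column_stack([X[train_idx], np.ones(train_idx.size)])
            coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
            scores.append(np.mean((y[val_idx] - A_val @ coef) ** 2))
    scores = np.asarray(scores)
    lo, hi = np.percentile(scores, [2.5, 97.5])  # 95% percentile interval
    return scores.mean(), (lo, hi), scores
```

Calling `bootstrap_cv_mse(X, y)` with a two-dimensional feature array `X` and a target vector `y` returns the mean MSE, a percentile interval, and the individual replicate scores. In a DL setting the least-squares fit would be replaced by a training run, which makes the number of replicates a question of computational budget.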
Original language | English (US) |
---|---|
Pages (from-to) | 111-118 |
Number of pages | 8 |
Journal | International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives |
Volume | 48 |
Issue number | 4/W3-2022 |
State | Published - Dec 2 2022 |
Event | 7th International Conference on Smart City Applications, SCA 2022 - Castelo Branco, Portugal; Duration: Oct 19 2022 → Oct 21 2022 |
Keywords
- Classification
- Cross-Validation
- Neural Network
- PointNet
- Semantic Segmentation
- Supervised Machine Learning
ASJC Scopus subject areas
- Information Systems
- Geography, Planning and Development