Enhancing ground classification models for TBM tunneling: Detecting label errors in datasets

Saadeldin Mostafa, Rita L. Sousa

Research output: Contribution to journal › Article › peer-review


Tunnel Boring Machine (TBM) construction, particularly with closed-face TBMs, is subject to uncertainty because the operator cannot directly observe the ground ahead. These uncertainties can lead to time delays, cost overruns, and accidents. Supervised machine learning techniques have been used to predict geology from TBM sensor data, but their performance drops significantly when they are applied to other projects, indicating poor generalization. To produce accurate results and generalize to future data, supervised learning models require high-quality, well-labeled data, which is usually not available for TBM datasets. This paper addresses the issue of “noisy” labels in TBM datasets, which arise because human operators and engineers label the data according to varying interpretations. A data-centric framework was adapted and applied to an Earth Pressure Balance Machine (EPBM) tunnel dataset to detect and identify these mislabeled data points. The framework's outputs were validated using two techniques, and several methods were then applied to clean the dataset. The best-performing cleaning method was selected for use on the test set. The paper concludes by discussing the limitations of the proposed method, the challenges encountered, and future research directions in this area.
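The abstract does not spell out how mislabeled points are detected. One widely used data-centric approach to this problem is confident learning, which compares each point's given label with out-of-sample predicted class probabilities and flags points where a model confidently prefers a different class. The sketch below is a minimal, standard-library-only illustration of that idea under stated assumptions. It is not necessarily the framework used in the paper, and the function name `flag_suspect_labels`, the per-class threshold rule (mean self-confidence), and the two toy ground classes are hypothetical choices for illustration.

```python
def flag_suspect_labels(labels, pred_probs):
    """Flag likely mislabeled points with a confident-learning-style rule.

    Hypothetical helper, not the paper's implementation.
    labels: given (possibly noisy) integer class labels, e.g. 0 = soft ground, 1 = rock.
    pred_probs: per-point lists of class probabilities, ideally out-of-sample
        (for example from k-fold cross-validation).
    Returns the indices of points whose confidently predicted class differs
    from the given label.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: average predicted probability of class k over the
    # points that were given label k (the class's mean "self-confidence").
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    # Assumption: a class with no labeled points gets a threshold of 1.0,
    # so it is almost never treated as a confident prediction.
    thresholds = [sums[k] / counts[k] if counts[k] else 1.0 for k in range(n_classes)]

    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose probability for this point clears that class's threshold.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class; leave the label alone
        # Among the confident classes, take the most probable one.
        guess = max(confident, key=lambda k: p[k])
        if guess != y:
            suspects.append(i)
    return suspects


# Toy example: point 2 is labeled 0 but the model confidently predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.95, 0.05],
    [0.85, 0.15],
    [0.10, 0.90],
    [0.05, 0.95],
    [0.20, 0.80],
    [0.15, 0.85],
]
print(flag_suspect_labels(labels, pred_probs))  # [2]
```

The probabilities should come from a model that did not train on the point being scored; otherwise the model can memorize the noisy labels and few errors are flagged. Flagged points are candidates for review, relabeling, or removal rather than confirmed errors.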

Original language: English (US)
Article number: 106301
Journal: Computers and Geotechnics
State: Published, Jun 2024


Keywords

  • Data-centric geotechnics
  • Database
  • EPBM tunnel
  • Machine learning
  • Noisy labels

ASJC Scopus subject areas

  • Geotechnical Engineering and Engineering Geology
  • Computer Science Applications

