Efficient LiDAR point cloud data encoding for scalable data management within the Hadoop eco-system

Anh Vu Vo, Chamin Nalinda Lokugam Hewage, Gianmarco Russo, Neel Chauhan, Debra F. Laefer, Michela Bertolotto, Nhien An Le-Khac, Ulrich Oftendinger

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper introduces a novel LiDAR point cloud data encoding solution that is compact, flexible, and fully supports distributed data storage within the Hadoop distributed computing environment. The proposed data encoding solution is developed based on Sequence File and Google Protocol Buffers. Sequence File is a generic splittable binary file format built in the Hadoop framework for storage of arbitrary binary data. The key challenge in adopting the Sequence File format for LiDAR data is in the strategy for effectively encoding the LiDAR data as binary sequences in a way that the data can be represented compactly, while allowing necessary mutation. For that purpose, a data encoding solution, based on Google Protocol Buffers (a language-neutral, cross-platform, extensible data serialisation framework) was developed and evaluated. Since neither of the underlying technologies is sufficient to completely and efficiently represent all necessary point formats for distributed computing, an innovative fusion of them was required to provide a viable data storage solution. This paper presents the details of such a data encoding implementation and rigorously evaluates the efficiency of the proposed data encoding solution. Benchmarking was done against a straightforward, naive text encoding implementation using a high-density aerial LiDAR scan of a portion of Dublin, Ireland. The results demonstrated a 6-times reduction in data volume, a 4-times reduction in database ingestion time, and up to a 5 times reduction in querying time.

Original languageEnglish (US)
Title of host publicationProceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
EditorsChaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5644-5653
Number of pages10
ISBN (Electronic)9781728108582
DOIs
StatePublished - Dec 2019
Event2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: Dec 9 2019Dec 12 2019

Publication series

NameProceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference2019 IEEE International Conference on Big Data, Big Data 2019
Country/TerritoryUnited States
CityLos Angeles
Period12/9/1912/12/19

Keywords

  • Big Data
  • Google Protocol Buffers
  • HBase
  • Hadoop
  • LiDAR
  • data encoding
  • distributed computing
  • distributed database
  • point cloud
  • spatial database

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management

Fingerprint

Dive into the research topics of 'Efficient LiDAR point cloud data encoding for scalable data management within the Hadoop eco-system'. Together they form a unique fingerprint.

Cite this