Abstract
This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. We obtain state-of-the-art results on the NYU-v2 depth dataset, with an accuracy of 64.5%. We illustrate the labeling of indoor scenes in video sequences that could be processed in real time using appropriate hardware such as an FPGA.
Original language | English (US) |
---|---|
State | Published - Jan 1 2013 |
Event | 1st International Conference on Learning Representations, ICLR 2013 - Scottsdale, United States; Duration: May 2 2013 → May 4 2013 |
Conference
Conference | 1st International Conference on Learning Representations, ICLR 2013 |
---|---|
Country/Territory | United States |
City | Scottsdale |
Period | 5/2/13 → 5/4/13 |
ASJC Scopus subject areas
- Education
- Computer Science Applications
- Linguistics and Language
- Language and Linguistics