TY - GEN
T1 - Training AI to Recognize Objects of Interest to the Blind and Low Vision Community
AU - Sankarnarayanan, Tharangini
AU - Paciorkowski, Lev
AU - Parikh, Khevna
AU - Hamilton-Fletcher, Giles
AU - Feng, Chen
AU - Sheng, Diwei
AU - Hudson, Todd E.
AU - Rizzo, John Ross
AU - Chan, Kevin C.
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Recent object detection models show promising advances in their architecture and performance, expanding potential applications for the benefit of persons with blindness or low vision (pBLV). However, object detection models are usually trained on generic data rather than datasets that focus on the needs of pBLV. Hence, for applications that locate objects of interest to pBLV, object detection models need to be trained specifically for this purpose. Informed by prior interviews, questionnaires, and Microsoft's ORBIT research, we identified thirty-five objects pertinent to pBLV. We employed this user-centric feedback to gather images of these objects from the Google Open Images V6 dataset. We subsequently trained a YOLOv5x model with this dataset to recognize these objects of interest. We demonstrate that the model can identify objects that previous generic models could not, such as those related to tasks of daily functioning - e.g., coffee mug, knife, fork, and glass. Crucially, we show that careful pruning of a dataset with severe class imbalances leads to a rapid, noticeable improvement in the overall performance of the model by two-fold, as measured using the mean average precision at the intersection over union thresholds from 0.5 to 0.95 (mAP50-95). Specifically, mAP50-95 improved from 0.14 to 0.36 on the seven least prevalent classes in the training dataset. Overall, we show that careful curation of training data can improve training speed and object detection outcomes. We show clear directions on effectively customizing training data to create models that focus on the desires and needs of pBLV. Clinical Relevance - This work demonstrated the benefits of developing assistive AI technology customized to individual users or the wider BLV community.
AB - Recent object detection models show promising advances in their architecture and performance, expanding potential applications for the benefit of persons with blindness or low vision (pBLV). However, object detection models are usually trained on generic data rather than datasets that focus on the needs of pBLV. Hence, for applications that locate objects of interest to pBLV, object detection models need to be trained specifically for this purpose. Informed by prior interviews, questionnaires, and Microsoft's ORBIT research, we identified thirty-five objects pertinent to pBLV. We employed this user-centric feedback to gather images of these objects from the Google Open Images V6 dataset. We subsequently trained a YOLOv5x model with this dataset to recognize these objects of interest. We demonstrate that the model can identify objects that previous generic models could not, such as those related to tasks of daily functioning - e.g., coffee mug, knife, fork, and glass. Crucially, we show that careful pruning of a dataset with severe class imbalances leads to a rapid, noticeable improvement in the overall performance of the model by two-fold, as measured using the mean average precision at the intersection over union thresholds from 0.5 to 0.95 (mAP50-95). Specifically, mAP50-95 improved from 0.14 to 0.36 on the seven least prevalent classes in the training dataset. Overall, we show that careful curation of training data can improve training speed and object detection outcomes. We show clear directions on effectively customizing training data to create models that focus on the desires and needs of pBLV. Clinical Relevance - This work demonstrated the benefits of developing assistive AI technology customized to individual users or the wider BLV community.
UR - http://www.scopus.com/inward/record.url?scp=85179646574&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85179646574&partnerID=8YFLogxK
U2 - 10.1109/EMBC40787.2023.10340454
DO - 10.1109/EMBC40787.2023.10340454
M3 - Conference contribution
C2 - 38082714
AN - SCOPUS:85179646574
T3 - Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
BT - 2023 45th Annual International Conference of the IEEE Engineering in Medicine and Biology Conference, EMBC 2023 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 45th Annual International Conference of the IEEE Engineering in Medicine and Biology Conference, EMBC 2023
Y2 - 24 July 2023 through 27 July 2023
ER -