TY - GEN
T1 - Democratizing MLLMs in Healthcare
T2 - 31st IEEE International Conference on Image Processing Challenges and Workshops, ICIPCW 2024
AU - El Mir, Aya
AU - Luoga, Lukelo Thadei
AU - Chen, Boyuan
AU - Hanif, Muhammad Abdullah
AU - Shafique, Muhammad
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Deploying Multi-Modal Large Language Models (MLLMs) in healthcare is hindered by their high computational demands and significant memory requirements, which are particularly challenging for resource-constrained devices like the Nvidia Jetson Xavier. This problem is especially evident in remote medical settings where advanced diagnostics are needed but resources are limited. In this paper, we introduce an optimization method for the general-purpose MLLM TinyLLaVA, which we have adapted and renamed TinyLLaVA-Med. This adaptation involves instruction-tuning and fine-tuning TinyLLaVA on a medical dataset, drawing inspiration from the LLaVA-Med training pipeline. Our approach successfully minimizes computational complexity and power consumption, with TinyLLaVA-Med operating at 18.9 W and using 11.9 GB of memory, while achieving accuracies of 64.54% on VQA-RAD and 70.70% on SLAKE for closed-ended questions. TinyLLaVA-Med is therefore viable for deployment in hardware-constrained environments with low computational resources, maintaining essential functionalities and delivering accuracies close to state-of-the-art models.
AB - Deploying Multi-Modal Large Language Models (MLLMs) in healthcare is hindered by their high computational demands and significant memory requirements, which are particularly challenging for resource-constrained devices like the Nvidia Jetson Xavier. This problem is especially evident in remote medical settings where advanced diagnostics are needed but resources are limited. In this paper, we introduce an optimization method for the general-purpose MLLM TinyLLaVA, which we have adapted and renamed TinyLLaVA-Med. This adaptation involves instruction-tuning and fine-tuning TinyLLaVA on a medical dataset, drawing inspiration from the LLaVA-Med training pipeline. Our approach successfully minimizes computational complexity and power consumption, with TinyLLaVA-Med operating at 18.9 W and using 11.9 GB of memory, while achieving accuracies of 64.54% on VQA-RAD and 70.70% on SLAKE for closed-ended questions. TinyLLaVA-Med is therefore viable for deployment in hardware-constrained environments with low computational resources, maintaining essential functionalities and delivering accuracies close to state-of-the-art models.
KW - Embedded Systems
KW - Healthcare AI
KW - Medical Diagnostics
KW - Multimodal Large Language Models (MLLMs)
KW - Resource-Constrained Computing
UR - http://www.scopus.com/inward/record.url?scp=85214658864&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85214658864&partnerID=8YFLogxK
U2 - 10.1109/ICIPCW64161.2024.10769172
DO - 10.1109/ICIPCW64161.2024.10769172
M3 - Conference contribution
AN - SCOPUS:85214658864
T3 - 2024 IEEE International Conference on Image Processing Challenges and Workshops, ICIPCW 2024 - Proceedings
SP - 4164
EP - 4170
BT - 2024 IEEE International Conference on Image Processing Challenges and Workshops, ICIPCW 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 October 2024 through 30 October 2024
ER -