TY - GEN
T1 - Advancing Healthcare in Low-Resource Environments Through an Optimization and Deployment Framework for Medical Multimodal Large Language Models
AU - El Mir, Aya
AU - Luoga, Lukelo Thadei
AU - Chen, Boyuan
AU - Hanif, Muhammad Abdullah
AU - Shafique, Muhammad
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The critical shortage of medical professionals in low-resource countries, notably in Africa, hinders adequate healthcare delivery. AI, particularly Multimodal Large Language Models (MLLMs), can enhance the efficiency of healthcare systems by assisting in medical image analysis and diagnosis. However, the deployment of state-of-the-art MLLMs is limited in these regions due to high computational demands that exceed the capabilities of consumer-grade GPUs. This paper presents a framework for optimizing MLLMs for resource-constrained environments. We introduce optimized medical MLLMs, including TinyLLaVA-Med-F, a medically fine-tuned MLLM, and quantized variants (TinyLLaVA-Med-FQ4, TinyLLaVA-Med-FQ8, LLaVA-Med-Q4, and LLaVA-Med-Q8) that achieve substantial reductions in memory usage without significant loss of accuracy. Specifically, TinyLLaVA-Med-FQ4 achieves the greatest reductions, lowering dynamic memory by approximately 89% and static memory by 90% compared to LLaVA-Med. Similarly, LLaVA-Med-Q4 reduces dynamic memory by 65% and static memory by 67% relative to the same baseline. These memory reductions make the models feasible for deployment on consumer-grade GPUs such as the RTX 3050. This research underscores the potential of deploying optimized MLLMs in low-resource settings, providing a foundation for future developments in accessible AI-driven healthcare solutions.
AB - The critical shortage of medical professionals in low-resource countries, notably in Africa, hinders adequate healthcare delivery. AI, particularly Multimodal Large Language Models (MLLMs), can enhance the efficiency of healthcare systems by assisting in medical image analysis and diagnosis. However, the deployment of state-of-the-art MLLMs is limited in these regions due to high computational demands that exceed the capabilities of consumer-grade GPUs. This paper presents a framework for optimizing MLLMs for resource-constrained environments. We introduce optimized medical MLLMs, including TinyLLaVA-Med-F, a medically fine-tuned MLLM, and quantized variants (TinyLLaVA-Med-FQ4, TinyLLaVA-Med-FQ8, LLaVA-Med-Q4, and LLaVA-Med-Q8) that achieve substantial reductions in memory usage without significant loss of accuracy. Specifically, TinyLLaVA-Med-FQ4 achieves the greatest reductions, lowering dynamic memory by approximately 89% and static memory by 90% compared to LLaVA-Med. Similarly, LLaVA-Med-Q4 reduces dynamic memory by 65% and static memory by 67% relative to the same baseline. These memory reductions make the models feasible for deployment on consumer-grade GPUs such as the RTX 3050. This research underscores the potential of deploying optimized MLLMs in low-resource settings, providing a foundation for future developments in accessible AI-driven healthcare solutions.
KW - Artificial Intelligence (AI)
KW - Clinical Applications
KW - Medical Diagnostics
KW - Memory Optimization
KW - Multimodal Large Language Models (MLLMs)
KW - Quantization
KW - Resource-Constrained Environments
UR - http://www.scopus.com/inward/record.url?scp=105001320404&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105001320404&partnerID=8YFLogxK
U2 - 10.1109/BHI62660.2024.10913565
DO - 10.1109/BHI62660.2024.10913565
M3 - Conference contribution
AN - SCOPUS:105001320404
T3 - BHI 2024 - IEEE-EMBS International Conference on Biomedical and Health Informatics, Proceedings
BT - BHI 2024 - IEEE-EMBS International Conference on Biomedical and Health Informatics, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE-EMBS International Conference on Biomedical and Health Informatics, BHI 2024
Y2 - 10 November 2024 through 13 November 2024
ER -