Multimodal generative AI for interpreting 3D medical images and videos

Jung Oh Lee, Hong Yu Zhou, Tyler M. Berzin, Daniel K. Sodickson, Pranav Rajpurkar

Research output: Contribution to journal, Review article, peer-review

Abstract

This perspective proposes adapting video-text generative AI to 3D medical imaging (CT/MRI) and medical videos (endoscopy/laparoscopy) by treating 3D images as videos. The approach leverages modern video models to analyze multiple sequences simultaneously and provide real-time AI assistance during procedures. The paper examines medical imaging’s unique characteristics (synergistic information, metadata, and world model), outlines applications in automated reporting, case retrieval, and education, and addresses challenges of limited datasets, benchmarks, and specialized training.

Original language: English (US)
Article number: 273
Journal: npj Digital Medicine
Volume: 8
Issue number: 1
State: Published - Dec 2025

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Health Informatics
  • Computer Science Applications
  • Health Information Management
