Eyecare Foundation Model
New research suggests that EyeFM, an AI foundation model designed to assist ophthalmologists as a clinical copilot, can improve correct diagnosis and referral rates in retinal disease screening. In a randomized controlled trial involving 668 participants, ophthalmologists using the tool achieved a higher correct diagnosis rate than those providing standard care, pointing to potential benefits for eyecare in resource-limited settings.
Addressing Challenges in Eyecare Delivery
Ophthalmology relies heavily on multimodal imaging, such as color fundus photography (CFP) and optical coherence tomography (OCT), to diagnose conditions like diabetic retinopathy, glaucoma, and age-related macular degeneration (AMD). However, existing AI models often process each modality in isolation, overlooking how clinicians integrate multiple data sources. Moreover, while retrospective studies have demonstrated what AI can do, real-world evaluations of human-AI collaboration remain scarce. This gap hinders adoption, particularly in diverse global settings where access to specialized equipment and expertise is limited.
Building the EyeFM Model
EyeFM is a multimodal vision-language foundation model pretrained on 14.5 million ocular images spanning five imaging modalities, paired with 0.4 million clinical texts from multiethnic sources. This pretraining enables the model to handle tasks such as disease detection and lesion segmentation, as well as vision-language functions such as generating image reports and answering visual questions. A human-in-the-loop feature lets clinicians give feedback that refines the model's outputs, mirroring collaborative clinical workflows.
Retrospective Performance Benchmarks
In initial validations on retrospective datasets, EyeFM performed comparably to or better than existing models. For single-modality tasks, it achieved high accuracy in detecting common retinal diseases from CFP and OCT images. Cross-modality tests, which used CFP to identify conditions that typically require OCT, yielded promising results, including an area under the receiver operating characteristic curve (AUROC) of 0.883 for center-involved diabetic macular edema. Integrated-modality diagnostics, combining CFP and OCT, further improved accuracy. In vision-language tasks such as report generation, EyeFM outperformed alternatives like LLaVA on the reported metrics, and in head-to-head comparisons its responses were rated as more empathetic and clinically relevant.
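To make the AUROC figure concrete: AUROC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as half. The sketch below computes it directly from that pairwise definition. The function name `auroc` and the toy labels and scores are hypothetical illustrations, not EyeFM outputs or data from the study.

```python
from itertools import product


def auroc(labels, scores):
    """Return AUROC as the fraction of (positive, negative) pairs in which
    the positive case scores higher; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = condition present (e.g. confirmed on OCT), 0 = absent;
# scores are made-up model probabilities, as if predicted from CFP alone.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
print(auroc(labels, scores))  # 11 of the 12 positive-negative pairs are ranked correctly
```

An AUROC of 1.0 means every positive outranks every negative, while 0.5 is no better than chance; a value of 0.883 therefore means the model ranks a positive above a negative in roughly 88% of such pairs.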
Validating Efficacy in Clinical Settings
To assess real-world utility, researchers conducted multicountry reader studies with 44 ophthalmologists from North America, Europe, Asia, and Africa. Ophthalmologists using EyeFM as a copilot showed higher sensitivity for detecting referable diabetic retinopathy, glaucoma suspects, and AMD suspects from CFP, without a loss of specificity. Time efficiency also improved: reports took less time to complete.
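The phrase "higher sensitivity without a loss of specificity" refers to two standard screening metrics: sensitivity is the share of truly referable cases that a reader flags, and specificity is the share of non-referable cases correctly left unflagged. The sketch below computes both from confusion-matrix counts. The function name and all counts are hypothetical, chosen only to illustrate the pattern described in the reader studies.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity).

    sensitivity = TP / (TP + FN): share of referable cases that were flagged
    specificity = TN / (TN + FP): share of non-referable cases not flagged
    """
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for 100 referable and 200 non-referable images.
unaided = sensitivity_specificity(tp=80, fn=20, tn=180, fp=20)
assisted = sensitivity_specificity(tp=90, fn=10, tn=180, fp=20)
print(unaided, assisted)  # (0.8, 0.9) (0.9, 0.9)
```

In this made-up example, assistance catches 10 more referable cases while the number of false alarms stays the same, so sensitivity rises and specificity is unchanged.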
A multicenter real-world study in primary and tertiary care centers reinforced these findings. Junior eyecare providers assisted by EyeFM exhibited higher diagnostic accuracy for common eye diseases, particularly in settings with limited OCT access.
Insights from the Randomized Controlled Trial
The study's centerpiece was a double-masked randomized controlled trial in China focusing on fundus disease screening in a high-risk population (mean age 57.5 years, mostly male). Participants were randomly assigned either to ophthalmologists using EyeFM or to standard care. The intervention group had a higher correct diagnostic rate (92.2% versus 75.4%) and referral rate (92.2% versus 80.5%). Secondary outcomes included more standardized clinical reports and greater patient compliance with self-management advice (70.1% versus 49.1%) and with referral suggestions (33.7% versus 20.2%) at the two-week follow-up. Participant satisfaction was similar across groups.
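The trial results can be read in two ways: as an absolute gain in percentage points or as a relative ratio between the groups. The short sketch below derives both from the percentages quoted above. The ratios and differences are simple arithmetic on the published rates, not figures reported by the study, and they carry no confidence intervals; the function name `summarize` is a hypothetical helper.

```python
# Rates quoted above, in percent: (EyeFM-assisted group, standard-care group).
outcomes = {
    "correct diagnosis": (92.2, 75.4),
    "referral": (92.2, 80.5),
    "self-management compliance": (70.1, 49.1),
    "referral compliance": (33.7, 20.2),
}


def summarize(intervention, control):
    """Return (absolute difference in percentage points, intervention/control ratio)."""
    return round(intervention - control, 1), round(intervention / control, 2)


for name, (eyefm, standard) in outcomes.items():
    diff, ratio = summarize(eyefm, standard)
    print(f"{name}: +{diff} percentage points, {ratio}x the standard-care rate")
```

For example, the correct diagnosis rate rises by 16.8 percentage points, about 1.22 times the standard-care rate, while compliance with referral suggestions rises by 13.5 points, about 1.67 times.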
Implications and the Path Forward
These encouraging results highlight EyeFM's potential to augment clinician performance, standardize care, and improve patient outcomes, especially in underserved areas. By facilitating cross-modality detection, the model could reduce reliance on expensive equipment, broadening access to quality eyecare.
However, limitations exist: the trial was single-center, focused on short-term outcomes, and did not measure long-term visual health. Further multicenter trials are needed to confirm generalizability and address ethical considerations in AI integration. As the field evolves, such phased validations provide a structured framework for translating foundation models into clinical practice.