Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning
Abstract
Method
Pipeline
Reward Feedback Models
A detailed illustration of our reward feedback models. We incorporate three types of feedback so that the report-to-CXR generation model produces goal-oriented CXRs.
- Posture Alignment Feedback: Generated CXRs often suffer from posture issues, such as excessive zooming or rotation, that obscure essential details. To counter these effects, we introduce a reward signal that aligns the CXR's posture with a canonical orientation so that essential parts are preserved.
- Diagnostic Condition Feedback: To ensure that generated CXRs accurately reflect the pathologies referenced in the report, we classify each generated CXR and reward agreement with the label parsed from the report.
- Multimodal Consistency Feedback: We encourage generated CXRs to better match their reports. Semantic agreement is assessed with a multimodal latent representation pretrained on CXR-report pairs.
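The three feedback signals can be pictured as scalar rewards that are combined into one training signal. The following is only a minimal sketch under stated assumptions, not the implementation used in the paper: the function names, the mean-squared-error posture score, the 0.5 classification threshold, the cosine-similarity consistency score, and the equal default weights are all hypothetical choices made for illustration. Images, classifier outputs, and embeddings are represented as plain lists of floats.

```python
import math


def posture_reward(generated_pixels, canonical_pixels):
    """Hypothetical posture score: negative mean squared pixel difference
    between a generated CXR and a canonical-orientation reference."""
    n = len(generated_pixels)
    squared_error = sum((g - c) ** 2 for g, c in zip(generated_pixels, canonical_pixels))
    return -squared_error / n


def diagnostic_reward(predicted_probs, report_labels, threshold=0.5):
    """Hypothetical diagnostic score: fraction of pathologies for which the
    thresholded classifier output agrees with the report-parsed label."""
    matches = sum(
        1 for p, y in zip(predicted_probs, report_labels) if (p >= threshold) == bool(y)
    )
    return matches / len(report_labels)


def multimodal_reward(image_embedding, report_embedding):
    """Hypothetical consistency score: cosine similarity between an image
    embedding and a report embedding from a shared latent space."""
    dot = sum(a * b for a, b in zip(image_embedding, report_embedding))
    norm_image = math.sqrt(sum(a * a for a in image_embedding))
    norm_report = math.sqrt(sum(b * b for b in report_embedding))
    if norm_image == 0.0 or norm_report == 0.0:
        return 0.0  # undefined similarity is treated as no agreement
    return dot / (norm_image * norm_report)


def total_reward(posture, diagnostic, multimodal, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three rewards; the weights are assumptions."""
    w_posture, w_diagnostic, w_multimodal = weights
    return w_posture * posture + w_diagnostic * diagnostic + w_multimodal * multimodal
```

In a policy-based setup, a scalar like the one returned by `total_reward` would be computed per generated CXR and used to weight the update of the generation model; how the paper weights or normalizes the individual rewards is not stated here.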
Qualitative Results
Comparison between previous models and ours
Comparison between previous state-of-the-art report-to-CXR generation models [19,3] and ours. The blue and green texts match their corresponding colored arrows.
Additional Qualitative Results
Additional qualitative results of our framework compared against baselines. The colored texts match their corresponding colored arrows. Ours w/o ACE or RLCF demonstrates superior report agreement and posture alignment compared to other baselines. CXRL generates higher-fidelity CXRs, highlighting the effectiveness of our methodology in synthesizing clinically accurate medical images.
Qualitative ablation on each reward model
- (a): Regarding posture alignment, CXRL aligns the clavicle and costophrenic angle significantly better than the anchor.
- (b): CXRL demonstrates improved diagnostic accuracy, closely matching the ground truth (GT) and thereby supporting clinical decision-making.
- (c): The multimodal consistency reward ensures that CXRs and reports correspond well, as observed by arrows and text in matching colors.
Evaluation of generated CXRs from multiple feedback perspectives
Evaluation Metrics
The table compares the performance of various methods using three evaluation metrics.
CXR Quality Table
Comparative analysis of generated CXR quality: (a) quantitatively compares established models using FID and MS-SSIM metrics; (b) evaluates the impact of reward components on FID scores.
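For readers unfamiliar with FID, it is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images, so lower values indicate closer distributions. The sketch below is a deliberately simplified illustration, not the evaluation code behind the table: it assumes diagonal covariances (the standard FID uses full covariance matrices and Inception-network features), and the function names and toy feature lists are hypothetical.

```python
import math


def mean_and_variance(features):
    """Per-dimension mean and (population) variance of a list of feature vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(dim)]
    var = [sum((f[j] - mean[j]) ** 2 for f in features) / n for j in range(dim)]
    return mean, var


def frechet_distance_diagonal(real_features, generated_features):
    """Fréchet distance between two Gaussians with diagonal covariances:
    ||mu_r - mu_g||^2 + sum_j (var_r[j] + var_g[j] - 2 * sqrt(var_r[j] * var_g[j])).
    This is a simplification of FID, which uses full covariance matrices."""
    mu_r, var_r = mean_and_variance(real_features)
    mu_g, var_g = mean_and_variance(generated_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(vr + vg - 2.0 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term


# Toy example: the generated features are the real ones shifted by 1 in each dimension.
real = [[0.0, 0.0], [2.0, 2.0]]
generated = [[1.0, 1.0], [3.0, 3.0]]
distance = frechet_distance_diagonal(real, generated)  # variances match, so only the mean shift contributes
```

Identical feature sets give a distance of zero, and the distance grows as the means or variances of the two sets diverge.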
BibTeX
@InProceedings{2024cxrl,
author = {Han, Woojung and Kim, Chanyoung and Ju, Dayun and Shim, Yumin and Hwang, Seong Jae},
title = {Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning},
booktitle = {Medical Image Computing and Computer Assisted Intervention (MICCAI)},
month = {Oct},
year = {2024}
}