CXRL

Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning

Spotlight @ MICCAI 2024

Woojung Han^*, Chanyoung Kim^*, Dayun Ju, Yumin Shim, Seong Jae Hwang

Yonsei University

We introduce CXRL: Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning.

Abstract

Recent advances in text-conditioned diffusion models for image generation have begun opening new opportunities in the medical domain, in particular generating chest X-rays (CXRs) from diagnostic reports. Nonetheless, driving diffusion models to generate CXRs that faithfully reflect the complexity and diversity of real data requires a nontrivial learning approach. In light of this, we propose CXRL, a framework motivated by the potential of reinforcement learning (RL). Specifically, we integrate a policy gradient RL approach with multiple distinct, CXR-domain-specific reward models. This approach guides the diffusion denoising trajectory toward precise CXR posture and pathological details. Considering the complexity of the medical imaging environment, we present “RL with Comparative Feedback” (RLCF) as the reward mechanism: a human-like comparative evaluation, which is known to be more effective and reliable than direct evaluation in complex scenarios. Our CXRL framework jointly optimizes learnable adaptive condition embeddings (ACE) and the image generator, enabling the model to produce more accurate CXRs with higher perceptual quality. Our extensive evaluation on the MIMIC-CXR-JPG dataset demonstrates the effectiveness of our RL-based tuning approach. Consequently, CXRL generates pathologically realistic CXRs, establishing a new standard for generating CXRs with high fidelity to real-world clinical scenarios.
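To make the abstract's combination of a policy gradient with comparative feedback more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that the comparative signal is simply the score difference between a generated CXR and an anchor image, and that the policy gradient is a REINFORCE-style surrogate over the per-step log-probabilities of the denoising trajectory. The function names `comparative_rewards` and `policy_gradient_loss` are hypothetical.

```python
from typing import List


def comparative_rewards(generated_scores: List[float],
                        anchor_scores: List[float]) -> List[float]:
    """Assumed form of comparative feedback: how much better each
    generated sample scores than its anchor (positive means better)."""
    if len(generated_scores) != len(anchor_scores):
        raise ValueError("each generated sample needs exactly one anchor")
    return [g - a for g, a in zip(generated_scores, anchor_scores)]


def policy_gradient_loss(step_log_probs: List[List[float]],
                         rewards: List[float]) -> float:
    """REINFORCE-style surrogate loss:
    -(1/N) * sum_i reward_i * sum_t log p(x_{t-1} | x_t, condition).

    step_log_probs[i] holds the log-probabilities of the denoising
    steps of sample i. In a real training loop these would be
    differentiable tensors; plain floats are used here so that only
    the value of the loss is computed.
    """
    if len(step_log_probs) != len(rewards):
        raise ValueError("one reward per trajectory is required")
    if not rewards:
        raise ValueError("at least one trajectory is required")
    total = 0.0
    for log_probs, reward in zip(step_log_probs, rewards):
        total += reward * sum(log_probs)
    return -total / len(rewards)
```

Minimizing this surrogate raises the likelihood of trajectories that beat their anchor and lowers it for trajectories that do not, which is the general behavior a comparative reward is meant to produce.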

Video

Method

Pipeline

Reward Feedback Models

A detailed illustration of our reward feedback models. We incorporate three types of feedback so that the report-to-CXR generation model produces goal-oriented CXRs.

Posture Alignment Feedback: Generated CXRs often suffer from scaling and orientation issues, such as excessive zooming or rotation, which obscure essential details. To counter these effects, we introduce a reward signal that aligns the CXR's posture with a canonical orientation so that essential regions are preserved.
Diagnostic Condition Feedback: To ensure that generated CXRs accurately reflect the pathologies referenced in the report, we classify each generated CXR and reward agreement with the labels parsed from the report.
Multimodal Consistency Feedback: We encourage generated CXRs to better match their reports, using a multimodal latent representation pretrained on CXR-report pairs to assess semantic agreement.
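The three feedback signals above can be illustrated with a small, self-contained sketch. Everything here is an assumption for illustration rather than the paper's implementation: the diagnostic score is taken to be the fraction of predicted labels that agree with the report labels, the consistency score is taken to be cosine similarity between an image embedding and a report embedding, the posture score is supplied as a precomputed number, and the scores are combined with a weighted sum. The names `label_agreement`, `cosine_similarity`, and `combined_reward`, and the default weights, are hypothetical.

```python
import math
from typing import Sequence, Tuple


def label_agreement(predicted: Sequence[int],
                    report_labels: Sequence[int]) -> float:
    """Fraction of pathology labels on which the classifier's prediction
    for the generated CXR agrees with the labels parsed from the report."""
    if len(predicted) != len(report_labels) or not report_labels:
        raise ValueError("label vectors must be non-empty and equal length")
    matches = sum(int(p == t) for p, t in zip(predicted, report_labels))
    return matches / len(report_labels)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between an image embedding and a report embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def combined_reward(posture: float,
                    diagnostic: float,
                    consistency: float,
                    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three feedback scores (weights are an assumption)."""
    w_posture, w_diagnostic, w_consistency = weights
    return (w_posture * posture
            + w_diagnostic * diagnostic
            + w_consistency * consistency)
```

For example, `combined_reward(0.5, label_agreement([1, 0, 1, 1], [1, 0, 0, 1]), 1.0)` adds a posture score of 0.5, a label agreement of 0.75, and a consistency score of 1.0 with equal weights.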

Qualitative Results

Comparison between previous models and ours

Comparison between previous state-of-the-art report-to-CXR generation models [19,3] and ours. The blue and green text matches the corresponding colored arrows.

Additional Qualitative results

Additional qualitative results of our framework compared against baselines. The colored text matches the corresponding colored arrows. Ours w/o ACE or RLCF demonstrates superior report agreement and posture alignment compared to other baselines. CXRL generates higher-fidelity CXRs, highlighting our methodology's effectiveness in synthesizing clinically accurate medical images.

Qualitative ablation on each reward model

(a): For posture alignment, CXRL aligns the clavicle and costophrenic angle significantly better than the anchor.
(b): CXRL demonstrates improved predictive diagnostic accuracy, closely matching the ground truth (GT) and enhancing clinical decision-making.
(c): The multimodal consistency reward ensures that CXRs and reports correspond well, as indicated by the arrows and text in matching colors.

Evaluation of generated CXRs from multiple feedback perspectives

Evaluation Metrics

The table compares the performance of various methods using three evaluation metrics.

CXR Quality Table

Comparative analysis of generated CXR quality: (a) quantitatively compares established models using FID and MS-SSIM metrics; (b) evaluates the impact of reward components on FID scores.
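For reference, FID is conventionally defined as the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images, with lower values indicating closer distributions. The symbols below are the usual ones and are not taken from this page: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of the features of real and generated CXRs, respectively. Which feature extractor was used for the reported scores is not stated here.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```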

BibTeX

@InProceedings{2024cxrl,
      author    = {Han, Woojung and Kim, Chanyoung and Ju, Dayun and Shim, Yumin and Hwang, Seong Jae},
      title     = {Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning},
      booktitle = {Medical Image Computing and Computer Assisted Intervention (MICCAI)},
      month     = {Oct},
      year      = {2024}
}