The pipeline of EAGLE. Leveraging the Laplacian matrix, which integrates hierarchically projected image key features and color affinity, the model exploits eigenvector clustering to capture object-level perspective cues defined as \( \mathrm{\mathcal{M}}_{eicue} \) and \( \mathrm{\tilde{\mathcal{M}}_{eicue}} \). Distilling knowledge from \( \mathrm{\mathcal{M}}_{eicue} \), our model further adopts an object-centric contrastive loss, utilizing the projected vector \( \mathrm{Z} \) and \( \mathrm{\tilde{Z}} \). The learnable prototype \( \mathrm{\Phi} \) assigned from \( \mathrm{Z} \) and \( \mathrm{\tilde{Z}} \), acts as a singular anchor that contrasts positive objects and negative objects. Our object-centric contrastive loss is computed in two distinct manners: intra(\( \mathrm{\mathcal{L}}_{obj} \))- and inter(\( \mathrm{\mathcal{L}}_{sc} \))-image to ensure semantic consistency.
An illustration of the EiCue generation. From the input image, both color affinity matrix \( \mathrm{A_{color}} \) and semantic similarity matrix \( \mathrm{A_{seg}} \) are derived, which are combined to form the Laplacian \( \mathrm{L_{sym}} \). An eigenvector subset \( \mathrm{\hat{V}} \) of \( \mathrm{L_{sym}} \) are clustered to produce EiCue.
Visualizing eigenvectors derived from \( \mathrm{S} \) in the Eigen Aggregation Module. These eigenvectors not only distinguish different objects but also identify semantically related areas, highlighting how EiCue captures object semantics and boundaries effectively.
Comparison between K-means and EiCue. The bottom row presents EiCue, highlighting its superior ability to capture subtle structural intricacies and understand deeper semantic relationships, which is not as effectively achieved by K-means.
Qualitative results of COCO-Stuff dataset trained with ViT-S/8 backbone.
Qualitative results of Cityscapes dataset trained with ViT-B/8 backbone.
Qualitative results of Potsdam-3 dataset trained with ViT-B/8 backbone.
Quantitative results on the COCO-Stuff dataset.
Quantitative results on the Cityscapes dataset.
Quantitative results on the Potsdam-3 dataset.
@InProceedings{2024eagle,
author = {Kim, Chanyoung and Han, Woojung and Ju, Dayun and Hwang, Seong Jae},
title = {EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024}
}