HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation
Lukas Hoyer, Dengxin Dai and Luc Van Gool
Unsupervised domain adaptation (UDA) aims to adapt a model trained on synthetic data to real-world data without requiring expensive annotations of real-world images. As UDA methods for semantic segmentation are usually GPU memory intensive, most previous methods operate only on downscaled images. We question this design as low-resolution predictions often fail to preserve fine details. The alternative of training with random crops of high-resolution images alleviates this problem but falls short in capturing long-range, domain-robust context information.
Therefore, we propose HRDA, a multi-resolution training approach for UDA, that combines the strengths of small high-resolution crops to preserve fine segmentation details and large low-resolution crops to capture long-range context dependencies with a learned scale attention, while maintaining a manageable GPU memory footprint.
Fig.1: the architecture of HRDA
Fig.2: HRDA enables adapting small objects and preserving fine segmentation details. It significantly improves the state-of-the-art performance by 5.5 mIoU for GTA→Cityscapes and by 4.9 mIoU for Synthia→Cityscapes, resulting in an unprecedented performance of 73.8 and 65.8 mIoU, respectively.
Fig. 3: The more detailed domain-adaptive semantic segmentation of HRDA, compared to the previous state-of-the-art UDA method DAFormer, can also be observed in example predictions from the Cityscapes validation set.
Table 1: HRDA significantly outperforms previous works on several UDA benchmarks. This includes synthetic-to-real adaptation on GTA→Cityscapes and Synthia→Cityscapes as well as clear-to-adverse-weather adaptation on Cityscapes→ACDC and Cityscapes→DarkZurich.