Improving a model's generalizability against domain shifts is crucial, especially for safety-critical applications such as autonomous driving. Real-world domain styles can vary substantially due to environment changes and sensor noise, but deep models only know the training domain style. This domain style gap impedes model generalization on diverse real-world domains. Our proposed Normalization Perturbation (NP) effectively overcomes this domain style overfitting problem. We observe that this problem is mainly caused by the biased distribution of low-level features learned in shallow CNN layers. We therefore propose to perturb the channel statistics of source domain features to synthesize various latent styles, so that the trained deep model perceives diverse potential domains and generalizes well even without observing any target domain data during training. We further explore style-sensitive channels for effective style synthesis. Normalization Perturbation relies only on a single source domain, is surprisingly effective, and is extremely easy to implement. Extensive experiments verify the effectiveness of our method for generalizing models under real-world domain shifts.
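As a rough illustration (not the authors' released code), the core operation can be sketched in PyTorch as a module that, during training, re-styles a feature map by perturbing its per-channel mean and standard deviation with multiplicative Gaussian noise; the module name and the `noise_std` hyperparameter are assumptions for illustration:

```python
import torch
import torch.nn as nn

class NormalizationPerturbation(nn.Module):
    """Sketch of NP: synthesize latent styles by perturbing the per-channel
    statistics of a feature map with multiplicative Gaussian noise.
    `noise_std` is an assumed hyperparameter, not the paper's exact value."""

    def __init__(self, noise_std: float = 0.5):
        super().__init__()
        self.noise_std = noise_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # NP is active only during training; at inference it is an identity map.
        if not self.training:
            return x
        b, c, _, _ = x.shape
        mu = x.mean(dim=(2, 3), keepdim=True)  # per-channel mean (style)
        # Multiplicative noise around 1 for the std and the mean, respectively.
        alpha = 1.0 + self.noise_std * torch.randn(b, c, 1, 1, device=x.device, dtype=x.dtype)
        beta = 1.0 + self.noise_std * torch.randn(b, c, 1, 1, device=x.device, dtype=x.dtype)
        # Rescaling the centered feature by alpha perturbs the channel std;
        # rescaling the mean by beta perturbs the channel mean.
        return alpha * (x - mu) + beta * mu
```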

Figure 1: Visualizations of feature channel statistics on Cityscapes (source domain, red) and Foggy Cityscapes (target domain, blue). (a) For two domain images with the same content but different styles, we show their feature channel statistics and differences on the pretrained backbone at stage 1. The statistics of the Foggy Cityscapes image are negated for better visualization. The feature channel statistics of the target domain image deviate around the source domain statistics. (b) t-SNE [19] visualization of the feature channel statistics at different stages. The model is trained on the source domain and evaluated on both domains. The distance between the two domains is computed by Maximum Mean Discrepancy (MMD) [20]. After equipping shallow CNN layers with Normalization Perturbation (NP), our model effectively blends distinct domain style distributions. The target domain distribution is properly covered by the perturbed source domain distribution in the deep CNN layers, so our model generalizes much better on the target domain.
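For concreteness, the kind of analysis behind panel (b) could be sketched as follows, assuming channel statistics are taken as concatenated per-channel means and stds and embedded with scikit-learn's t-SNE; the function names here are hypothetical:

```python
import numpy as np
import torch
from sklearn.manifold import TSNE

def channel_stats(feat: torch.Tensor) -> torch.Tensor:
    """Per-image channel statistics as a (B, 2C) vector:
    concatenated per-channel means and stds of a (B, C, H, W) feature map."""
    return torch.cat([feat.mean(dim=(2, 3)), feat.std(dim=(2, 3))], dim=1)

@torch.no_grad()
def tsne_embed(source_feat: torch.Tensor, target_feat: torch.Tensor) -> np.ndarray:
    """Jointly embed source and target channel statistics in 2D with t-SNE so
    both domains share one embedding space (needs more samples than t-SNE's
    default perplexity of 30)."""
    stats = torch.cat([channel_stats(source_feat), channel_stats(target_feat)])
    return TSNE(n_components=2).fit_transform(stats.cpu().numpy())
```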

Figure 2: Accumulated Maximum Mean Discrepancy (MMD) between the feature channel statistics of different dataset pairs. Four models are evaluated at different convolutional stages. A smaller MMD indicates a smaller feature-level domain/style gap between datasets.
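A matching sketch for the distance itself, assuming a biased RBF-kernel MMD estimate over the channel-statistic vectors from the sketch above; the kernel bandwidth is an illustrative choice:

```python
import torch

def mmd_rbf(x: torch.Tensor, y: torch.Tensor, bandwidth: float = 1.0) -> torch.Tensor:
    """Biased estimate of squared MMD with an RBF kernel between two sets of
    channel-statistic vectors x: (N, D) and y: (M, D)."""
    def k(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# "Accumulated" over stages: sum the per-stage gaps, e.g.
# sum(mmd_rbf(channel_stats(fs), channel_stats(ft))
#     for fs, ft in zip(source_stage_feats, target_stage_feats))
```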

Figure 3: Our Normalization Perturbation (NP) is applied to shallow CNN layers only during training.
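One plausible way to wire this up, reusing the `NormalizationPerturbation` sketch above on a torchvision ResNet-50; attaching NP after `layer1` and `layer2` is an assumption for illustration:

```python
import torch.nn as nn
import torchvision

# Hypothetical wiring: append NP (class defined in the sketch above) after the
# two shallowest residual stages of a ResNet-50; the stage choice is assumed.
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
backbone.layer1 = nn.Sequential(backbone.layer1, NormalizationPerturbation())
backbone.layer2 = nn.Sequential(backbone.layer2, NormalizationPerturbation())

backbone.train()  # NP perturbs channel statistics only in train mode;
# after backbone.eval(), both NP modules reduce to identity mappings.
```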

Table 1: Robust object detection results.

Publication:

Normalization Perturbation: A Simple Domain Generalization Method for Real-World Domain Shifts

Qi Fan, Mattia Segu, Yu-Wing Tai, Fisher Yu, Chi-Keung Tang, Bernt Schiele, Dengxin Dai

ICLR 2023

[Paper][BibTeX]