Benchmarking the Robustness of LiDAR Semantic Segmentation Models

When LiDAR semantic segmentation models are used in safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness to a wide range of LiDAR corruptions. In this paper, we comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups: adverse weather, measurement noise, and cross-device discrepancy. We then systematically investigate 11 LiDAR semantic segmentation models spanning different input representations (e.g., point clouds, voxels, and projected images), network architectures, and training schemes. Through this study, we obtain two insights: 1) The input representation plays a crucial role in robustness; in particular, the relative performance of different representations varies with the type of corruption. 2) Although state-of-the-art LiDAR semantic segmentation methods achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on these observations, we design a robust LiDAR segmentation model (RLSeg) that greatly boosts robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.

Figure 1. Examples from our proposed SemanticKITTI-C. We corrupt the clean validation set of SemanticKITTI using six types of corruptions with 16 levels of intensity to build a comprehensive robustness benchmark for LiDAR semantic segmentation. The examples shown are point clouds from a 16-beam LiDAR sensor, with global and local distortion, and in snowfall and fog simulations.

Table 1: Categories and descriptions of corruptions in SemanticKITTI-C. We categorize common LiDAR corruptions into three domains: (1) adverse weather conditions, (2) measurement noise, and (3) cross-device discrepancy.

Fig.2: Corruption of fog.

Fig.3: Noisy LiDAR point clouds. The first row shows the raw LiDAR point cloud. The last two rows show the noisy point clouds with global outliers and local distortion. The point clouds are color-coded by height (z value). Best viewed on a screen and zoomed in.
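The two measurement-noise corruptions named in this caption can be illustrated with a minimal sketch. It is not the benchmark's actual generation code: the function names, the uniform sampling box for outliers, the Gaussian jitter model, and all default parameters (`ratio`, `bound`, `sigma`) are assumptions chosen for illustration only.

```python
import random


def add_global_outliers(points, ratio=0.05, bound=50.0, seed=0):
    """Append random outlier points scattered over the whole scene.

    Assumption: outliers are sampled uniformly in a box of +/- `bound`
    metres in x/y and +/- 3 metres in z. Returns the new point list and
    a parallel list of flags marking which points are injected noise.
    """
    rng = random.Random(seed)
    n_out = int(len(points) * ratio)
    outliers = [
        (rng.uniform(-bound, bound),
         rng.uniform(-bound, bound),
         rng.uniform(-3.0, 3.0))
        for _ in range(n_out)
    ]
    is_noise = [False] * len(points) + [True] * n_out
    return list(points) + outliers, is_noise


def add_local_distortion(points, sigma=0.05, seed=0):
    """Jitter every coordinate with zero-mean Gaussian noise.

    Assumption: independent noise per axis with standard deviation
    `sigma` (in metres); the point count and ordering are unchanged.
    """
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]
```

In this sketch the `is_noise` flags returned by `add_global_outliers` make it possible to mark injected points so that they can later be excluded from evaluation.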

Table 2: LiDAR semantic segmentation approaches on our benchmark.

Fig.4: Visualization. We visualize the results of the most robust existing method (MinkowskiNet) and our RLSeg on four cases, including clean data and three LiDAR corruptions. The noisy points are labeled as ‘ignore’ (black) and are not considered in the evaluation. The left two columns show error maps, and the last column shows the ground truth.
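Excluding ‘ignore’ points from evaluation can be sketched as follows. This assumes the metric is mean intersection-over-union (mIoU), the usual metric for SemanticKITTI, and that the ignore label is class index 0. The function name and signature are hypothetical and not taken from the benchmark code.

```python
def mean_iou(pred, gt, num_classes, ignore_label=0):
    """Mean IoU over classes, skipping points whose ground truth is 'ignore'.

    Assumption: labels are integers in [0, num_classes) and the ignore
    class itself is left out of the average.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_label:
            continue  # injected noise / unlabeled points do not count
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            union[g] += 1  # false negative for the true class
            union[p] += 1  # false positive for the predicted class
    ious = [
        inter[c] / union[c]
        for c in range(num_classes)
        if c != ignore_label and union[c] > 0
    ]
    return sum(ious) / len(ious) if ious else 0.0
```

Classes that never appear in either the predictions or the ground truth are left out of the average in this sketch, so they neither reward nor penalize a model.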

Table 3: Ablation study for RLSeg. KD and PL denote knowledge distillation and pseudo label fine-tuning, respectively.



Xu Yan, Chaoda Zheng, Zhen Li, Shuguang Cui, Dengxin Dai

IJCV Submission

[Paper][Data (coming soon)][BibTex]