Scribble-Supervised LiDAR Semantic Segmentation
Ozan Unal, Dengxin Dai and Luc Van Gool
Densely annotating LiDAR point clouds remains too expensive and time-consuming to keep up with the ever-growing volume of data. While current literature focuses on fully-supervised performance, efficient methods that take advantage of realistic weak supervision have yet to be explored. In this paper, we propose using scribbles to annotate LiDAR point clouds and release ScribbleKITTI, the first scribble-annotated dataset for LiDAR semantic segmentation. Furthermore, we present a pipeline to reduce the performance gap that arises when using such weak annotations. Our pipeline comprises three stand-alone contributions that can be combined with any LiDAR semantic segmentation model to achieve up to 95.7% of the fully-supervised performance while using only 8% labeled points. Our scribble annotations and code are available at github.com/ouenal/scribblekitti.
Fig.1: We annotate the train-split of SemanticKITTI based on KITTI which consists of 10 sequences, 19130 scans, 2349 million points. ScribbleKITTI contains 189 million labeled points corresponding to only 8.06% of the total point count. We choose SemanticKITTI for its current wide use and established benchmark. We retain the same 19 classes to encourage easy transitioning towards research into scribble-supervised LiDAR semantic segmentation. Our scribble labels can be downloaded here (118.2MB).
Fig.2: Examples of scribble-annotated LiDAR point cloud scenes for a single frame (top) and superimposed frames (bottom). The proposed ScribbleKITTI (left) is compared with its fully labeled counterpart from SemanticKITTI (right).
Fig.3: Line-annotation process illustrated on a 100m by 100m tile. Classes that span large distances such as building (yellow) and road (pink) can be annotated with only two clicks. As the tile is annotated using 2D lines projected onto the 3D surface, scribbles may become indistinguishable once the viewing angle changes (e.g. bottom right).
Fig.4: Illustration of the proposed pipeline for scribble-supervised LiDAR semantic segmentation, comprising three steps: training, pseudo-labeling and distillation. During training, we perform pyramid local semantic-context (PLS) augmentation before training the mean teacher model on the available scribble annotations. During pseudo-labeling, we generate target labels in a class-range-balanced (CRB) manner. Finally, during distillation, we retrain the mean teacher on the generated pseudo-labels. LS and LU denote the losses applied to the supervised and unsupervised sets of points respectively. Gray arrows propagate label information.
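The mean teacher used in the training and distillation steps is commonly realized by keeping the teacher's weights as an exponential moving average (EMA) of the student's weights. The following is a minimal sketch of that update only, not the released implementation: the function name `ema_update`, the dictionary-of-arrays weight format and the value of `alpha` are assumptions made for illustration.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights as an EMA of the student weights.

    teacher, student: dicts mapping parameter names to arrays (hypothetical format).
    alpha: smoothing factor (assumed value); higher means a slower-moving teacher.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}

# Toy usage: the teacher moves 10% of the way toward the student.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student, alpha=0.9)
```

In a full training loop this update would typically run once per optimizer step, with gradients flowing only through the student.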
Fig.5: Visual comparison of (50%) (a) class-balanced pseudo-labeling and (b) proposed CRB. As seen on the right, generated pseudo-labels lack distant sparse region representation when balancing solely on class.
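One way to picture the difference between balancing on class alone and balancing on class and range is sketched below. This is an illustrative reading of the caption, not the authors' code: it assumes that points are grouped into evenly spaced range bins around the sensor and that, within each (range bin, class) group, the most confident fraction `beta` of predictions is kept. The function name `crb_pseudo_labels` and the parameters `num_bins`, `max_range` and `ignore` are hypothetical.

```python
import numpy as np

def crb_pseudo_labels(xyz, pred, conf, beta=0.5, num_bins=4, max_range=50.0, ignore=-1):
    """Keep the top-`beta` most confident predictions per (range bin, class) group.

    xyz:  (N, 3) point coordinates with the sensor at the origin (assumption).
    pred: (N,) predicted class index per point.
    conf: (N,) confidence of the predicted class.
    """
    rng = np.linalg.norm(xyz[:, :2], axis=1)           # horizontal distance to sensor
    edges = np.linspace(0.0, max_range, num_bins + 1)  # evenly spaced bins (assumption)
    bins = np.digitize(rng, edges[1:-1])               # bin index in [0, num_bins - 1]
    labels = np.full(pred.shape, ignore, dtype=np.int64)
    for b in range(num_bins):
        for c in np.unique(pred[bins == b]):
            idx = np.flatnonzero((bins == b) & (pred == c))
            k = int(np.ceil(beta * idx.size))          # number of points to keep
            keep = idx[np.argsort(-conf[idx])[:k]]     # most confident first
            labels[keep] = c
    return labels

# Toy usage: two near and two far points of the same class.
xyz = np.array([[1.0, 0, 0], [2.0, 0, 0], [40.0, 0, 0], [45.0, 0, 0]])
pred = np.array([0, 0, 0, 0])
conf = np.array([0.9, 0.8, 0.3, 0.2])
labels = crb_pseudo_labels(xyz, pred, conf, beta=0.5)
```

In this toy case a class-only selection at 50% would keep the two near points, since they have the highest confidence, whereas the range-aware selection keeps one near and one far point.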
Fig.6: Illustration of pyramid local semantic-context (PLS) augmentation based on scribble ground-truth (not to scale). As seen, the semantic-context can provide highly descriptive information about the local neighborhood of a point at scaling resolutions.
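A rough sketch of how such a multi-resolution semantic context could be computed is given below. It is an illustration under stated assumptions rather than the method as implemented: it uses a simple Cartesian grid in the horizontal plane, takes the normalized histogram of scribble labels in each cell as the context, and concatenates the histograms over several cell sizes. The function name `pls_features` and the parameters `cell_sizes` and `ignore` are hypothetical.

```python
import numpy as np

def pls_features(xyz, labels, num_classes, cell_sizes=(2.0, 8.0), ignore=-1):
    """Per-point class histograms of scribble labels at several grid resolutions.

    xyz:    (N, 3) point coordinates.
    labels: (N,) integer scribble labels, with `ignore` marking unlabeled points.
    Returns an (N, num_classes * len(cell_sizes)) feature array.
    """
    feats = []
    labeled = labels != ignore
    for size in cell_sizes:
        # Assign every point to a coarse 2D cell at this resolution.
        cells = np.floor(xyz[:, :2] / size).astype(np.int64)
        _, cell_id = np.unique(cells, axis=0, return_inverse=True)
        cell_id = cell_id.reshape(-1)
        # Count scribble labels per cell, then normalize to a distribution.
        hist = np.zeros((cell_id.max() + 1, num_classes))
        np.add.at(hist, (cell_id[labeled], labels[labeled]), 1.0)
        total = hist.sum(axis=1, keepdims=True)
        hist = np.divide(hist, total, out=np.zeros_like(hist), where=total > 0)
        feats.append(hist[cell_id])  # every point inherits its cell's histogram
    return np.concatenate(feats, axis=1)

# Toy usage: the unlabeled middle point inherits context from its neighbors.
xyz = np.array([[0.5, 0.5, 0.0], [1.0, 1.0, 0.0], [5.0, 5.0, 0.0]])
labels = np.array([0, -1, 1])
features = pls_features(xyz, labels, num_classes=2)
```

The resulting features would then be appended to each point's input features, so that unlabeled points also carry information about the scribbles in their neighborhood.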
Fig.7: Example results from the SemanticKITTI valid-set comparing (a) the ground truth frame to Cylinder3D trained (b) scribble-supervised and (c) scribble-supervised using our proposed pipeline.
Table 1. 3D semantic segmentation results evaluated on the SemanticKITTI valid-set. Alongside the per-class metrics, we show the relative performance of the scribble-supervised approach against the fully-supervised one (SS/FS).