Dr. Dengxin Dai

Senior Researcher
Address Stampfenbachstrasse 48, 8092 Zürich, Switzerland
Room ETH Zurich, Computer Vision Lab

Biography

Dengxin Dai is a Senior Researcher working with the Computer Vision Lab at ETH Zurich. In 2016, he obtained his PhD in Computer Vision at ETH Zurich. Since then, he has been the Team Leader of TRACE-Zurich, working on Autonomous Driving within the R&D project “TRACE: Toyota Research on Automated Cars in Europe”. His research interests lie in autonomous driving, robust perception in adverse weather and illumination conditions, automotive sensors, and computer vision under limited supervision.

Research Interests & Workshops

He has organized the CVPR’19 workshop on Vision for All Seasons: Bad Weather and Nighttime, and is organizing an ICCV’19 workshop on Autonomous Driving. He has been a program committee member of several major computer vision conferences and received multiple outstanding reviewer awards. He is a guest editor for the IJCV special issue Vision for All Seasons and an area chair for WACV 2020.

Publications & Projects

  • End-to-End Optimization of LiDAR Beam Configuration for 3D Object Detection and Localization (ICRA 2022)

    The first end-to-end approach to learn to optimize the LiDAR beam configuration for a given application.

    Existing learning methods for LiDAR-based applications use 3D points scanned under a pre-determined beam configuration, e.g., the elevation angles of beams are often evenly distributed. Those fixed configurations are task-agnostic, so simply using them can lead to sub-optimal performance. In this work, we take a new route to learn to optimize the LiDAR beam configuration for a given application. Specifically, we propose a reinforcement learning-based learning-to-optimize (RL-L2O) framework to automatically optimize the beam configuration in an end-to-end manner for different LiDAR-based applications. The optimization is guided by the final performance of the target task and thus our method can be integrated easily with any LiDAR-based application as a simple drop-in module. The method is especially useful when a low-resolution (low-cost) LiDAR is needed, for instance, for system deployment at a massive scale. We use our method to search for the beam configuration of a low-resolution LiDAR for two important tasks: 3D object detection and localization. Experiments show that the proposed RL-L2O method improves the performance in both tasks significantly compared to the baseline methods. We believe that a combination of our method with the recent advances of programmable LiDARs can start a new research direction for LiDAR-based active perception. The code is publicly available at github.com/vniclas/lidar_beam_selection.
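
    A minimal sketch of the learning-to-optimize idea, using a plain NumPy REINFORCE loop in place of the paper's full RL-L2O framework. The candidate angles and the evaluate_task reward below are hypothetical stand-ins for the downstream detection or localization metric.

      import numpy as np

      rng = np.random.default_rng(0)

      # Hypothetical candidate elevation angles (degrees) and beam budget of a
      # low-resolution LiDAR.
      candidates = np.linspace(-25.0, 15.0, 32)
      n_beams = 8

      def evaluate_task(config):
          """Placeholder for the downstream task metric (e.g. 3D detection AP)
          obtained with a LiDAR restricted to this beam configuration."""
          # Toy reward: prefer beams spread around a target pattern.
          return -np.mean((np.sort(config) - np.linspace(-15, 5, n_beams)) ** 2)

      # One categorical distribution over candidate angles per beam slot.
      logits = np.zeros((n_beams, len(candidates)))
      baseline, lr = 0.0, 0.5
      for step in range(200):
          probs = np.exp(logits - logits.max(axis=1, keepdims=True))
          probs /= probs.sum(axis=1, keepdims=True)
          # Sample a configuration and score it on the target task.
          idx = np.array([rng.choice(len(candidates), p=p) for p in probs])
          reward = evaluate_task(candidates[idx])
          baseline = 0.9 * baseline + 0.1 * reward
          # REINFORCE: raise the log-probability of the sampled angles in
          # proportion to the advantage.
          grad = -probs
          grad[np.arange(n_beams), idx] += 1.0
          logits += lr * (reward - baseline) * grad

      print("selected angles:", np.round(np.sort(candidates[np.argmax(logits, axis=1)]), 2))
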
  • Adiabatic Quantum Computing for Multi Object Tracking (CVPR 2022)
    Jan-Nico Zaech, Alexander Liniger, Martin Danelljan, Dengxin Dai, Luc Van Gool

    The first MOT formulation designed to be solved with Adiabatic Quantum Computing.

    Multi-Object Tracking (MOT) is most often approached in the tracking-by-detection paradigm, where object detections are associated through time. The association step naturally leads to discrete optimization problems. As these optimization problems are often NP-hard, they can only be solved exactly for small instances on current hardware. Adiabatic quantum computing (AQC) offers a solution for this, as it has the potential to provide a considerable speedup on a range of NP-hard optimization problems in the near future. However, current MOT formulations are unsuitable for quantum computing due to their scaling properties. In this work, we therefore propose the first MOT formulation designed to be solved with AQC. We employ an Ising model that represents the quantum mechanical system implemented on the AQC. We show that our approach is competitive compared with state-of-the-art optimization-based approaches, even when using off-the-shelf integer programming solvers. Finally, we demonstrate that our MOT problem is already solvable on the current generation of real quantum computers for small examples, and analyze the properties of the measured solutions.
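
    To illustrate how a data-association problem can be phrased in the quadratic unconstrained binary (QUBO/Ising) form that an adiabatic quantum computer accepts, here is a toy matching of three detections to two tracks, solved by brute force in place of an annealer. The cost matrix and penalty weight are invented, and the model is far simpler than the formulation in the paper.

      import itertools
      import numpy as np

      # Toy association costs between 3 detections and 2 existing tracks
      # (e.g. derived from appearance or motion affinity).
      cost = np.array([[0.1, 0.9],
                       [0.8, 0.2],
                       [0.5, 0.4]])
      n_det, n_trk = cost.shape
      n_vars = n_det * n_trk          # one binary variable per (detection, track) pair
      P = 2.0                         # penalty weight for the constraint terms

      def var(i, j):
          return i * n_trk + j

      # Build Q so that the energy x^T Q x equals the association cost plus
      # penalties when a detection picks several tracks or a track several detections.
      Q = np.zeros((n_vars, n_vars))
      for i in range(n_det):
          for j in range(n_trk):
              Q[var(i, j), var(i, j)] += cost[i, j] - P   # cost + linear part of (sum_j x_ij - 1)^2
              for k in range(j + 1, n_trk):
                  Q[var(i, j), var(i, k)] += 2 * P        # quadratic part of the same constraint
      for j in range(n_trk):                               # at most one detection per track
          for i in range(n_det):
              for i2 in range(i + 1, n_det):
                  Q[var(i, j), var(i2, j)] += P

      # Brute-force ground state, standing in for the quantum annealer.
      best_x, best_e = None, np.inf
      for bits in itertools.product([0, 1], repeat=n_vars):
          x = np.array(bits)
          e = x @ Q @ x
          if e < best_e:
              best_x, best_e = x, e
      print(best_x.reshape(n_det, n_trk))   # row i: assignment of detection i (all zeros = unmatched)
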
  • LiDAR Snowfall Simulation for Robust 3D Object Detection (CVPR 2022, Oral)

    A novel simulation approach that injects snowfall effects into existing LiDAR datasets to train robust LiDAR-based perception methods for adverse weather.

    3D object detection is a central task for applications such as autonomous driving, in which the system needs to localize and classify surrounding traffic agents, even in the presence of adverse weather. In this paper, we address the problem of LiDAR-based 3D object detection under snowfall. Due to the difficulty of collecting and annotating training data in this setting, we propose a physically based method to simulate the effect of snowfall on real clear-weather LiDAR point clouds. Our method samples snow particles in 2D space for each LiDAR line and uses the induced geometry to modify the measurement for each LiDAR beam accordingly. Moreover, as snowfall often causes wetness on the ground, we also simulate ground wetness on LiDAR point clouds. We use our simulation to generate partially synthetic snowy LiDAR data and leverage these data for training 3D object detection models that are robust to snowfall. We conduct an extensive evaluation using several state-of-the-art 3D object detection methods and show that our simulation consistently yields significant performance gains on the real snowy STF dataset compared to clear-weather baselines and competing simulation approaches, while not sacrificing performance in clear weather. Our code is available at github.com/SysCV/LiDAR_snow_sim.
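
    A heavily simplified, purely illustrative take on injecting snowfall into clear-weather scans; the extinction coefficient, interception probability, and attenuation below are assumptions, not the physically based model of the paper.

      import numpy as np

      rng = np.random.default_rng(0)

      def simulate_snow(points, intensities, snowfall_rate=1.0):
          """Toy snowfall augmentation for an (N, 3) clear-weather LiDAR scan:
          some beams are intercepted by a sampled snow particle (producing a
          closer, weaker return) and all returns are attenuated by scattering."""
          ranges = np.linalg.norm(points, axis=1)
          alpha = 0.02 * snowfall_rate                 # assumed extinction coefficient [1/m]
          # Probability that a beam hits a particle before reaching its target.
          p_hit = 1.0 - np.exp(-alpha * ranges)
          hit = rng.random(len(points)) < p_hit
          # Scattered returns appear at a random fraction of the original range.
          scatter_range = ranges * rng.uniform(0.05, 0.9, size=len(points))
          new_ranges = np.where(hit, scatter_range, ranges)
          new_points = points * (new_ranges / ranges)[:, None]
          # Attenuate intensity by the two-way transmittance through falling snow.
          new_intens = intensities * np.exp(-2 * alpha * new_ranges)
          return new_points, new_intens

      pts = rng.uniform(-40, 40, size=(1000, 3))
      snow_pts, snow_ints = simulate_snow(pts, rng.uniform(0, 1, size=1000), snowfall_rate=2.0)
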
  • Continual Test-Time Domain Adaptation (CVPR 2022)
    Qin Wang, Olga Fink, Luc Van Gool, Dengxin Dai

    A novel method for long-term test-time adaptation under continually changing environments.

    Test-time domain adaptation aims to adapt a source pre-trained model to a target domain without using any source data. Existing works mainly consider the case where the target domain is static. However, real-world machine perception systems run in non-stationary and continually changing environments where the target domain distribution can change over time. Existing methods, which are mostly based on self-training and entropy regularization, can suffer from these non-stationary environments. Due to the distribution shift over time in the target domain, pseudo-labels become unreliable. The noisy pseudo-labels can further lead to error accumulation and catastrophic forgetting. To tackle these issues, we propose a continual test-time adaptation approach (CoTTA) which comprises two parts. First, we reduce error accumulation by using weight-averaged and augmentation-averaged predictions, which are often more accurate. Second, to avoid catastrophic forgetting, we stochastically restore a small part of the neurons to the source pre-trained weights during each iteration to help preserve source knowledge in the long term. The proposed method enables long-term adaptation of all parameters in the network. CoTTA is easy to implement and can be readily incorporated into off-the-shelf pre-trained models. We demonstrate the effectiveness of our approach on four classification tasks and a segmentation task for continual test-time adaptation, on which we outperform existing methods. Our code is available at https://qin.ee/cotta.
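
    A compact PyTorch sketch of the two ingredients above: an augmentation- and weight-averaged (EMA) teacher provides pseudo-labels, and a small random fraction of the student's weights is restored to the source model after every step. Gaussian input noise stands in for the stronger augmentations of the paper, and all hyperparameters are placeholders.

      import copy
      import torch
      import torch.nn.functional as F

      def cotta_step(student, teacher, source_state, x, optimizer,
                     ema_momentum=0.999, restore_prob=0.01, n_aug=4):
          """One continual test-time adaptation step (simplified sketch)."""
          # Augmentation-averaged teacher prediction as the pseudo-label.
          with torch.no_grad():
              probs = torch.stack([teacher(x + 0.01 * torch.randn_like(x)).softmax(dim=1)
                                   for _ in range(n_aug)]).mean(dim=0)
          # Student update towards the teacher's (hard) pseudo-labels.
          loss = F.cross_entropy(student(x), probs.argmax(dim=1))
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()
          with torch.no_grad():
              # Teacher = exponential moving average of the student (weight averaging).
              for t_p, s_p in zip(teacher.parameters(), student.parameters()):
                  t_p.mul_(ema_momentum).add_(s_p, alpha=1 - ema_momentum)
              # Stochastic restore: reset a random subset of weights to the
              # source model to limit catastrophic forgetting.
              for name, p in student.named_parameters():
                  mask = (torch.rand_like(p) < restore_prob).float()
                  p.copy_(mask * source_state[name].to(p.device) + (1 - mask) * p)

      # Usage sketch, assuming `model` is a source pre-trained classifier and
      # `stream` yields unlabeled test batches from the changing target domain:
      #   teacher = copy.deepcopy(model); source_state = copy.deepcopy(model.state_dict())
      #   optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
      #   for x in stream: cotta_step(model, teacher, source_state, x, optimizer)
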
  • Decoupling Zero-Shot Semantic Segmentation (CVPR 2022)
    Jian Ding, Nan Xue, Gui-Song Xia and Dengxin Dai

    ZegFormer is the first framework that decouples zero-shot semantic segmentation into: 1) class-agnostic segmentation and 2) segment-level zero-shot classification.

    Zero-shot semantic segmentation (ZS3) aims to segment novel categories that have not been seen during training. Existing works formulate ZS3 as a pixel-level zero-shot classification problem, and transfer semantic knowledge from seen classes to unseen ones with the help of language models pre-trained only with texts. While simple, the pixel-level ZS3 formulation shows limited capability to integrate vision-language models that are often pre-trained with image-text pairs and currently demonstrate great potential for vision tasks. Inspired by the observation that humans often perform segment-level semantic labeling, we propose to decouple ZS3 into two sub-tasks: 1) a class-agnostic grouping task that groups the pixels into segments, and 2) a zero-shot classification task on the segments. The former task does not involve category information and can be directly transferred to group pixels for unseen classes. The latter task operates at the segment level and provides a natural way to leverage large-scale vision-language models pre-trained with image-text pairs (e.g. CLIP) for ZS3. Based on the decoupling formulation, we propose a simple and effective zero-shot semantic segmentation model, called ZegFormer, which outperforms the previous methods on standard ZS3 benchmarks by large margins, e.g., 22 points on PASCAL VOC and 3 points on COCO-Stuff in terms of mIoU for unseen classes. Code will be released at https://github.com/dingjiansw101/ZegFormer.
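
    A toy NumPy illustration of the decoupled formulation: per-segment visual embeddings (random here; produced by a class-agnostic segmentation model in practice) are classified by cosine similarity against text embeddings of the class names, which in the paper come from a vision-language model such as CLIP.

      import numpy as np

      rng = np.random.default_rng(0)

      # Stage 1 (assumed): pixels of an H x W image grouped into K class-agnostic
      # segments, each with a D-dimensional visual embedding.
      H, W, K, D = 4, 6, 3, 512
      segment_id = rng.integers(0, K, size=(H, W))     # segment index per pixel
      segment_emb = rng.normal(size=(K, D))            # per-segment visual embeddings

      # Stage 2: zero-shot classification of segments against class-name text
      # embeddings (random stand-ins for a CLIP text encoder).
      class_names = ["road", "car", "snow"]            # may include unseen classes
      text_emb = rng.normal(size=(len(class_names), D))

      def l2norm(a):
          return a / np.linalg.norm(a, axis=-1, keepdims=True)

      scores = l2norm(segment_emb) @ l2norm(text_emb).T    # (K, num_classes)
      segment_label = scores.argmax(axis=1)

      # Paste segment-level labels back to pixels for the final semantic map.
      semantic_map = np.array(class_names)[segment_label][segment_id]
      print(semantic_map)
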
  • Scribble-Supervised LiDAR Semantic Segmentation (CVPR 2022, Oral)
    Ozan Unal, Dengxin Dai and Luc Van Gool

    ScribbleKITTI, the first scribble-annotated dataset for LiDAR semantic segmentation and a novel learning method to reduce the performance gap that arises when using such weak annotations.

    Densely annotating LiDAR point clouds remains too expensive and time-consuming to keep up with the ever-growing volume of data. While current literature focuses on fully-supervised performance, efficient methods that take advantage of realistic weak supervision have yet to be explored. In this paper, we propose using scribbles to annotate LiDAR point clouds and release ScribbleKITTI, the first scribble-annotated dataset for LiDAR semantic segmentation. Furthermore, we present a pipeline to reduce the performance gap that arises when using such weak annotations. Our pipeline comprises three stand-alone contributions that can be combined with any LiDAR semantic segmentation model to achieve up to 95.7% of the fully-supervised performance while using only 8% labeled points. Our scribble annotations and code are available at github.com/ouenal/scribblekitti.
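
    The weak-supervision setting is easy to picture: only the scribbled points carry labels, and the supervised loss simply ignores the rest. The PyTorch snippet below shows only this supervised term, not the paper's full three-part pipeline; shapes and the labeled fraction are illustrative.

      import torch
      import torch.nn.functional as F

      def scribble_loss(logits, labels, ignore_index=-1):
          """Cross-entropy over scribbled points only: unlabeled points are
          marked with ignore_index and contribute no gradient."""
          return F.cross_entropy(logits, labels, ignore_index=ignore_index)

      # logits: per-point predictions of any LiDAR segmentation network,
      # labels: class ids on scribbled points, -1 everywhere else (~8% labeled here).
      logits = torch.randn(1000, 20, requires_grad=True)
      labels = torch.full((1000,), -1)
      labels[torch.randperm(1000)[:80]] = torch.randint(0, 20, (80,))
      scribble_loss(logits, labels).backward()
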
  • HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation (ECCV 2022)
    Lukas Hoyer, Dengxin Dai and Luc Van Gool

    A multi-resolution training approach for UDA that combines the strengths of small high-resolution crops to preserve fine segmentation details and large low-resolution crops to capture long-range context dependencies with a learned scale attention.

    Unsupervised domain adaptation (UDA) aims to adapt a model trained on the source domain (e.g. synthetic data) to the target domain (e.g. real-world data) without requiring further annotations on the target domain. This work focuses on UDA for semantic segmentation as real-world pixel-wise annotations are particularly expensive to acquire. As UDA methods for semantic segmentation are usually GPU memory intensive, most previous methods operate only on downscaled images. We question this design as low-resolution predictions often fail to preserve fine details. The alternative of training with random crops of high-resolution images alleviates this problem but falls short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution training approach for UDA that combines the strengths of small high-resolution crops to preserve fine segmentation details and large low-resolution crops to capture long-range context dependencies with a learned scale attention, while maintaining a manageable GPU memory footprint. HRDA enables adapting small objects and preserving fine segmentation details. It significantly improves the state-of-the-art performance by 5.5 mIoU for GTA-to-Cityscapes and 4.9 mIoU for Synthia-to-Cityscapes, resulting in an unprecedented 73.8 and 65.8 mIoU, respectively. The implementation is available at this https URL.
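
    A minimal PyTorch sketch of the multi-resolution fusion idea: logits from a large low-resolution context crop and a small high-resolution detail crop are blended by a learned per-pixel scale attention. Channel sizes and crop shapes are arbitrary, and the UDA self-training machinery is omitted.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class ScaleAttentionFusion(nn.Module):
          """Blend context and detail predictions with a learned attention map."""
          def __init__(self, in_channels):
              super().__init__()
              self.attn = nn.Conv2d(in_channels, 1, kernel_size=1)   # scale attention head

          def forward(self, context_feats, context_logits, detail_logits):
              # Attention predicted from the low-resolution features, upsampled
              # to the high-resolution grid of the detail crop.
              a = torch.sigmoid(self.attn(context_feats))
              a = F.interpolate(a, size=detail_logits.shape[-2:],
                                mode="bilinear", align_corners=False)
              ctx = F.interpolate(context_logits, size=detail_logits.shape[-2:],
                                  mode="bilinear", align_corners=False)
              return a * detail_logits + (1 - a) * ctx

      fuse = ScaleAttentionFusion(in_channels=256)
      fused = fuse(torch.randn(1, 256, 32, 32),    # context features
                   torch.randn(1, 19, 32, 32),     # context logits (19 classes)
                   torch.randn(1, 19, 128, 128))   # detail logits
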
  • Motion Transformer with Global Intention Localization and Local Movement Refinement (NeurIPS 2022)
    Shaoshuai Shi, Li Jiang, Dengxin Dai, Bernt Schiele

    A novel Motion TRansformer (MTR) framework that models motion prediction as the joint optimization of global intention localization and local movement refinement. Winner of the 2022 Waymo Open Dataset Motion Prediction Challenge.

    Predicting multimodal future behavior of traffic participants is essential for robotic vehicles to make safe decisions. Existing works either directly predict future trajectories based on latent features or utilize dense goal candidates to identify agents' destinations; the former strategy converges slowly since all motion modes are derived from the same feature, while the latter has efficiency issues since its performance relies heavily on the density of goal candidates. In this paper, we propose the Motion TRansformer (MTR) framework that models motion prediction as the joint optimization of global intention localization and local movement refinement. Instead of using goal candidates, MTR incorporates spatial intention priors by adopting a small set of learnable motion query pairs. Each motion query pair takes charge of trajectory prediction and refinement for a specific motion mode, which stabilizes the training process and facilitates better multimodal predictions. Experiments show that MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges, ranking 1st on the leaderboards of the Waymo Open Motion Dataset. Code will be available at this https URL.
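
    The idea of learnable motion queries, one per motion mode, can be sketched in a few lines of PyTorch. The toy decoder and winner-take-all loss below are simplified stand-ins for MTR's transformer decoder and its joint intention-localization and refinement training; all dimensions are made up.

      import torch
      import torch.nn as nn

      class ToyMultiModalDecoder(nn.Module):
          """K learnable queries, each decoding one candidate future trajectory."""
          def __init__(self, feat_dim=128, n_modes=6, horizon=80):
              super().__init__()
              self.queries = nn.Parameter(torch.randn(n_modes, feat_dim))
              self.traj_head = nn.Linear(feat_dim, horizon * 2)    # (x, y) per future step
              self.score_head = nn.Linear(feat_dim, 1)
              self.horizon = horizon

          def forward(self, agent_feat):                           # (B, feat_dim)
              q = self.queries[None] + agent_feat[:, None]         # (B, K, feat_dim)
              trajs = self.traj_head(q).view(*q.shape[:2], self.horizon, 2)
              scores = self.score_head(q).squeeze(-1)              # (B, K)
              return trajs, scores

      # Winner-take-all training: only the mode closest to the ground truth is
      # regressed, which keeps the modes diverse.
      dec = ToyMultiModalDecoder()
      trajs, scores = dec(torch.randn(4, 128))
      gt = torch.randn(4, 80, 2)
      dist = ((trajs - gt[:, None]) ** 2).mean(dim=(2, 3))         # (B, K)
      best = dist.argmin(dim=1)
      loss = dist[torch.arange(4), best].mean() + nn.functional.cross_entropy(scores, best)
      loss.backward()
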
  • Scale-Aware Domain Adaptive Faster R-CNN
    Yuhua Chen, Haoran Wang, Wen Li, Christos Sakaridis, Dengxin Dai and Luc Van Gool

    We present Scale-aware Domain Adaptive Faster R-CNN, a model aiming at improving the cross-domain robustness of object detection.

    Object detection typically assumes that training and test samples are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch may lead to a significant performance drop. In this work, we present Scale-aware Domain Adaptive Faster R-CNN, a model aiming at improving the cross-domain robustness of object detection. In particular, our model improves the traditional Faster R-CNN model by tackling the domain shift on two levels: (1) the image-level shift, such as image style, illumination, etc., and (2) the instance-level shift, such as object appearance, size, etc. The two domain adaptation modules are implemented by learning domain classifiers in an adversarial training manner. Moreover, we observe that the large variance in object scales often brings a crucial challenge to cross-domain object detection. Thus, we improve our model by explicitly incorporating the object scale into adversarial training. We evaluate our proposed model on multiple cross-domain scenarios, including object detection in adverse weather, learning from synthetic data, and cross-camera adaptation, where the proposed model outperforms baselines and competing methods by a significant margin. The promising results demonstrate the effectiveness of our proposed model for cross-domain object detection.
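
    The adversarial domain classifiers mentioned above rely on the standard gradient-reversal trick, sketched below in PyTorch for the image-level case; the scale-aware conditioning and the instance-level module of the paper are omitted, and the feature dimensions are placeholders.

      import torch
      import torch.nn as nn

      class GradReverse(torch.autograd.Function):
          """Identity in the forward pass, negated gradient in the backward pass,
          so the backbone learns features the domain classifier cannot separate."""
          @staticmethod
          def forward(ctx, x, lamb):
              ctx.lamb = lamb
              return x.view_as(x)

          @staticmethod
          def backward(ctx, grad):
              return -ctx.lamb * grad, None

      class DomainClassifier(nn.Module):
          def __init__(self, dim):
              super().__init__()
              self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

          def forward(self, feats, lamb=1.0):
              return self.net(GradReverse.apply(feats, lamb))

      # Image-level features from a source (e.g. synthetic) and a target (e.g.
      # real adverse-weather) batch; reversed gradients push the backbone to
      # close the domain gap while the classifier tries to tell the domains apart.
      dom_clf = DomainClassifier(dim=1024)
      src = torch.randn(8, 1024, requires_grad=True)
      tgt = torch.randn(8, 1024, requires_grad=True)
      logits = torch.cat([dom_clf(src), dom_clf(tgt)])
      labels = torch.cat([torch.zeros(8, 1), torch.ones(8, 1)])
      nn.functional.binary_cross_entropy_with_logits(logits, labels).backward()
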
  • Multi-Scale Interaction for Real-Time LiDAR Data Segmentation on an Embedded Platform
    Shijie Li, Xieyuanli Chen, Yun Liu, Dengxin Dai, Cyrill Stachniss and Juergen Gall

    We propose a projection-based method for semantic segmentation of LiDAR data, called Multi-scale Interaction Network (MINet), which is very efficient and accurate.

    Real-time semantic segmentation of LiDAR data is crucial for autonomously driving vehicles and robots, which are usually equipped with an embedded platform and have limited computational resources. Approaches that operate directly on the point cloud use complex spatial aggregation operations, which are very expensive and difficult to deploy on embedded platforms. As an alternative, projection-based methods are more efficient and can run on embedded hardware. However, current projection-based methods either have a low accuracy or require millions of parameters. In this letter, we therefore propose a projection-based method, called Multi-scale Interaction Network (MINet), which is very efficient and accurate. The network uses multiple paths with different scales and balances the computational resources between the scales. Additional dense interactions between the scales avoid redundant computations and make the network highly efficient. The proposed network outperforms point-based, image-based, and projection-based methods in terms of accuracy, number of parameters, and runtime. Moreover, the network processes more than 24 scans, captured by a high-resolution LiDAR sensor with 64 beams, per second on an embedded platform, which is higher than the framerate of the sensor. The network is therefore suitable for robotics applications.
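
    A very reduced sketch of the multi-scale principle: run a cheap convolutional path on a downscaled copy of the projected range image and let it interact with the full-resolution path. The block below is illustrative only and far smaller than MINet itself.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class TinyMultiScaleBlock(nn.Module):
          """Two parallel paths at different resolutions with a cross-scale interaction."""
          def __init__(self, c):
              super().__init__()
              self.fine = nn.Conv2d(c, c, 3, padding=1)
              self.coarse = nn.Conv2d(c, c, 3, padding=1)

          def forward(self, x):
              f = self.fine(x)
              c = self.coarse(F.avg_pool2d(x, 4))          # cheap low-resolution path
              c = F.interpolate(c, size=f.shape[-2:], mode="bilinear", align_corners=False)
              return F.relu(f + c)                          # cross-scale interaction

      block = TinyMultiScaleBlock(32)
      range_image = torch.randn(1, 32, 64, 2048)            # projected scan of a 64-beam LiDAR
      out = block(range_image)
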