SADA: Semantic Adversarial Unsupervised Domain Adaptation for Temporal Action Localization

David Pujol-Perich, Albert Clapés, Sergio Escalera
University of Barcelona and Computer Vision Center, Spain
WACV 2025

Abstract

Temporal Action Localization (TAL) is a complex task that poses significant challenges, particularly when attempting to generalize to new, unseen domains in real-world applications. These scenarios, despite being realistic, are often neglected in the literature, exposing existing solutions to significant performance degradation. In this work, we tackle this issue by introducing, for the first time, an approach for Unsupervised Domain Adaptation (UDA) in sparse TAL, which we refer to as Semantic Adversarial Unsupervised Domain Adaptation (SADA). Our contributions are threefold: (1) we pioneer the development of a domain adaptation model that operates on realistic sparse action detection benchmarks; (2) we tackle the limitations of global-distribution alignment techniques by introducing a novel adversarial loss that is sensitive to local class distributions, ensuring finer-grained adaptation; and (3) we present a novel set of benchmarks based on EpicKitchens-100 and CharadesEgo that evaluate multiple domain shifts comprehensively. Our experiments indicate that SADA improves adaptation across domains compared with fully supervised state-of-the-art methods and alternative UDA methods, achieving a performance gain of up to 6.14 mAP.

Addressing Feature Misalignment

Traditional UDA methods align the source and target distributions globally, which often leads to "feature misalignment": features of a given action in the source domain end up aligned with features of a different action in the target domain. The proposed SADA loss overcomes this limitation by aligning the distribution of each action class across the two domains individually. This finer-grained adaptation preserves the semantic boundaries between actions even under significant domain shifts.

SADA loss mechanism
By aligning individual action distributions, SADA prevents the collapse of class-specific features during domain adaptation.
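The contrast between global and per-action alignment can be illustrated with a minimal sketch. This is not SADA's actual loss; it is a generic class-conditional adversarial objective, assuming one linear domain discriminator per action class, ground-truth labels for source features, and soft pseudo-labels (predicted class probabilities) for target features. All names (`discriminator`, `global_domain_loss`, `class_conditional_domain_loss`) are hypothetical, and the features are toy 1-D vectors rather than video embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
DiscParams = Tuple[List[float], float]  # (weights, bias) of a hypothetical linear discriminator


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discriminator(params: DiscParams, x: Vector) -> float:
    """Probability that feature x comes from the source domain."""
    w, b = params
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def global_domain_loss(src_feats: List[Vector], tgt_feats: List[Vector],
                       params: DiscParams, eps: float = 1e-12) -> float:
    """Binary cross-entropy of ONE discriminator on all features (global alignment)."""
    src = -sum(math.log(discriminator(params, x) + eps) for x in src_feats) / len(src_feats)
    tgt = -sum(math.log(1.0 - discriminator(params, x) + eps) for x in tgt_feats) / len(tgt_feats)
    return src + tgt


def class_conditional_domain_loss(src_feats: List[Vector], src_labels: List[int],
                                  tgt_feats: List[Vector], tgt_probs: List[List[float]],
                                  discs: Dict[int, DiscParams], eps: float = 1e-12) -> float:
    """Sum over classes of a per-class domain BCE loss.

    Source features feed the discriminator of their ground-truth class.
    Target labels are unknown, so each target feature feeds every class
    discriminator, weighted by its predicted probability for that class.
    """
    total = 0.0
    for c, params in discs.items():
        # Source term: class-c source features should be scored as "source" (label 1).
        src_c = [x for x, y in zip(src_feats, src_labels) if y == c]
        if src_c:
            total += -sum(math.log(discriminator(params, x) + eps) for x in src_c) / len(src_c)
        # Target term: soft-weighted target features should be scored as "target" (label 0).
        weights = [p[c] for p in tgt_probs]
        wsum = sum(weights)
        if wsum > 0:
            total += -sum(w * math.log(1.0 - discriminator(params, x) + eps)
                          for w, x in zip(weights, tgt_feats)) / wsum
    return total


if __name__ == "__main__":
    # Toy 1-D features: action 0 sits at -1 in the source but at +1 in the target
    # (and vice versa for action 1), so the two MARGINAL distributions are identical.
    src_feats, src_labels = [[-1.0], [1.0]], [0, 1]
    tgt_feats = [[1.0], [-1.0]]
    tgt_probs = [[1.0, 0.0], [0.0, 1.0]]  # confident pseudo-labels for the target samples

    # Because the marginals match, no global discriminator can go below 2*ln(2).
    print("global:", global_domain_loss(src_feats, tgt_feats, ([3.0], 0.5)))
    # Per-class discriminators separate the swapped actions almost perfectly.
    discs = {0: ([-5.0], 0.0), 1: ([5.0], 0.0)}
    print("class-conditional:", class_conditional_domain_loss(
        src_feats, src_labels, tgt_feats, tgt_probs, discs))
```

In the toy example the actions are swapped across domains, yet a single global discriminator sees two identical distributions and cannot detect the problem. The per-class discriminators expose the misalignment with a near-zero loss. In a typical adversarial setup, the discriminators minimize such a loss while the feature extractor is trained to maximize it (for example through a gradient reversal layer); the sketch only evaluates the loss.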

New Benchmarking Setups

Real-world video understanding requires handling sparse action detection and overlapping labels. Since existing benchmarks do not adequately cover these settings, we propose a comprehensive suite of 6 new benchmarks based on EpicKitchens-100. These setups examine both appearance and acquisition domain shifts, providing a robust framework for assessing a model's adaptability to diverse environments.

EpicKitchens-100 UDA Benchmarks
Overview of the proposed benchmarks examining various types of domain shifts in sparse TAL.
