PhaseMP: Robust 3D Pose Estimation via Phase-conditioned Human Motion Prior

1The University of Hong Kong 2Meta 3Seoul National University

Our human motion prior PhaseMP enables us to further constrain the predictions in both nominal and challenging scenarios.

Abstract

We present a novel motion prior, called PhaseMP, that models a probability distribution on pose transitions conditioned on a frequency-domain feature extracted from a periodic autoencoder. This phase feature further enforces pose transitions to be unidirectional (i.e., no backward movement in time), which allows more stable and natural motions to be generated. In particular, our motion prior is useful for accurately estimating 3D human motion from challenging input data, including long periods of spatial and temporal occlusion as well as noisy sensor measurements. Through a comprehensive evaluation, we demonstrate the efficacy of our motion prior, showing that it outperforms existing state-of-the-art methods by a significant margin across various applications, including video-to-motion and motion estimation from sparse sensor data.
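One way to write down "a probability distribution on pose transitions conditioned on a phase feature" is as a conditional VAE with a learned prior. The block below is a generic sketch of such a bound, not the paper's exact objective: the notation (pose $x_t$, phase feature $P_t$, latent $z_t$, networks $p_\psi$, $q_\phi$, $p_\theta$) and the single-step conditioning on $x_{t-1}$ are assumptions, while the Prior/Encoder/Decoder roles follow the System Overview.

```latex
\begin{aligned}
\text{Prior:}\quad   & p_\psi(z_t \mid x_{t-1}, P_t) \\
\text{Encoder:}\quad & q_\phi(z_t \mid x_t, x_{t-1}, P_t) \\
\text{Decoder:}\quad & p_\theta(x_t \mid z_t, x_{t-1}, P_t) \\[4pt]
\log p(x_t \mid x_{t-1}, P_t) \;\ge\;
  & \mathbb{E}_{q_\phi(z_t \mid x_t, x_{t-1}, P_t)}\!\left[\log p_\theta(x_t \mid z_t, x_{t-1}, P_t)\right] \\
  & - D_{\mathrm{KL}}\!\left(q_\phi(z_t \mid x_t, x_{t-1}, P_t)\,\middle\|\,p_\psi(z_t \mid x_{t-1}, P_t)\right) \\[4pt]
\text{Inference:}\quad
  & z_t \sim p_\psi(\cdot \mid x_{t-1}, P_t), \qquad \hat{x}_t \sim p_\theta(\cdot \mid z_t, x_{t-1}, P_t)
\end{aligned}
```

Maximizing the bound requires the Encoder because it sees the target pose $x_t$; the inference line uses only the Prior and the Decoder, since $x_t$ is what we are trying to estimate.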


System Overview


The left figure illustrates the phase feature extraction process. It takes a window of joint velocities of duration T as input and produces the frequency-domain periodic parameters F, A, B, and S as output; the phase feature P is then computed by updating these parameters cyclically. The right figure shows our phase-conditioned human motion prior, PhaseMP, which is based on a conditional VAE and consists of a Prior network, an Encoder network, and a Decoder network. All three networks are used during training, while only the Prior and the Decoder are used at inference.
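To make the F, A, B, S parameters and the cyclic phase update more concrete, here is a minimal NumPy sketch of how they could be read off a window of periodic curves with an FFT, and how a phase feature P could be formed and advanced in time. It follows the general recipe of FFT-based periodic autoencoders but is not the paper's implementation. Everything here is an assumption: the function names (`extract_periodic_params`, `advance_phase`, `phase_feature`), the per-channel model A·sin(2π(F·t + S)) + B with t measured from the window's first frame, the (A·sin 2πS, A·cos 2πS) encoding of P, and the use of the strongest FFT bin's angle in place of a learned phase-shift layer. It also operates on given curves directly rather than on latent channels learned from joint velocities.

```python
import numpy as np


def extract_periodic_params(curve: np.ndarray, fps: float):
    """Estimate per-channel (F, A, B, S) from curves of shape (C, N).

    Assumed model per channel: A * sin(2*pi*(F*t + S)) + B, where t is in
    seconds from the first frame of the window and S is measured in cycles.
    """
    n = curve.shape[-1]
    spectrum = np.fft.rfft(curve, axis=-1)             # (C, N//2 + 1), complex
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)             # bin frequencies in Hz
    power = np.abs(spectrum[:, 1:]) ** 2                # skip the DC bin
    total = np.maximum(power.sum(axis=-1), 1e-12)       # avoid divide-by-zero
    F = (freqs[1:] * power).sum(axis=-1) / total        # power-weighted frequency
    A = 2.0 / n * np.sqrt(total)                        # amplitude
    B = spectrum[:, 0].real / n                         # offset (window mean)
    # Phase shift from the strongest bin; a stand-in for a learned layer.
    k = np.argmax(power, axis=-1) + 1                   # +1 undoes the DC skip
    angle = np.angle(spectrum[np.arange(curve.shape[0]), k])
    # For a sine, the bin angle is 2*pi*S - pi/2; convert to cycles in [0, 1).
    S = ((angle + np.pi / 2) / (2 * np.pi)) % 1.0
    return F, A, B, S


def advance_phase(F: np.ndarray, S: np.ndarray, dt: float) -> np.ndarray:
    """Cyclic update: move each phase forward by F * dt cycles (F >= 0)."""
    return (S + F * dt) % 1.0


def phase_feature(A: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Encode each channel's phase as a 2D point scaled by its amplitude."""
    angle = 2 * np.pi * S
    return np.stack([A * np.sin(angle), A * np.cos(angle)], axis=-1)  # (C, 2)


if __name__ == "__main__":
    fps, n = 60.0, 60                                   # 1-second window
    t = np.arange(n + 10) / fps
    true_F, true_S = np.array([3.0, 5.0]), np.array([0.2, 0.55])
    true_A, true_B = np.array([0.7, 0.3]), np.array([0.1, -0.2])
    curve = (true_A[:, None]
             * np.sin(2 * np.pi * (true_F[:, None] * t + true_S[:, None]))
             + true_B[:, None])

    F, A, B, S = extract_periodic_params(curve[:, :n], fps)
    S_pred = advance_phase(F, S, dt=10 / fps)           # phase 10 frames later
    _, _, _, S_obs = extract_periodic_params(curve[:, 10:10 + n], fps)
    print("F:", np.round(F, 3), "A:", np.round(A, 3))
    print("predicted S:", np.round(S_pred, 3), "observed S:", np.round(S_obs, 3))
    print("P shape:", phase_feature(A, S).shape)
```

Because F is a power-weighted average of positive frequencies, it is never negative, so `advance_phase` can only move S forward around the cycle. This mirrors, in a simplified form, the unidirectional behavior the abstract attributes to the phase feature.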

Application 1: From Sparse Sensor Joints

Application 2: From Noisy Data

Application 3: From Keyframes

Application 4: From RGB Videos

BibTeX

@InProceedings{Shi_2023_ICCV,
        author    = {Shi, Mingyi and Starke, Sebastian and Ye, Yuting and Komura, Taku and Won, Jungdam},
        title     = {PhaseMP: Robust 3D Pose Estimation via Phase-conditioned Human Motion Prior},
        booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
        month     = {October},
        year      = {2023},
        pages     = {14725-14737}
    }

Acknowledgement

This work was mainly done during Mingyi Shi's internship at Meta. We thank Deepak Gopinath for his valuable input. Jungdam Won was partially supported by the New Faculty Startup Fund from Seoul National University, ICT (Institute of Computer Technology) at Seoul National University, and the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-2020-0-01460) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation). Taku Komura and Mingyi Shi are partly supported by the Technology Commission (Ref: ITS/319/21FP) and the Research Grants Council (Ref: 17210222), Hong Kong.