The left figure illustrates the phase feature extraction process, which takes a window of joint velocities of duration T as input and outputs the frequency-domain periodic parameters F, A, B, and S; the phase feature P is then computed by cyclically updating these parameters. The right figure shows our phase-conditioned human motion prior, PhaseMP, which is built on a conditional VAE and consists of a Prior network, an Encoder network, and a Decoder network. All three networks are used during training, whereas only the Prior and the Decoder are used at inference.
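
The split between training and inference described above can be illustrated with a minimal sketch. This is not the PhaseMP implementation: the class `ToyPhaseCVAE`, the single random affine layers standing in for trained networks, the choice of inputs to each network (previous pose, phase feature, next pose), and all dimensions are assumptions made only to show which networks a conditional VAE calls in each stage.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random affine map (weights, bias); a hypothetical stand-in for a trained network."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply an affine map to a list of floats."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class ToyPhaseCVAE:
    """Toy conditional VAE; the condition is (previous pose, phase feature). Assumed layout."""

    def __init__(self, pose_dim, phase_dim, latent_dim, seed=0):
        self.rng = random.Random(seed)
        self.latent_dim = latent_dim
        cond_dim = pose_dim + phase_dim
        # Prior sees only the condition; Encoder also sees the target next pose.
        self.prior = make_linear(cond_dim, 2 * latent_dim, self.rng)
        self.encoder = make_linear(cond_dim + pose_dim, 2 * latent_dim, self.rng)
        self.decoder = make_linear(cond_dim + latent_dim, pose_dim, self.rng)
        self.calls = {"prior": 0, "encoder": 0, "decoder": 0}  # bookkeeping for the demo

    def _gaussian(self, name, layer, x):
        """Run a network that outputs the mean and log-variance of a diagonal Gaussian."""
        self.calls[name] += 1
        out = linear(layer, x)
        return out[: self.latent_dim], out[self.latent_dim:]

    def _sample(self, mu, logvar):
        """Reparameterized sample z = mu + sigma * eps."""
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def _decode(self, prev_pose, phase, z):
        self.calls["decoder"] += 1
        return linear(self.decoder, prev_pose + phase + z)

    def train_forward(self, prev_pose, phase, next_pose):
        """Training pass: Prior, Encoder, and Decoder are all evaluated."""
        cond = prev_pose + phase
        mu_p, lv_p = self._gaussian("prior", self.prior, cond)
        mu_q, lv_q = self._gaussian("encoder", self.encoder, cond + next_pose)
        recon = self._decode(prev_pose, phase, self._sample(mu_q, lv_q))
        recon_loss = sum((r - t) ** 2 for r, t in zip(recon, next_pose)) / len(next_pose)
        # KL(q || p) between two diagonal Gaussians.
        kl = 0.5 * sum(
            lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
            for mp, lp, mq, lq in zip(mu_p, lv_p, mu_q, lv_q)
        )
        return recon, recon_loss, kl

    def infer(self, prev_pose, phase):
        """Inference pass: sample z from the Prior, then decode; the Encoder is unused."""
        mu_p, lv_p = self._gaussian("prior", self.prior, prev_pose + phase)
        return self._decode(prev_pose, phase, self._sample(mu_p, lv_p))
```

In this sketch the Encoder is needed only to form the posterior that the Prior is pulled toward through the KL term, which is why it can be dropped once training is finished.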