The field of deep learning in artificial intelligence (AI) is one in which advancements are continuously being made, and Nvidia has conducted a good deal of research in this area recently. Earlier this month, the firm teamed up with Hackster to introduce the AI at the Edge challenge, whose participants can use the Nvidia Jetson Nano Developer Kit to develop creative new neural network-based models.
In November, meanwhile, Nvidia announced Jarvis, a multi-modal AI software development kit that unites a variety of sensors under a single system. The firm also recently prototyped a new algorithm that enables robots to pick up arbitrary objects.
It doesn't look like the company is slowing down in its pursuit of similar research, though, since it has now unveiled a new deep learning-based model at NeurIPS 2019 that automatically generates appropriate dance moves based on input music. Dubbed the AI Choreographer, the model has been developed in collaboration with the University of California, Merced.
Although it may not seem a difficult task on the surface, the research team behind this model has noted that modeling an exact correlation between music and dancing actually involves a number of variables, including the beat of the music, its style, and more. To train the generative adversarial network (GAN) used in the system, the researchers collected a total of 361,000 dance clips from three representative dance categories - Ballet, Zumba, and Hip-Hop.
The GAN is a core component of the decompositions-to-compositions framework that can be viewed below. The step-by-step process behind the schematic has been described in the following way:
In the top-down decomposition phase, the team normalizes the dance units that are segmented from a real dancing sequence using a kinematic beat detector. They then train the DU-VAE to model the dance units. In the bottom-up composition phase, given a music and dance pair, the team leverages the MM-GAN to learn how to organize the dance units conditioned on the given music. In the testing phase, the researchers extract style and beats from the input music, synthesize a sequence of dance units in a recurrent manner, and finally apply the beat warper to the generated dance unit sequence to render the output dance.
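To illustrate the kinematic beat detector mentioned in the decomposition phase, here is a minimal sketch that treats a kinematic beat as a momentary pause in motion, i.e., a local minimum of overall joint speed. The function name, input shape, and exact criterion are illustrative assumptions, not taken from the paper itself.

```python
import numpy as np

def kinematic_beats(joints):
    """Detect kinematic beats as local minima of overall joint speed.

    joints: array of shape (T, J, 2) -- 2D pose keypoints for T frames
    and J joints (e.g., as extracted by a pose estimator).
    Returns a list of frame indices where a beat is detected.
    NOTE: a simplified sketch; the paper's detector may differ.
    """
    # Per-frame speed: mean Euclidean displacement of all joints
    # between consecutive frames. Shape: (T - 1,)
    speed = np.linalg.norm(np.diff(joints, axis=0), axis=2).mean(axis=1)
    beats = []
    for t in range(1, len(speed) - 1):
        # A local minimum of speed marks a momentary pause in motion.
        if speed[t] < speed[t - 1] and speed[t] <= speed[t + 1]:
            beats.append(t + 1)  # speed[t] spans frames t..t+1
    return beats
```

A sequence that moves steadily, pauses briefly, then moves again would yield a single beat at the pause; in the full framework, dance units would be segmented at such points before being fed to the DU-VAE.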
The model was trained using the PyTorch deep learning framework, with the actual training performed on Nvidia Tesla V100 GPUs. Pose processing, meanwhile, was performed using OpenPose, a real-time multi-person system for jointly detecting human body, hand, facial, and foot keypoints in single images.
Nvidia plans to expand the proposed method to other dancing styles, including pop dance and partner dance, in the future. The source code and models from the research will be published on GitHub once the NeurIPS conference has concluded. For now, you can go through the research paper to learn in more detail how AI Choreographer was developed.