Meta introduces data2vec, the first self-supervised algorithm for speech, vision, and text

data2vec algorithm

Meta has announced the launch of data2vec, "the first high-performance self-supervised algorithm that learns the same way in multiple modalities, including speech, vision, and text."

Self-supervised learning allows machines to learn about their surroundings by observing them and working out the structure of images, speech, or text, rather than relying on labeled data, as most machine learning does today. This approach makes it far more efficient for machines to tackle new, complex tasks, such as understanding text in a greater number of spoken languages.

The press release from Meta stated:

Self-supervised learning algorithms for images, speech, text or other modalities function in very different ways, which has limited researchers in applying them more broadly. Because an algorithm designed for understanding images can’t be directly applied to reading text, it’s difficult to push several modalities ahead at the same rate. With data2vec, we’ve developed a unified way for models to predict their own representations of the input data, regardless if it’s speech, text or audio. By focusing on these representations, a single algorithm can work with completely different types of input.


data2vec should help in building machines that can learn about their surroundings without depending on labeled data. It could also enable "more adaptable AI" capable of carrying out "tasks beyond what's possible today."

For more information on data2vec, head over to Meta's dedicated webpage.
