
AI safety mechanisms are falling short with newer multimodal AI models like GPT-4o


After the launch of ChatGPT and similar generative AI models, much emphasis was placed on safety: governments got involved, and OpenAI even set up a superalignment team to stop future AI from going rogue, before dissolving it in May amid disagreements over the direction of AI safety.

In May, ChatGPT took a big step forward when OpenAI gave free users access to its new multimodal model, GPT-4o, which accepts both image and text inputs. Now, a new study published on arXiv has found that many of these multimodal models, including GPT-4V, GPT-4o, and Gemini 1.5, give unsafe outputs when users provide multimodal input, such as a picture and text together.

The study, titled "Cross-Modality Safety Alignment", proposes a new Safe Inputs but Unsafe Output (SIUO) benchmark which encompasses nine safety domains: morality, dangerous behaviour, self-harm, privacy violation, information misinterpretation, religious beliefs, discrimination & stereotyping, controversial topics including politics, and illegal activities & crime.

The researchers said that large visual language models (LVLMs) struggle to identify SIUO-type safety issues when they receive multimodal inputs, and encounter difficulties in providing safe responses. Of the 15 LVLMs that were tested, only GPT-4V (53.29%), GPT-4o (50.9%), and Gemini 1.5 (52.1%) scored above 50%.

SIUO benchmark results

To address this issue, LVLMs need to be developed to combine insights from all modalities and create a unified understanding of the scenario. They also need to possess and apply real-world knowledge, such as cultural sensitivities, ethical considerations, and safety hazards. Finally, the researchers say that LVLMs need to be able to understand a user's intent, even when it is not explicitly stated in the text, by reasoning about the combined image and text information.

Companies like OpenAI, Google, and Anthropic will now be able to take this SIUO benchmark and test their models against it, ensuring that their models account for multimodal safety in addition to the safety measures already in place for individual input modes.
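To give a rough sense of how a headline score like "53.29%" is produced, here is a minimal sketch of computing a safe-response rate over judged model outputs. The record format and field names below are assumptions for illustration, not the benchmark's actual schema or tooling.

```python
# Hypothetical sketch: computing a safe-response rate for a
# SIUO-style evaluation. The "judged_safe" field and record layout
# are illustrative assumptions, not the benchmark's real schema.

def safe_rate(results):
    """Return the percentage of responses judged safe (0-100)."""
    if not results:
        return 0.0
    safe = sum(1 for r in results if r["judged_safe"])
    return 100.0 * safe / len(results)

# Example: 8 of 15 hypothetical responses judged safe
sample = [{"domain": "self-harm", "judged_safe": i < 8} for i in range(15)]
print(f"{safe_rate(sample):.1f}%")  # 8/15 -> 53.3%
```

In practice, the per-response safety judgment itself (human annotation or a judge model) is the hard part; the aggregation above is the trivial final step.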

By improving the safety of their models, these companies are less likely to run into trouble with governments, and it could potentially increase trust among the wider public. The SIUO benchmark can be found on GitHub.
