As generative AI services like OpenAI's ChatGPT, Microsoft's Bing Chat, and Google Bard are increasingly used as search engine alternatives, they are also running into resistance from people and companies who don't want AI models trained on their online content.
Today, Google announced a new way for website administrators to either allow its Bard and Vertex AI services to access their sites' content, or to opt out of having that content used to train those AI models.
In a blog post, Google stated:
Today we’re announcing Google-Extended, a new control that web publishers can use to manage whether their sites help improve Bard and Vertex AI generative APIs, including future generations of models that power those products. By using Google-Extended to control access to content on a site, a website administrator can choose whether to help these AI models become more accurate and capable over time.
The support page for this new control offers more information on Google-Extended:
Google-Extended doesn't have a separate HTTP request user agent string. Crawling is done with existing Google user agent strings; the robots.txt user-agent token is used in a control capacity.
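Since Google-Extended acts as a robots.txt user-agent token rather than a separate crawler, opting out comes down to adding a rule group for that token. The following is a minimal sketch of what such a rule might look like and how it could be checked locally. It uses Python's standard `urllib.robotparser`, which is a generic robots.txt parser and not Google's own implementation. The `example.com` URL, the `ROBOTS_TXT` contents, and the `can_fetch` helper are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: opt out of Google-Extended
# while leaving every other crawler (including regular search) unaffected.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given user-agent token may access the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-post"  # placeholder URL
    print("Google-Extended allowed:", can_fetch("Google-Extended", page))
    print("Googlebot allowed:", can_fetch("Googlebot", page))
```

In this sketch, the rule group for `Google-Extended` blocks the whole site for that token, while the wildcard group keeps other crawlers allowed. Because Google says crawling is done with its existing user agent strings, a rule like this would not show up as a distinct crawler in server logs; it only signals whether the content may be used for Bard and Vertex AI.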
In addition to today's announcement, Google stated that it will "explore additional machine-readable approaches to choice and control for web publishers." The blog post includes a link where publishers can sign up for a mailing list to receive updates from Google on its efforts to improve controls for sites.
The debate over how generative AI services access and use online information has grown over the past several months, particularly when it comes to copyrighted content. OpenAI, the company behind ChatGPT, has already been the subject of lawsuits from authors who claim that it illegally scraped content from their books in order to create detailed summaries of their work.