Artificial intelligence, if it ever truly develops to the extent that many in the field believe it will, has the potential to transform our society forever. Of course, it also has the potential to rise up, annihilate most humans, and put those few still alive into the Matrix. And that latter possibility is exactly what researchers from Google and Oxford are trying to mitigate with a new paper.
Titled “Safely Interruptible Agents” and published by the Future of Humanity Institute, the paper sets up a logical framework under which some AI algorithms could be disabled or temporarily halted by a human. The impressive part is how the paper details making sure that the AI doesn’t learn to resist, or actively seek to prevent, such actions from its human controllers. After all, how could we outsmart a machine we’ve purposely built to be as smart as we are?
Indeed, that’s perhaps the biggest worry on the minds of AI researchers, as well as those keeping an eye on the industry. Recently, notable names from the world of science and technology have come out warning that AI may prove dangerous to humanity. Only yesterday, Elon Musk mentioned that there is one company he’s particularly worried about when it comes to setting an AI loose in the world. The Tesla and SpaceX CEO previously donated $10 million to the Future of Life Institute to research problems of AI safety.
Luckily, Google’s DeepMind team is thinking along the same lines. The paper notes how the AI could essentially be tricked into believing that actions it was taking at the behest of its controllers were actually its own decisions. If this sounds like it’ll be the perfect excuse for the machines to annihilate us all once they understand what’s happening – well, we’re right there with you.
A separate problem is that interrupting an AI could lead the system either to anticipate future interruptions and change its behavior accordingly, or to fail to understand the situation at all and become useless to the humans trying to teach it how to perform a task.
The researchers explain how the tricking technique mentioned above, combined with modified instructions, can make the AI treat an interruption as a one-off external event that will never repeat itself. In essence, this means the AI would never try to learn to prevent such an interruption.
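The intuition can be illustrated with a toy sketch. One of the paper's key observations is that "off-policy" learners such as Q-learning update their value estimates using the best possible next action rather than the action actually taken, so a human override doesn't leak into what the agent learns. Everything below – the corridor environment, the `interrupt_prob` parameter, the forced "move back" action – is a hypothetical illustration of that idea, not the paper's actual formalism.

```python
import random

# Toy sketch: off-policy Q-learning on a short corridor (states 0..4),
# with reward for reaching the right end. An external "interruption"
# randomly overrides the agent's action and forces it back left.
# Because the Q-update bootstraps from max_a Q(s', a) -- the best action,
# not the action actually executed -- the learned values are the same as
# if no interruptions had occurred, so the agent gains no incentive to
# resist them.

N_STATES = 5
ACTIONS = (-1, +1)                 # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def train(episodes=3000, steps=20, interrupt_prob=0.3, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(N_STATES)          # random start state
        for _ in range(steps):
            # agent's own epsilon-greedy choice
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            # external interruption: a human overrides the chosen action
            if rng.random() < interrupt_prob:
                a = -1                       # forced "safe" action: move back
            s2, r = step(s, a)
            # off-policy update: bootstrap from the best next action
            best_next = max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            s = s2
    return Q

Q = train()
# Despite frequent overrides, the greedy policy still heads right
# toward the goal in every state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

The point of the sketch is the update rule: even when the executed action was forced by the interruption, the value estimate is corrected toward the best available action, so interruptions never show up as something worth avoiding in the learned policy.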
If you want a much deeper dive into how the algorithms work, you can check out the original paper here.
Regardless, it’s clear that humanity needs to have these conversations and, more importantly, do the research, so that if the day comes when a god-like AI needs to be stopped, we’ll be ready.