AI is causing a massive headache for Linux and laying the groundwork for legal issues

Linus Torvalds shifts stance on automated tools as the flood of unverified AI patches creates a development bottleneck for Linux 7.1.

Tux the Linux mascot — Credit: Larry Ewing

This week, Linus Torvalds shared his latest weekly insights about the fifth release candidate of the Linux 7.1 kernel. For a while now, Torvalds has been telling us that he suspects AI tools are leading to larger patches and that he was OK with this, but in the last two weeks or so, his attitude toward the people using these tools seems to have soured notably.

With the fourth release candidate of Linux 7.1, he criticized people who use these tools to find bugs but stop short of actually submitting a code fix for the issue. They instead palm the issue off onto other people, essentially inundating them with too much work. With the fifth RC, he said that many of the bug fixes being submitted this late in the cycle can wait until Linux 7.2. He has asked contributors to stick to fixing actual regressions, given that we are three weeks out from the stable release.

Popular chatbot LLMs

While the onslaught of AI-coded patches is causing a headache for Linux kernel maintainers, there is actually a deeper crisis being created. By replacing human comprehension with proprietary, black-box AI models, the kernel is at risk of being polluted by unmaintainable, legally iffy, and opaque bloat.

Artificial intelligence can be an immensely valuable coding tool. Its use ranges from the human being the primary coder, with AI offering some genuinely helpful assistance, to vibe coding, where the human guides an AI coder on what to build. Vibe coding can be very useful too; it makes building a functional product fairly easy. However, when vibe coding, especially with tools like OpenAI’s Codex, which hides the code from sight, the human instructing the bot is less familiar with the code that is being written.

While it probably doesn’t matter if individuals vibe code their side projects, you probably don’t want that type of thing going on in something as major as the Linux kernel, which powers most servers around the world and a sizable number of desktop computers.

Thankfully, Linux maintainers do not let code in unchecked; it is reviewed first. However, with AI tools, people can create thousands of superficial patches or complex-looking bug reports in seconds. If submitted, these land in a maintainer’s inbox, and the maintainer then has to spend precious time reviewing them.

Aside from putting additional load on maintainers, AI-generated code fixes can look correct while lacking a structural understanding of the kernel, which could lead to subtle regressions, redundant logic, and edge-case vulnerabilities. This is all code that a maintainer will have to sift through to look for issues, and the sheer volume could let more problems slip into the kernel.

What we effectively see is a distributed denial of service (DDoS) attack on kernel development. Instead of a cyberattack on servers, it is a flood of easy-to-generate code aimed at maintainers’ limited capacity to review submissions.

Definition of free software — Credit: FSF

The Linux kernel is considered to be free software as defined by the Free Software Foundation. This software gives users certain freedoms, one of which is “The freedom to study how the program works, and change it so it does your computing as you wish.” By using AI models to write code for the Linux kernel, things get a bit messy philosophically, as this code may not be fully understood by the contributor. This essentially makes the code unreadable in some cases.

Code readability is not just about being able to understand the syntax; it also includes understanding why a certain architectural path was chosen to solve a problem. When a human writes code, they leave a trail of their intent throughout the code, such as contextual naming and comments. When an AI model writes code, these things can go missing because it doesn’t have any intent; it is just predicting the next most statistically probable token based on its training data. It can give you code that fixes the bug, but there is no underlying rationale.

Down the road, if the code breaks, the original author won’t be able to explain their fix because they delegated the work to AI, which didn’t code with intent.

Another issue with turning to AI to fix code is that it’s going to lead to a generation of incapable developers. What ought to happen is that new contributors come along, submit code, get feedback, improve their code, and so on. But by relying on AI, they lose this valuable development. As the senior contributors retire from the project, it could lead to a generation of developers unable to actually maintain the kernel because they never developed their skills.

Another extremely important consideration about AI contributions is the fact that most, if not all, of the AI models being used today are proprietary black boxes. Even the so-called open source AI models are not really open because users don’t get to see the real source code: the data used to train the AI.

Credit: FSF

There is a chance that copyrighted code a model has been trained on makes its way into the Linux kernel; this could cause legal issues for the kernel at some point. Additionally, these models will almost certainly have been trained on GPLv3-licensed code, and if you use that code, your project also has to use the GPLv3 license. Linux is licensed under GPLv2, so if GPLv3 code appears in the kernel, then this too could be an issue for the project.

In his message accompanying Linux 7.1-rc5, Torvalds said that the maintainers would start being more hard-nosed about what code they accept this late in the release cycle to prevent bloat. Perhaps it would be wise to extend this attitude more widely so that maintainers can reject unverified, AI-generated submissions more easily.

While Big Tech may be obsessed with “boosting productivity” and fast development speeds, the free software ecosystem, including Linux, would be better served with deeply understood human-written code. It might take longer to write, but the author knows why they have written it, and it can be fixed more easily in the future.

The four freedoms of free software — Credit: FSF

If the Linux kernel is going to adopt AI, maybe it would be worth developers looking into the creation of a fully transparent, copyleft-trained model where users have access to the entire stack, from the training data to the inference model, so that the four freedoms of free software are truly maintained. It’s a bit ironic that, currently, developers are relying on closed models to write an open kernel.

Ultimately, contributors to Linux, and any other project, need to ensure that they can still create human-understandable code. If development is handed over to proprietary AI models, it erodes digital autonomy, reduces human-to-human collaboration, and gives Big Tech too much power.

Let us know in the comments what you think about the use of AI in the Linux kernel. Do you think it will create legal issues, and how should Linus Torvalds respond to its use?