GitHub Copilot will soon use your data to train its model and share it with others

GitHub Copilot is a powerful coding assistant that works across multiple integrated development environments (IDEs). Its primary use is code completion and generation through natural language prompts, but it also offers other features such as pull request (PR) summarization, code review, and agentic automation. Now, Microsoft has announced that it plans to make GitHub Copilot even better, at the cost of your real data.

Microsoft has revealed that it is updating its GitHub Copilot interaction data usage policy to allow the company to use this particular type of data to train its AI model. For clarity, interaction data involves inputs, outputs, code snippets, code context, comments, documentation, file names, repository structure, navigation patterns, and basically any interaction with Copilot.

This is a pretty big change that is based on Microsoft's belief that real-world data will directly result in smarter models. Until now, the company had been using public code repositories hosted on GitHub along with hand-crafted specialized models to achieve this purpose, but it also recently began incorporating data from Microsoft employees, which resulted in significant improvements to the model's quality.

As such, Microsoft has decided to pivot its approach and begin gathering real-world data from customers too. This data will be collected from Copilot Free, Pro, and Pro+ users, whereas Copilot Business and Enterprise users, along with enterprise-owned repositories, will be exempt. In addition, Redmond will not use your data at rest.

Another notable disclaimer from this announcement is that your interaction data will also be shared with GitHub affiliates, but Microsoft assures customers that their data will not be shared with third-party AI model providers. The other good thing about this approach is that all users can opt out through their privacy settings. But if they don't do so before April 24, they will automatically be opted in. That is arguably a shady tactic, but hey, Microsoft thinks that leveraging real data will "make a meaningful difference in building AI tools that serve the entire developer community".
