Microsoft researchers leak 38TB of sensitive data due to misconfigured storage on GitHub

Because AI projects involve massive datasets that are shared between teams, accidental exposures are becoming more common. In a recent example, Microsoft reportedly exposed "tens of terabytes" of sensitive internal data online because of a misconfigured cloud storage access setting.

Cloud security firm Wiz discovered that an Azure storage container linked from a GitHub repository used by Microsoft AI researchers had been shared with an overly permissive shared access signature (SAS) token. The token gave anyone with the storage URL full control over all data in the entire storage account.
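To make the idea of an "overly permissive" token concrete, here is a minimal sketch of how a SAS URL could be checked before it is shared. It relies on the standard SAS query fields `sp` (permissions), `se` (expiry) and `srt` (present on account-level tokens). The function name `audit_sas_url`, the seven-day limit and the example URL are assumptions made for illustration; this is not a tool used by Wiz or Microsoft.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs

# Permission letters in the "sp" field that allow changing or removing data.
RISKY_PERMISSIONS = set("wdac")  # write, delete, add, create


def audit_sas_url(url, now=None, max_lifetime=timedelta(days=7)):
    """Return a list of warnings for a SAS URL (hypothetical helper).

    An empty list means none of the checks below fired; it does not
    prove the token is safe to share.
    """
    now = now or datetime.now(timezone.utc)
    params = parse_qs(urlparse(url).query)
    warnings = []

    # 1. Does the token allow more than reading and listing?
    permissions = params.get("sp", [""])[0]
    risky = sorted(set(permissions) & RISKY_PERMISSIONS)
    if risky:
        warnings.append("grants more than read access: " + "".join(risky))

    # 2. "srt" only appears on account-level tokens, which are not
    #    scoped to a single container or blob.
    if "srt" in params:
        warnings.append("account-level token")

    # 3. Is the expiry missing or far in the future?
    expiry = params.get("se", [""])[0]
    try:
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
    except ValueError:
        warnings.append("missing or unparseable expiry")
    else:
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at - now > max_lifetime:
            warnings.append("expiry is more than %d days away" % max_lifetime.days)

    return warnings


if __name__ == "__main__":
    # Made-up URL with a fake signature, for illustration only.
    example = (
        "https://exampleaccount.blob.core.windows.net/models"
        "?sv=2021-08-06&ss=b&srt=sco&sp=racwdl"
        "&se=2051-10-06T00:00:00Z&sig=FAKE"
    )
    for warning in audit_sas_url(example):
        print("WARNING:", warning)
```

A check like this only looks at what the URL itself declares. It cannot tell whether a token has already been revoked, so it is a pre-sharing sanity check rather than a replacement for monitoring the storage account.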

For those not familiar, Azure Storage is a service for storing data as files, disks, blobs, queues, or tables. The exposed data totaled 38 terabytes and included the personal backups of two Microsoft employees, which contained passwords, secret keys, and more than 30,000 internal Microsoft Teams messages.

The data had been accessible since 2020 due to the misconfiguration. Wiz notified Microsoft of the issue on June 22, and the company revoked the SAS token two days later.

Microsoft's investigation found that no customer data was involved. However, the exposure could have allowed malicious actors to delete, modify, or inject files into Microsoft's systems and internal services over an extended period.

In a blog post, Microsoft wrote:

No customer data was exposed, and no other internal services were put at risk because of this issue. No customer action is required in response to this issue... Our investigation concluded that there was no risk to customers as a result of this exposure.

In response to the findings from Wiz's research, Microsoft has enhanced GitHub's secret scanning service. Microsoft's Security Response Center said it will now monitor all changes to public open-source code for credentials or other secrets exposed in plain text.
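As a rough illustration of what scanning for secrets in plain text involves, the sketch below searches text for URLs that point at an Azure Storage hostname and carry a `sig=` signature parameter. The pattern and the `find_sas_urls` helper are assumptions made for this example; this is not GitHub's actual detection logic, and a real scanner would need to handle many more formats and reduce false positives.

```python
import re

# Heuristic: an Azure Storage hostname followed by a query string that
# contains a "sig=" parameter. The match ends at the signature value.
SAS_URL_PATTERN = re.compile(
    r"https://[a-z0-9]+\.(?:blob|file|queue|table)\.core\.windows\.net"
    r"[^\s\"'<>]*[?&]sig=[^\s\"'<>&]+"
)


def find_sas_urls(text):
    """Return (line_number, url) pairs for likely SAS URLs in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in SAS_URL_PATTERN.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


if __name__ == "__main__":
    # Made-up file contents with a fake signature, for illustration only.
    sample = (
        "# Download the models\n"
        'url = "https://exampleaccount.blob.core.windows.net/models'
        '?sp=r&se=2030-01-01&sig=FAKE123"\n'
        "docs: https://exampleaccount.blob.core.windows.net/public/readme.txt\n"
    )
    for line_number, url in find_sas_urls(sample):
        print("line %d: possible SAS token in %s" % (line_number, url))
```

The third line of the sample is not flagged because a plain storage URL without a signature is not a credential.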

In an interview with TechCrunch, Wiz co-founder Ami Luttwak said:

AI unlocks huge potential for tech companies. However, as data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards.

With many development teams needing to manipulate massive amounts of data, share it with their peers or collaborate on public open source projects, cases like Microsoft's are increasingly hard to monitor and avoid.

Source: Microsoft's Security Response Center via TechCrunch