Microsoft open sources CodeQL queries used in Solorigate investigation

Last week, Microsoft finally completed its Solorigate investigation, concluding that while some code files for Azure, Intune, and Exchange were accessed, no customer data was compromised. The cyberattack had caused major concern around the globe because it targeted the United States' federal departments, the UK, the European Parliament, and thousands of other organizations. Supply chain attacks were executed on SolarWinds, Microsoft, and VMware, with Microsoft President Brad Smith calling it "a moment of reckoning".

Now, Microsoft has open sourced the CodeQL queries that it utilized in the Solorigate investigation.

A laptop display with a coding IDE open and spectacles in the front — *Image via Kevin Ku from Pexels*

For those unaware, CodeQL is code analysis engine which depends upon code semantics and syntax. It develops a database built around the model of the compiling code, which can then be queried just like a regular database. It can be used both for static analysis and retroactive inspection of code.

CodeQL queries were used by Microsoft in its Solorigate investigation in order to analyze its code in a scalable manner and pinpoint indicators of compromise (IoCs) and other coding patterns used by Solorigate attackers directly on a code-level.

Microsoft essentially built multiple CodeQL databases from various build pipelines, and then aggregated them in a single infrastructure to enable system-wide querying capabilities. This enabled the firm to detect malicious activity in code within hours of a coding pattern being described.

Given that this is more of a syntactic and semantic technique that depends upon identifying similarities in coding patterns such as the variable names used, Microsoft has emphasized that if you find the same patterns in your own code base, that does not necessarily mean that it's compromised. Multiple programmers can of course have the same coding style.

At the same time, it is also important to remember that a malicious actor is not constrained to a single coding style. Essentially, if the attacker deviates significantly from their usual implant pattern, they would be able to circumvent Microsoft's CodeQL queries. Regarding the syntactic and semantic code pattern identification capabilities of the CodeQL engine, the Redmond tech giant notes that:

By combining these two approaches, the queries are able to detect scenarios where the malicious actor changed techniques but used similar syntax, or changed syntax but employed similar techniques. Because it’s possible that the malicious actor could change both syntax and techniques, CodeQL was but one part of our larger investigative effort.

More information about using Microsoft's CodeQL queries is available here. You can find out more about how to deploy queries here.