
Microsoft on Tuesday introduced Project Ire, an autonomous AI agent designed to reverse engineer software and determine whether it is malicious, without any human guidance or prior knowledge of the file’s origin.
Project Ire, still in its prototype stage, represents a major leap in cybersecurity automation. According to Microsoft, the tool can carry out one of the most difficult tasks in the field: completely deconstructing a software file to classify it as benign or malicious, an analysis that normally requires expert manual effort.
“This is the gold standard in malware classification,” Microsoft stated in its research blog. The AI uses a combination of decompilers, memory analysis sandboxes, control flow reconstruction tools, and advanced language models to dissect and understand software code.
How Project Ire works
According to Microsoft, the system starts by analyzing the file’s internal structure and constructs a control flow graph using tools like angr and Ghidra. This graph becomes the foundation for its investigation.
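For readers unfamiliar with control flow graphs, the idea can be sketched in a few lines of Python. This is only an illustration: the toy instruction set and the `build_cfg` helper are invented here and are not the API of angr or Ghidra, nor a description of how Project Ire itself is implemented. The sketch splits a program into basic blocks and records which block can run after which.

```python
from collections import defaultdict

# Hypothetical toy program: (opcode, operand). Jump targets are instruction indices.
PROGRAM = [
    ("cmp", None),   # 0
    ("jz", 4),       # 1: jump to 4 if zero, otherwise fall through to 2
    ("call", None),  # 2
    ("jmp", 5),      # 3: unconditional jump to 5
    ("xor", None),   # 4
    ("ret", None),   # 5
]

def build_cfg(program):
    """Return (blocks, edges) for a toy program.

    blocks maps each block's first instruction index to its instruction indices;
    edges maps each block to the sorted list of blocks that may execute next.
    """
    # Step 1: a "leader" starts a basic block: the entry point, every jump
    # target, and every instruction that follows a jump or return.
    leaders = {0}
    for i, (op, arg) in enumerate(program):
        if op in ("jmp", "jz"):
            leaders.add(arg)
        if op in ("jmp", "jz", "ret") and i + 1 < len(program):
            leaders.add(i + 1)
    ordered = sorted(leaders)

    # Step 2: each block runs from its leader up to the next leader.
    blocks = {}
    for n, start in enumerate(ordered):
        end = ordered[n + 1] if n + 1 < len(ordered) else len(program)
        blocks[start] = list(range(start, end))

    # Step 3: the last instruction of a block decides its outgoing edges.
    edges = defaultdict(set)
    for start, body in blocks.items():
        last = body[-1]
        op, arg = program[last]
        if op in ("jmp", "jz"):
            edges[start].add(arg)
        if op not in ("jmp", "ret") and last + 1 < len(program):
            edges[start].add(last + 1)  # fall-through path
    return blocks, {b: sorted(edges[b]) for b in blocks}

blocks, edges = build_cfg(PROGRAM)
for start in blocks:
    print(f"block {start}: instructions {blocks[start]} -> next {edges[start]}")
```

Real binaries add complications this sketch ignores, such as indirect jumps and calls into other functions, which is part of why dedicated frameworks are used for this step.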
Through a step-by-step investigation process, the system invokes various tools and uses a “tool-use API” to refine its understanding. Each function it examines contributes to a “chain of evidence,” a traceable record of its reasoning designed to make the system more transparent and to let experts audit its conclusions.
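Microsoft has not published the format of this record, so the following Python sketch is purely an assumption about what a chain of evidence could look like: an append-only log in which each entry names the function examined, the tool that produced the finding, and the finding itself. The class names, the function labels, and the findings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Evidence:
    """One immutable step in the record (hypothetical structure)."""
    step: int
    function: str
    tool: str
    finding: str

@dataclass
class EvidenceChain:
    """Append-only log that a reviewer can read back in order."""
    entries: List[Evidence] = field(default_factory=list)

    def record(self, function: str, tool: str, finding: str) -> Evidence:
        # Steps are numbered in the order they were recorded.
        entry = Evidence(len(self.entries) + 1, function, tool, finding)
        self.entries.append(entry)
        return entry

    def report(self) -> str:
        return "\n".join(
            f"[{e.step}] {e.function} via {e.tool}: {e.finding}"
            for e in self.entries
        )

# Illustrative entries only; these are not real Project Ire output.
chain = EvidenceChain()
chain.record("sub_401000", "decompiler", "enumerates running processes")
chain.record("sub_401200", "decompiler", "terminates processes by name")
print(chain.report())
```

Keeping each entry immutable and numbered means an auditor can replay the reasoning step by step and see which tool supported which claim.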
To validate its verdicts, Project Ire runs a validator tool that compares its findings against logs previously reviewed by human specialists. The tools it draws on include Microsoft’s internal resources and third-party contributions, such as those from Emotion Labs, a contributor to the angr framework.
Promising early results from tests
Project Ire has already shown strong early results. In tests using a public dataset of Windows drivers, Microsoft reported the system achieved a precision of 0.98 and a recall of 0.83. In other words, nearly every file it flagged as malicious really was malicious, and it caught most of the malicious files in the set.
In a more demanding real-world trial involving nearly 4,000 “hard-target” files (samples that had stumped automated tools and awaited human review), Project Ire achieved a precision of 0.89, meaning almost 9 out of 10 files it flagged as malicious were in fact malicious. However, it detected only about 26% of all actual malware (a recall of 0.26), reflecting how difficult the dataset was.
Microsoft acknowledged the modest recall but emphasized the system’s potential. “While overall performance was moderate, this combination of accuracy and a low error rate suggests real potential for future deployment,” the company wrote.
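The two metrics are easy to confuse, so a short Python sketch may help. Precision is the share of flagged files that were truly malicious; recall is the share of truly malicious files that were flagged. The counts below are hypothetical, chosen only so the ratios match the 0.89 and 0.26 reported for the hard-target trial; Microsoft's actual counts are not given in this article.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Of everything flagged as malicious, what fraction really was?"""
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0

def recall(true_pos: int, false_neg: int) -> float:
    """Of everything that really was malicious, what fraction was flagged?"""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0

# Hypothetical counts (not Microsoft's data): 89 malicious files correctly
# flagged, 11 benign files wrongly flagged, 253 malicious files missed.
tp, fp, fn = 89, 11, 253

print(round(precision(tp, fp), 2))  # 0.89
print(round(recall(tp, fn), 2))     # 0.26
```

With numbers like these, a verdict of "malicious" can be trusted most of the time, yet roughly three out of four malicious files still slip through, which is why high precision alone does not mean a detector is complete.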
What’s next for Project Ire
Microsoft plans to integrate Project Ire into its Defender ecosystem under the name Binary Analyzer. The goal is to reach the point where the tool can autonomously detect novel malware directly in memory.
“Our goal is to scale the system’s speed and accuracy so that it can correctly classify files from any source, even on first encounter,” Microsoft said. “Ultimately, our vision is to detect novel malware directly in memory, at scale.”
Reporting from Black Hat, TechnologyAdvice’s Matt Gonzales wrote about a cybersecurity researcher’s keynote focused on the evolution of malware and how AI is changing the cybersecurity game.