Software Security · Offensive AI · Continuous Validation

AI Can Break Your Software. Now What?

Anyone can discover your vulns. The bottleneck is what comes after.

Last month a global technology company asked my team to look at their source code. A single run analyzed the entire repository, surfaced 590 vulnerability candidates, validated them down to the six real, provable bugs underneath, and produced the most likely fix for each. Total time was seven hours and twelve minutes.

The CISO didn’t push back on the findings. His first reaction was relief. Validation had pinpointed the work. His engineers weren’t going to spend the next nine months arguing about which of the findings were real, or whether security got to interrupt the roadmap. They could take the six and ship the fixes inline with the work they were already doing.

Then he asked the harder question. What does my program look like when anyone can produce a list like this?

The findings weren’t the problem. The list was concise and real, and a few of the bugs chained into paths he hadn’t seen before. What he wasn’t getting from anywhere else was the ratio: 590 candidates validated down to six proven bugs. The default output of a generative scanner today is hundreds of “maybe vulns” with no proof, no chaining, no fix.

Open source maintainers have already started shutting things down. cURL ended its bug bounty program at the end of January 2026 over what its maintainer described as a torrent of AI slop; by 2025, fewer than one in twenty submissions to the program held up to scrutiny. Node.js paused its own program shortly after, when the Internet Bug Bounty itself stopped paying out because AI-assisted discovery was outpacing what maintainers could triage. Discovery has commoditized. Validation has not. That’s where the bottleneck moved.

For twenty years, software security cadence assumed a scarce resource. Experienced humans who could discover exploitable flaws in your code. That scarcity is gone. The offensive AI tools demoed at RSA in March now have open source variants. Anyone with a laptop can run them against your endpoints. Most of the people doing it don’t even know they’re attacking you until something works.

Most CISOs haven’t sat with the implication yet.

Three things break at once

First, more people are looking. The population of attackers capable of discovering a serious flaw used to be a small group of skilled humans and a handful of automated scanners with known limits. Now it includes any motivated actor with a working internet connection. The volume of probes against your perimeter isn’t constant. It’s climbing.

Second, the signal-to-noise ratio collapses. Every “maybe a vuln” from someone running a generative tool against your stack lands in a queue. Your security team. Your bug bounty triage. Your responsible disclosure mailbox. Most are wrong. A few are not. Sorting them takes time you don’t have.

Third, your remediation cycle didn’t move. The pentest you scoped six months ago tested a version of your code that no longer exists. The findings you triaged last quarter are still half open because engineering has competing priorities. The clock used to run quarterly. It’s now running daily. Your response speed didn’t follow.

Discovery isn’t the bottleneck. Proof, correlation, and action are.

Most of the AI security market is selling defenders better discovery. AI-augmented scanning. AI-driven vulnerability prioritization. Faster everything. That’s table stakes. It has been for a year.

The real problem in the CISO’s queue is downstream of discovery.

Proving the vuln. A scanner says you have a deserialization flaw on a specific endpoint. Is it actually exploitable in your environment, or does an upstream control neutralize it? You used to send that question to a human pentester for the next engagement window. The window doesn’t exist anymore. You need proof inline with the finding, not a quarter later.
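
To make “proof inline with the finding” concrete, here’s a minimal sketch of the shape that check can take. Everything in it is illustrative, not any real product’s harness: the prove_finding name, the finding fields, and the benign canary payload are assumptions. The point is that the verdict comes from executing against the live endpoint, not from the scanner’s say-so.

```python
import requests  # standard third-party HTTP client; everything below is illustrative


def prove_finding(base_url: str, finding: dict, session: requests.Session) -> dict:
    """Attempt a benign proof for a candidate flaw instead of filing a 'maybe'."""
    resp = session.post(
        f"{base_url}{finding['endpoint']}",
        data=finding["benign_probe"],  # harmless canary payload, not a live exploit
        timeout=10,
    )
    # The flaw counts as real only if the canary actually executed; a WAF 403
    # or an upstream control swallowing the payload downgrades the finding.
    proven = resp.ok and finding["success_marker"] in resp.text
    return {
        **finding,
        "status": "proven" if proven else "not_exploitable",
        "evidence": f"HTTP {resp.status_code}",
    }
```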

Correlating across findings. One flaw is a ticket. Three flaws that chain into a path from public endpoint to customer data are an incident waiting to happen. Most tools surface the flaws separately and leave the chaining to a human analyst with no time. Continuous validation that traces the actual exploit paths is the only way the chain surfaces before the attacker walks it.
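
As a sketch of what that chaining looks like mechanically: treat each proven finding as an edge in a directed graph and ask whether any path runs from a public entry point to a data store. The findings, node names, and IDs below are invented for illustration.

```python
import networkx as nx  # graph library; the findings below are made up

# Each proven finding becomes an edge: "this flaw moves an attacker from A to B."
findings = [
    ("internet", "web_frontend", {"id": "SSRF-114"}),
    ("web_frontend", "internal_api", {"id": "AUTHZ-029"}),
    ("internal_api", "customer_db", {"id": "SQLI-077"}),
]

g = nx.DiGraph()
g.add_edges_from(findings)

# Three separate tickets, one incident: any complete path from a public entry
# point to a data store is the thing that should page someone.
for path in nx.all_simple_paths(g, source="internet", target="customer_db"):
    chain = [g.edges[u, v]["id"] for u, v in zip(path, path[1:])]
    print(" -> ".join(path), "via", chain)
```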

Efficiency of action. A correlated, proven exploit path with no remediation owner and no closure timeline still leaves you exposed. The teams making progress aren’t promising to gate every release on every flaw; that promise breaks the first time a real deadline lands. They’re working a different lever: compressing the time and effort it takes to fix whatever’s real, before or after ship. The leverage on that compression is validation. Engineers don’t burn cycles arguing whether a finding holds up, trading security against the roadmap, or chasing false positives. They get a small number of proven items with proposed fixes and ship them inline with the work they were already doing. That’s a cultural change before it’s a tool change.

That’s what Staris is built around. It’s also what the CISO on the call actually needed. Not more findings. A way to compress the distance between proof and closure.

What CISOs are getting wrong

The most common mistake I see is waiting for the next big model to fix this. Mythos. ChatGPT 5.5 Cyber. Whatever’s next on the roadmap. The bet is that another generation of defensive AI will close the gap. It won’t. What it will do is discover more “maybe vulns” faster, at higher volume. The percentage of true positives might tick up. The absolute count of false positives goes up faster. You don’t need better discovery. You need automated validation that turns “maybe” into “real” or “no,” and that takes context the model doesn’t have on its own. Authenticated users running in their production roles. The mitigating controls you’ve actually deployed. Real-world execution against both. You wire that in, or the volume buries you.
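
A rough sketch of what wiring that context in can look like. Every name here (EnvironmentContext, the finding fields, the control labels) is a placeholder for illustration, not any real product’s API; the shape matters, not the names.

```python
from dataclasses import dataclass, field


@dataclass
class EnvironmentContext:
    # Authenticated sessions keyed by real production role, e.g. {"support": s1}.
    role_sessions: dict
    # Controls actually deployed in this environment, e.g. {"waf", "egress_deny"}.
    deployed_controls: set = field(default_factory=set)


def validate(finding: dict, env: EnvironmentContext, execute) -> str:
    """Turn a 'maybe vuln' into 'real' or 'no' by executing it in context.

    `execute` is whatever harness replays the probe; it returns True when
    the canary lands for the given session.
    """
    # A deployed control the finding can't bypass neutralizes it outright.
    if env.deployed_controls & set(finding.get("neutralized_by", [])):
        return "no"
    # Otherwise run the probe as each real production role, not as an
    # abstract unauthenticated attacker.
    for role, session in env.role_sessions.items():
        if execute(finding, session):
            return f"real (reachable as {role})"
    return "no"
```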

The second mistake is over-rotating. A few CISOs I’ve talked with have proposed pulling back on third-party software, restricting which open source libraries developers can pull, or pausing AI feature work until the threat model catches up. Those aren’t viable answers for any organization that ships software for a living. The answer is tighter loops, not smaller surface.

The third mistake is letting the security team carry this alone. Application security is now an engineering practice that gets validated by security, not a security practice that periodically inspects engineering. The teams that have already made that transition are the ones that don’t panic in the room when the report lands.

Why deferral stopped working

The standard deferral logic on application security investment is some version of “we’ll address it in next year’s budget” or “we’ll roll it into the next quarter’s roadmap.” That logic worked when the gap between an attacker spotting a flaw and exploiting it was measured in weeks of skilled human effort.

The CISO knows that cadence is wrong. The engineering leader knows the remediation window is too long. Neither of them owns the budget conversation that would change either, which is the same governance gap I wrote about in The Discipline Gap.

Starting now, the gap is measured in days. The population that can close it isn’t a small set of humans anymore. Any deferral past a quarter is a bet that nobody motivated runs the available tools against your stack in the meantime. That bet was reasonable in 2018. It isn’t reasonable in 2026.

If you think your environment is getting fewer probes this year than last, you’re wrong. If you think the people probing it are less capable than last year, you’re also wrong. The volume and the capability are both moving in one direction.

Back to the call

The CISO on that call didn’t want a vendor pitch. He wanted to know how to close the distance between a finding and a fix in a world where the discovery side is now everyone’s. The answer isn’t a better scanner. It’s the operational layer that makes proof, correlation, and action keep up with how often someone is looking.

If your program is still running on an assessment cadence built for 2018, that’s the gap to close. Not next quarter. This one.

If you want to see what continuous attack path validation looks like in practice, Staris is built around exactly this loop. If you’re working through the broader program question and want a thought partner, connect@passarel.com is the right door.
