Daytona

How Daytona reviews the code a million agents run on

Daytona builds the infrastructure that AI agents run on: sandboxes that spin up in about 100 milliseconds and that customers launch hundreds of thousands at a time.

The company has raised around $31M and runs production workloads for LangChain, Turing, Writer, and SambaNova.

When you're the layer all those agents run on, you can't afford a bad change. A bug in snapshot distribution or sandbox lifecycle can turn into a customer outage fast.

Toma Puljak has been on the team since day one and touches about 90% of the product, so he sees code go from an agent's first draft all the way to production. These days, that whole pipeline ends on one rule:

"Raise a PR, and don't request a review from anyone until you've resolved all the cubic comments."

— Toma Puljak, software engineer at Daytona

Choosing the best AI code reviewer

Daytona tried a few other AI code review tools before landing on cubic. One option was incredibly noisy.

"You'd fix a batch of comments, push, and it would come back with five new ones. It just loops forever, until you give up and merge anyway."

— Toma Puljak

Another was too complex to configure and did not work well enough out of the box.

"We couldn't justify spending a whole day just figuring out what to configure before it did anything useful."

— Toma Puljak

They also evaluated other tools in the category, but ultimately preferred cubic's review quality.

The review Daytona won't merge without

Before cubic, code reviews worked the way they do on most teams: open a PR, assign it to a (human) reviewer, wait for them to review the code (often a few hours), then start over.

Review cycle time was slow and was becoming the bottleneck in their SDLC. Now, cubic reviews PRs as soon as they're opened, and engineers request a human review only once cubic's comments are resolved.
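The rule of holding human review until the AI reviewer's comments are resolved can be expressed as a simple check. The sketch below is illustrative only: the `ReviewThread` shape, the `ready_for_human_review` helper, and the `"cubic"` author login are assumptions made for this example, not cubic's or Daytona's actual API or tooling.

```python
from dataclasses import dataclass


@dataclass
class ReviewThread:
    """Hypothetical, simplified view of one review comment thread on a PR."""
    author: str     # who opened the thread (assumed login, e.g. "cubic")
    resolved: bool  # whether the thread has been marked resolved


def ready_for_human_review(threads, bot_login="cubic"):
    """Return True when no thread opened by the bot is still unresolved.

    Threads opened by other authors are ignored: the gate only concerns
    the automated reviewer's comments.
    """
    return all(t.resolved for t in threads if t.author == bot_login)


threads = [
    ReviewThread(author="cubic", resolved=True),
    ReviewThread(author="cubic", resolved=False),    # still open, blocks the gate
    ReviewThread(author="teammate", resolved=False),  # not part of the gate
]
print(ready_for_human_review(threads))  # False
```

A PR with no bot threads passes the check trivially, which matches the intent: there is nothing left to resolve before asking a person to look.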

"I'd say 90% or more of cubic's comments are actually valid. And the few we do dismiss are intentionally kept in because they're part of something bigger that we're still working on."

— Toma Puljak

A reviewer that accurate changes the way you ship software. For Daytona, cubic is now a mandatory step in their shipping process.

It earns that spot twice over, because Daytona is open source with a large public repository.

Reviewing outside contributions used to eat a lot of the team's time. A maintainer had to find an hour for a stranger's PR, and contributors could wait days. Now cubic reviews each one the moment it goes up, so initial review cycles happen without a human maintainer involved.

The P0 that would have broken snapshot distribution

Toma has learned to trust cubic's P0s. A P0 is the most serious flag the tool raises, and it's almost never wrong. A few weeks ago, one showed up on a Daytona PR.

"We had a P0 that probably would have stopped all our snapshots from distributing. That would have been a big incident for us."

— Toma Puljak

For an infrastructure company whose customers depend on those snapshots, that's about as bad as it gets. What made it especially hard to catch was that the bug was nowhere near the code the PR changed.

"The diff didn't even touch the function that would've broken. It changed something in a completely different part of the code, and cubic still caught it."

— Toma Puljak

Daytona's codebase is complex, made up of several interdependent systems.

A change in one place can break something on the far side of the codebase, which is the kind of bug that humans struggle to manually catch during code review.
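To see why such a bug falls outside the diff, it helps to think of the codebase as a call graph and ask which functions transitively depend on the one that changed. The toy sketch below walks reverse call edges breadth-first. The function names and the `callers` map are hypothetical, invented for illustration; this is not a description of how cubic analyzes code or of Daytona's real code layout.

```python
from collections import deque


def affected_by(changed, callers):
    """Return every function that directly or transitively calls `changed`.

    `callers` maps a function name to the names of functions that call it.
    """
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


# Hypothetical call graph: the diff touches only `parse_manifest`.
callers = {
    "parse_manifest": ["build_snapshot"],
    "build_snapshot": ["distribute_snapshot"],
    "format_log_line": ["write_audit_log"],
}
print(sorted(affected_by("parse_manifest", callers)))
# ['build_snapshot', 'distribute_snapshot']
```

In this toy graph the diff shows one function, yet two others that never appear in the PR can change behavior. A reviewer reading only the diff has no prompt to open them.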

The same blind spot affects security, which Daytona has to clear with outside auditors. They're now setting up a custom cubic agent for security review.

Building the infrastructure AI agents run on

At Daytona, agents write most of the code now. The hard part is trusting it. Agents produce more than anyone can carefully review, and on infrastructure other companies run on, a single missed bug becomes someone else's outage.

cubic is what makes that volume safe to ship. It reviews every PR first, catches the bugs that hide across files, and is right often enough that the team lets it go ahead of them. They get to move at the speed agents write, without lowering the bar on what reaches production.