
Why Custom GPU Clusters Are the Secret to Efficient AI

Ilan Twig

April 30, 2026
4-minute read

If you’re a CTO of an AI-centric product in Silicon Valley and you aren’t fine-tuning your own models … you should be fired.

That may sound dramatic to some, but it’s the truth. We’re not talking about training models from scratch — which can cost millions — but rather fine-tuning, which typically costs less than $100k for models up to 70 billion parameters. Deciding not to fine-tune your own models is inexcusable when the technology is this accessible.

We are at a crossroads where the gap between the companies that “use” AI and the companies that “build” AI is becoming a chasm. When ChatGPT first landed, it wasn’t just a cool demo for me; my ears perked up because I saw the tectonic shift in how software is built. I immediately went out and bought a high-end PC to start tinkering with running my own model. But the deeper I got, the more I realized that the “off-the-shelf” approach to AI hardware wouldn’t cut it for what we wanted to achieve at Navan. I quickly realized we’d need our own GPUs to start fine-tuning our own models — and this wasn’t exactly something I could get next-day delivery on from Amazon.

Fighting Against AI Scarcity

As the world caught on to the power and sheer range of applications for AI, we hit a wall that many are only just now starting to feel: AI scarcity. The demand for compute has turned into a global scramble. We’ve seen the giants struggle — outages lasting days and provisioning requests going unfulfilled for weeks.

The data highlights exactly how fragile the off-the-shelf ecosystem is right now:

  • Global data center occupancy hit 97% in January 2026.
  • Microsoft is sitting on an $80B backlog of Azure orders it cannot fulfill due to power constraints, and had to restrict new subscriptions in Northern Virginia and Texas as early as October 2025.
  • Large LLMs are facing increasing downtime, growing latencies, and catastrophic outages.

Recognizing this early, we made a strategic bet to build our own cluster. By generating our own proprietary infrastructure, we’ve partially insulated Navan from the volatility of the market. But this isn’t just about having the “gear.” It’s also about efficiency. We’ve designed not only our clusters but also our models to operate as efficiently as possible:

  • Smaller, faster models: By fine-tuning in-house on specialized data sets, we can develop models that are smaller, more agile, and more performant.
  • Higher precision: These smaller models require less power and less GPU capacity while delivering higher precision outputs that are customized for our specific domain — travel.
  • Uncompromising quality: Running our own clusters preserves our ability to deliver a seamless, high-quality experience to our customers, regardless of what’s happening in the broader “compute” supply chain.

Agentic DNA

We are now entering the agentic era. The industry is moving past simple text box responses and toward systems that actually take action. The technology finally allows for it, and quite frankly, our users expect it. They don’t want a chatbot that summarizes a flight delay; they want an agent that has already found three alternative routes and is ready to book one.

However, to execute this at scale, companies must account for the agentic cost multiplier. Agentic workflows consume between 10,000 and 50,000 tokens per task compared to roughly 800 for a single prompt.

  • Inference costs now represent 80–90% of total AI spend.
  • We estimate that a midsize tech company like ours runs roughly 10,000 automated decisions a day through API inference. For companies building agentic tech, API inference costs can quickly skyrocket, ranging anywhere between $30K and $300K per month. One company saw a $1,500/month proof-of-concept line item explode into a cost of $1.08M/month in production.
  • A December 2025 IDC survey found that 92% of organizations implementing agentic AI reported costs being “higher or much higher” than expected. Gartner forecasts that 40% of AI agent projects will be canceled by 2027 due to cost overruns alone.
  • Reliability also compounds: A 20-call agentic chain running on APIs with 99.5% per-call uptime drops to an unacceptable 90.5% end-to-end reliability.

This is exactly why smaller, cheaper, and faster models hosted on internal clusters are critical.
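The arithmetic behind those figures is easy to check. Here is a minimal sketch using the numbers from the bullets above — 10,000–50,000 tokens per agentic task, 10,000 tasks a day, a 20-call chain at 99.5% per-call uptime. The per-token price is a hypothetical assumption for illustration only; actual API pricing varies widely by model and provider.

```python
def chain_reliability(per_call_uptime: float, calls: int) -> float:
    """End-to-end success rate of a chain of independent API calls:
    each call must succeed, so reliabilities multiply."""
    return per_call_uptime ** calls


def monthly_token_cost(tokens_per_task: int, tasks_per_day: int,
                       usd_per_million_tokens: float, days: int = 30) -> float:
    """Rough monthly inference bill for an agentic workload."""
    tokens_per_month = tokens_per_task * tasks_per_day * days
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


# A 20-call agentic chain at 99.5% per-call uptime:
print(f"{chain_reliability(0.995, 20):.1%}")  # ~90.5% end-to-end

# 10,000 tasks/day at 10k-50k tokens per task, assuming a
# hypothetical blended price of $10 per million tokens:
low = monthly_token_cost(10_000, 10_000, 10.0)
high = monthly_token_cost(50_000, 10_000, 10.0)
print(f"${low:,.0f} - ${high:,.0f} per month")
```

At the assumed price, the low end of the range already lands at $30K/month, matching the bottom of the spread cited above; a pricier model or heavier tasks push it toward the top.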

At Navan, we have the agentic DNA to make this happen. Because we own the full stack — from the GPUs in our cluster to the proprietary models we fine-tune — we have all the tools necessary to build a world where travel and expense manage themselves. We aren’t just reacting to the future; we’re building the foundation to run it.



This content is for informational purposes only. It doesn't necessarily reflect the views of Navan and should not be construed as legal, tax, benefits, financial, accounting, or other advice. If you need specific advice for your business, please consult with an expert, as rules and regulations change regularly.
