For months, Kleiner Perkins partner Aditya Naganath had been mulling over his investing thesis that the next wave of AI wasn’t going to be a chatbot—it was going to be software that does the work autonomously, for hours at a time, across thousands of tasks at once. The trouble was, nobody had built the plumbing for it yet. Then he met Neil Movva.
“It felt obvious to both of us that you’re going to need a different, specific inference platform built for these long-running agents,” Naganath told Fortune.
Now, six months after Naganath and Movva first chatted, Movva’s startup, Sail Research, has launched from stealth with $80 million in seed and Series A funding at a $450 million valuation, Fortune learned exclusively. Kleiner Perkins led the Series A. Sequoia, Redpoint, Theory Ventures, Vine Ventures, and CRV also participated.
Sail Research wants to fix one of AI’s expensive problems. AI infrastructure was designed for quick, single exchanges—think a chatbot answering a question. But enterprises are increasingly deploying AI agents that run autonomously for hours, reading entire codebases, screening hundreds of job candidates, or researching complex topics without a human in the loop. At that scale, enterprise AI bills have tripled even as per-token prices have fallen, because agentic workflows consume tokens at a rate 50 to 500 times higher than simple chat. Goldman Sachs forecasts a 24-fold increase in token consumption by 2030.
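The arithmetic behind that paradox is simple: a bill is tokens consumed times price per token, so a large enough jump in consumption swamps any price cut. A rough sketch follows. Every number in it is a hypothetical chosen for illustration, except the 50x multiplier, which is the low end of the range cited above.

```python
def monthly_bill(tokens: int, price_per_million: float) -> float:
    """Bill in dollars: tokens consumed times the per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical chat workload: 10M tokens a month at $10 per million tokens.
chat_tokens = 10_000_000
chat_price = 10.0

# Hypothetical agent workload: 50x the tokens (low end of the cited range),
# at an assumed per-token price 10x cheaper than the chat price.
agent_tokens = chat_tokens * 50
agent_price = chat_price / 10

chat_bill = monthly_bill(chat_tokens, chat_price)
agent_bill = monthly_bill(agent_tokens, agent_price)

print(f"chat bill:  ${chat_bill:,.2f}")
print(f"agent bill: ${agent_bill:,.2f}")
print(f"bill grew {agent_bill / chat_bill:.1f}x despite a 10x price cut")
```

Under these assumed inputs the bill rises fivefold even though each token costs a tenth as much.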
Movva’s solution is an end-to-end infrastructure platform built from the lowest level of the chip up. Sail writes the software that orchestrates and optimizes how AI models run on existing chips. Think of it like a highly efficient traffic system that tells the hardware exactly how to allocate its resources, squeezing far more work out of the same physical computing power.
Most AI serving platforms optimize for low latency, meaning they prioritize getting you an answer fast. Sail does the opposite, sacrificing real-time responsiveness to pack far more computing work into every unit of power. The tradeoff is deliberate: Sail can’t power a voice assistant or a live chatbot. But for agents that run for hours? Movva claims customers often see cost improvements of 3x to 10x over comparable alternatives.
“We only care about efficiency,” Movva told Fortune. “It’s quite difficult to build an inference engine for both throughput and latency at the same time. Everyone else is optimizing for latency, and we just care about throughput.”
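One common way that tension shows up in inference serving is batching: grouping more requests into each pass over the model raises the total tokens produced per second, but every individual request waits longer for its next token. The toy model below is only an illustration of that general tradeoff, not a description of Sail's system. The cost function and both timing constants are assumptions.

```python
# Toy cost model (assumed): each decoding step pays a fixed overhead plus a
# small extra cost per sequence in the batch. Both constants are hypothetical.
FIXED_OVERHEAD_MS = 20.0
PER_SEQUENCE_MS = 1.0


def step_latency_ms(batch_size: int) -> float:
    """Time for one decoding step, i.e. how long each request waits per token."""
    return FIXED_OVERHEAD_MS + PER_SEQUENCE_MS * batch_size


def throughput_tokens_per_s(batch_size: int) -> float:
    """Total tokens produced per second across the whole batch."""
    return batch_size * 1000.0 / step_latency_ms(batch_size)


for batch_size in (1, 8, 64, 256):
    latency = step_latency_ms(batch_size)
    throughput = throughput_tokens_per_s(batch_size)
    print(f"batch={batch_size:>3}  latency={latency:6.1f} ms/token  "
          f"throughput={throughput:7.1f} tokens/s")
```

In this model both numbers climb together as the batch grows, which is why a system tuned purely for throughput would be a poor fit for a live chatbot.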
Movva, 28, is one of a small number of engineers who have worked at every meaningful layer of the AI stack. He watched NVIDIA pivot from gaming chips to AI silicon in 2016 and 2017. He joined Apple to work on the chip powering computer vision on a billion iPhones, then grew frustrated that Apple’s ambition topped out at animoji (the animated characters users can apply on FaceTime). From there, he went to Together AI, one of the leading open-source model inference providers, to get back to GPU-level work. What he saw there crystallized Sail’s thesis: Together had been built for interactive applications and had made every architectural trade-off accordingly. Long-horizon agents needed something built from scratch with different priorities.
Co-founder and CTO Samir Menon also comes from Apple, where he worked in security engineering at scale. The two met on the first day of freshman year at Stanford, where they took the same classes and saw the same academic counselor. Movva jokes that Menon got slightly better grades. They reunited in late 2025 to rebuild the inference stack from scratch.
Sail launched its inference service in March and has already ramped to processing trillions of tokens per week. One early customer, Detail.dev, uses Sail to run code-review agents that spend three to four hours—sometimes longer—digging through an entire codebase hunting for bugs that five-minute reviews miss. “The abundance of tokens that we provide lets them be maximally ambitious in how they scan through code bases,” Movva said.
But the competitive risk is real. Together AI is a formidable incumbent, and it’s also a Kleiner Perkins portfolio company. Naganath’s view is that the two are not in conflict: Together owns the interactive, chat-based market; Sail owns the long-running agent workload. “Being specific and purpose-built should win out in the long run,” he said. The larger threat may come from the frontier labs—Anthropic, OpenAI, and Google—which are building their own inference infrastructure and could, in theory, commoditize the layer Sail is betting on.
Movva’s counter: token prices have been flat or rising for six months, demand for compute is growing faster than supply, and the world needs someone focused obsessively on squeezing the most intelligence out of every available GPU. “We feel an emotional pain when we see a GPU be idle or wasted in any way,” he said.
Naganath’s bull case is simple: “The belief that inference is going to be a 10x—even 100x—bigger market than it is today.”
This story was originally featured on Fortune.com


