Inference is the New Runtime: Our Investment in Fireworks

Fireworks AI co-founding team (L-R): Dmytro Ivchenko, Benny Chen, James Reed, Dmytro (Dima) Dzhulgakov, Lin Qiao, Pawel Garbacki. Not pictured: Chenyu Zhao

Software has already changed the world, but AI is transforming it in ways we’re only beginning to understand. From code generation and support automation to research and supply chain optimization, we’re in the early stages of a shift that will redefine how we work, create, and interact with technology. AI is making software more capable, adaptable, and integrated into every part of daily life.

And yet, despite all the progress, there’s still a massive gap between what’s possible in theory and what’s realistic in practice. For the average organization today, building an AI-native application is incredibly hard. Open-source models offer more control and flexibility, but running them in production—with speed, scale, and reliability—is a whole different kind of infrastructure challenge.

Few people understand this better than Lin Qiao. Over the last decade, she’s led some of the most important ML teams in the world. At Meta, she and her team built PyTorch into one of the most widely used frameworks in AI. Before that, she worked on large-scale ML systems at AWS and Microsoft, focusing on training, tuning, and deployment at scale. She understands deeply that it’s one thing to build a model, and another to get it working in the real world.

I first met Lin two years ago when I was at MongoDB and looking to connect with thought leaders in the AI ecosystem. She told me about her vision for Fireworks, an inference platform that helps teams run open-source models in production with the performance, flexibility, and reliability needed for real-world applications. Alongside co-founders Dmytro Dzhulgakov, Dmytro Ivchenko, and James Reed—all of whom worked with her on PyTorch—as well as Benny Chen, Chenyu Zhao, and Pawel Garbacki, she’s built a team with deep technical roots and a clear mission: to deliver the fastest inference infrastructure for generative AI.

Fireworks already powers some of the most ambitious AI-native products in the market. These include high-throughput, latency-sensitive applications at companies like Uber, DoorDash, Notion, Quora, and Upwork. At the same time, it’s helping enterprise leaders like Samsung accelerate their AI roadmaps without rebuilding infrastructure from scratch. Teams can plug into Fireworks for fine-tuning, hosting, model optimization, and serving—everything they need to run open models in production, right out of the box.
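To make "plug into Fireworks" concrete: Fireworks serves hosted open models behind an OpenAI-compatible chat-completions API, so a team's first request looks roughly like the sketch below. This is a hedged illustration, not code from this post; the endpoint path and model name are assumptions drawn from Fireworks' public API style, and `FIREWORKS_API_KEY` is a placeholder environment variable.

```python
# Illustrative sketch (assumptions noted above): build a request to an
# OpenAI-compatible chat-completions endpoint of the kind Fireworks hosts.
import json
import os
import urllib.request

# Assumed endpoint and model identifier; check Fireworks' docs for real values.
FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
DEFAULT_MODEL = "accounts/fireworks/models/llama-v3p1-8b-instruct"

def build_request(prompt: str, model: str = DEFAULT_MODEL) -> urllib.request.Request:
    """Assemble (but do not send) one chat-completion HTTP request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    headers = {
        "Content-Type": "application/json",
        # Placeholder credential; a real call needs a valid API key.
        "Authorization": f"Bearer {os.environ.get('FIREWORKS_API_KEY', '')}",
    }
    return urllib.request.Request(
        FIREWORKS_URL, data=json.dumps(payload).encode("utf-8"), headers=headers
    )

req = build_request("Summarize this support ticket in one sentence.")
print(req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen`) is left out so the sketch stays runnable without credentials; the point is that switching hosted open models is a one-line change to the `model` field.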

That’s a huge unlock at a moment when the model landscape is shifting fast. Gartner projects GenAI model spend to nearly triple from $14 billion in 2025 to $39 billion by 2028, with much of that growth driven by specialized and fine-tuned models. Teams aren’t just calling APIs anymore; they want control, performance, and personalization. They want models that reflect their product and users. And they need infrastructure that can keep up.

The open-source movement is accelerating that shift. There’s now a thriving ecosystem of performant models—Mistral, Kimi, DeepSeek, and hundreds more—many of which rival the quality of closed systems. Cursor’s fine-tuned models already outperform GPT-4 on specific programming tasks. Perplexity runs its retrieval engine entirely on open weights. Even enterprise companies like Notion are embracing the trend, citing significant gains in latency and cost after switching from general-purpose APIs to specialized models.

But with more choice comes more complexity: different models, providers, and deployment environments to manage. Most teams don’t want to become model ops experts. They just want their app to work and keep getting better.

The challenge lies in balancing quality, speed, and concurrency—what Lin calls the “future scaling law” of inference, where optimizing for one often means sacrificing another. Fireworks helps teams co-optimize across all three, without rebuilding their infrastructure from scratch. It supports a wide variety of models and modalities, runs across multiple clouds and regions, and continuously tunes for speed and cost. For teams deeply integrating AI into their products, that flexibility makes a huge difference.
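The tension between speed and concurrency can be seen in a toy batching model (my own illustration, not Fireworks' actual scheduler): packing more concurrent requests into each GPU step raises aggregate throughput, but every request in the batch then waits on a longer step. The step-time constants below are invented for the example.

```python
# Toy model of the inference tradeoff: larger batches raise throughput
# but stretch per-request latency. Constants are illustrative only.

def batch_stats(batch_size: int,
                step_ms: float = 20.0,
                per_extra_ms: float = 2.0) -> tuple[float, float]:
    """Return (throughput in requests/sec, per-request latency in ms)
    for one decoding step, assuming a fixed base cost per step plus a
    small marginal cost for each additional request in the batch."""
    step_time_ms = step_ms + per_extra_ms * (batch_size - 1)
    throughput = batch_size / (step_time_ms / 1000.0)
    return throughput, step_time_ms

for b in (1, 8, 32):
    tput, lat = batch_stats(b)
    print(f"batch={b:2d}  throughput={tput:7.1f} req/s  latency={lat:5.1f} ms")
```

Even in this crude model, throughput and latency move in the same direction as batch size grows, which is why a serving platform has to tune batching (and model quality choices) jointly rather than maximizing any single axis.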

And this isn’t just for startups on the bleeding edge. As generative AI becomes table stakes across every software category, even traditional enterprises will want to deliver highly differentiated, premium experiences. But most won’t have the time, budget, or technical resources to build that infrastructure themselves. Fireworks lowers the barrier to entry. It’s infrastructure that works just as well for the hundredth app as it does for the first.

Two years ago, I was fortunate to invest in Fireworks through the MongoDB Venture Fund. Since then, I’ve had the chance to see Lin’s rare combination of credibility, intensity, and execution up close, as she’s built Fireworks into one of the most exciting infrastructure companies in AI. Now, I’m proud to double down—this time with Index, a firm that’s long believed in the power of infrastructure to reshape markets. We’ve seen it with Confluent, ClickHouse, and Temporal, and we believe Fireworks is on that same path, turning high-performance inference into a new foundation for modern software.

From all of us at Index, a big welcome to Lin and her outstanding team. We believe inference is the new runtime—and Fireworks is building the engine that will power it.

In this post: Shardul Shah, Mark Xu

Published — Oct. 28, 2025