The rise of AI platforms

Nick Payton
February 19, 2025

“This is the most SigOpt-like solution ever,” said the AI executive of a Global 2000 energy company, referring to the AI optimization startup we built and sold to Intel in 2020. “You’ve started with the hardest technical problem in the GenAI stack and haven’t built out the other components I need,” he continued with a grin.

He proceeded to explain the components he had already built or planned to build for his internal AI platform. He had started out by hosting an LLM and releasing a simple version of it to a few teams to see what use cases were most prominent. Now that he had this initial information, he felt confident in standardizing his AI platform. “Not only do I feel confident in my AI platform now, but we need it. I can’t responsibly scale up usage anymore unless we put a platform in place.”

This is similar to dozens of conversations I’ve had since. Every team I talk to is building, has plans to build, or has already built an AI platform. In this post, I’ll provide a snackable summary of what I’ve learned so far about the latest platform trend.

ML → AI platforms in 2018 → 2025

If this all sounds familiar, it should. In 2018, I sat down with Sam Charrington, founder of TWIML. Earlier that year, I had joined Scott Clark and the SigOpt team. “Every customer I talk to discusses integrating us with their ML platform,” I explained to Sam. He nodded along with a wry smile that suggested I was explaining something he already knew. Six months later, we sponsored his independent research that led to the ML Platforms guide, podcast series, and event. Six months after that, MLOps became an industry-standard category describing this toolchain. That won’t be the last branding moment I miss.

You won’t be surprised that the critical elements in this stack focused on feature engineering, training, and serving. Feature stores, experiment tracking, orchestration, and monitoring were a few of the relatively stable categories that emerged and seeded incumbent startup winners. All of them were designed with training models as the critical workflow, and most had structured data in mind. 

Once trained, these models were relatively stable. A partner recently told me that when they were building XGBoost models for financial services companies, they’d retrain them annually at most. Generative AI broke this model.

Emerging AI platform components

Pre-trained generative AI models have taken the industry by storm. Most obviously, these models don’t need training, which obviates a good chunk of the ML platform stack. These models just work, and tend to work well.

There has also been increasing standardization of components, dependencies, packages, and tooling that make it easier to create complex chains of AI components—agents or otherwise. AI today is less experimental, more engineered.

So what does this mean for AI platforms? GenAI is in high demand. These models are powerful but unpredictable, and expensive to run. It can be hard to get them to perform reliably, and even harder to understand why they behave the way they do. Most teams I’ve talked to are cobbling together some variety of these components:

  • Model and/or compute management, including token cost optimization or prioritization
  • Proxy to seamlessly plug in any LLM to an app with various forms of routing (see the sketch after this list)
  • Controls and permissions by group, team, sometimes even individual
  • Orchestration, including hosting, inference, optimization, chaining
  • Observability, including traces, logs, metrics, evals
  • Testing, including consistency across all data types from all components and properties
  • User interface that creates a standard way for a user to access an LLM
  • Evals, prompt playgrounds, and other dev tools to iterate on application performance
  • Guardrails, firewalls, and other systems to pre-empt bad behavior or attacks
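
To make the proxy idea concrete, here is a minimal sketch of a cost-aware router. Everything in it is hypothetical (the class names, the toy routing policy, the stand-in backends); a production component would add auth, rate limits, retries, and logging.

```python
# Hypothetical sketch of an LLM proxy with cost-aware routing. Class names,
# the routing policy, and the stand-in backends are all illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelBackend:
    name: str
    cost_per_1k_tokens: float    # used by the routing policy below
    call: Callable[[str], str]   # wraps a hosted API or self-hosted endpoint

class LLMProxy:
    """Routes each request to a backend so apps never bind to one vendor."""

    def __init__(self, backends: list[ModelBackend]):
        self.backends = backends

    def route(self, max_cost: float) -> ModelBackend:
        # Toy policy: cheapest backend under the caller's cost ceiling.
        eligible = [b for b in self.backends if b.cost_per_1k_tokens <= max_cost]
        if not eligible:
            raise ValueError("no backend satisfies the cost ceiling")
        return min(eligible, key=lambda b: b.cost_per_1k_tokens)

    def complete(self, prompt: str, max_cost: float = 1.0) -> str:
        return self.route(max_cost).call(prompt)

# Usage: any vendor or self-hosted model plugs in behind the same interface.
proxy = LLMProxy([
    ModelBackend("small-self-hosted", 0.1, lambda p: f"[small] {p}"),
    ModelBackend("frontier-api", 0.9, lambda p: f"[frontier] {p}"),
])
print(proxy.complete("Summarize this contract.", max_cost=0.5))
```

The design point is that applications call one stable interface while the platform team changes models, policies, and permissions behind it.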

I’m certainly missing many components, and perhaps characterizing some of them differently than you would. The common theme, however, is that the goal of these platforms is to accelerate and scale generative AI productionalization, and to ensure companies allocate resources in the right ways. Done well, these platforms result in greater return on AI investment.

This is a new stack. Even if they have ambitions of marrying the two together at various points, teams aren’t building off of their ML platforms. They are building new AI platforms to enable this workflow. And teams that get this right will be able to extract business value out of GenAI faster than the competition.

The case for best-in-class

So how do you get started? This space is moving so fast that seemingly every day there’s a new entrant with a new take on one of these tools. This can be intimidating, confusing, and hard to navigate. Amidst this chaos, it can be tempting to reach for an end-to-end stack that covers half of these components and does each of them at 80%.

This is fool’s gold. You start faster, but quickly run into problems as you scale. What started out easy becomes incredibly hard as you shift resources from building new apps to debugging the underlying platform, and the layers of APIs that support it. The teams I’ve seen moving fastest on generative AI experiment quickly to define what works and what doesn’t, then shift to a more reliable, scalable, and controllable stack as they grow. This approach gives them a few benefits:

  • Build and buy: This approach makes it easier to prioritize which components your team will build as part of the core stack you control, and which you will buy. It also puts you in a position to build MVP versions of components and upgrade them with vendor solutions as the market sorts into winners and losers (a sketch of this pattern follows the list).
  • Integrate without lock-in: This approach takes a bit more investment, but gives you these high-end capabilities without a threat of lock-in. A leading vendor today may be gone tomorrow. New innovation in the underlying models may result in new needs on the platform side. This approach also allows you to build on the prior work of your team, with the ability to plug in outputs from past solutions as the space evolves.
  • Standardization and customization: There isn’t a template for what these platforms should look like. You need to standardize how this is done internally, but your needs may differ from someone else’s. Building it up yourself with best-in-class components gives you the best of both worlds: standardization and customization.
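
As a minimal illustration of the build-and-buy pattern above, here is a sketch built around a hypothetical evals component: apps depend on a thin internal interface, an in-house MVP sits behind it today, and a vendor-backed implementation can replace it later. None of these names refer to a real product.

```python
# Hedged sketch of the build-and-buy pattern, using a hypothetical evals
# component. Apps depend on a thin internal interface; an in-house MVP sits
# behind it today, and a vendor-backed implementation can replace it later.
from typing import Protocol

class EvalStore(Protocol):
    def record(self, app: str, prompt: str, output: str, score: float) -> None: ...

class InHouseEvalStore:
    """MVP: keep results in memory (or a local table) until a vendor is chosen."""
    def __init__(self) -> None:
        self.rows: list[tuple[str, str, str, float]] = []

    def record(self, app: str, prompt: str, output: str, score: float) -> None:
        self.rows.append((app, prompt, output, score))

def run_eval(store: EvalStore, app: str, prompt: str, output: str) -> None:
    score = float(len(output) > 0)  # toy metric; replace with real evals
    store.record(app, prompt, output, score)

# Apps depend only on EvalStore, so swapping implementations is a one-line change.
store = InHouseEvalStore()
run_eval(store, "support-bot", "Reset my password", "Here are the steps...")
```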

Put together, this suggests a best-in-class approach that doesn’t lock you into something that doesn’t scale or evolve with your needs over time.

How to get started

Inertia can be hard to overcome. Our own team has experienced short-term paralysis looking at various options for our own internal GenAI efforts. The best advice I have is to get started as fast as possible and enable at least a single GenAI model for various applications, whether through an API endpoint or self-hosted. Maybe take an existing NLP component and upgrade it with an LLM. This gives you good initial data on potential usage. And once you have underlying app data, you can begin to build out an understanding of how these apps will work.
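
For instance, a classic NLP task like sentiment classification can be swapped over to an LLM with a single HTTP call. This is a hedged sketch: the endpoint, environment variables, and payload assume an OpenAI-compatible chat API, which your hosted or self-hosted model may or may not expose.

```python
# Hypothetical sketch: replacing a rules-based sentiment component with an LLM
# call. The endpoint and payload assume an OpenAI-compatible chat API; your
# hosted or self-hosted model may expose something different.
import os
import requests

def classify_sentiment(text: str) -> str:
    resp = requests.post(
        os.environ.get("LLM_ENDPOINT", "http://localhost:8000/v1/chat/completions"),
        headers={"Authorization": f"Bearer {os.environ.get('LLM_API_KEY', '')}"},
        json={
            "model": "your-model-name",  # placeholder model identifier
            "messages": [
                {"role": "system",
                 "content": "Reply with exactly one word: positive, negative, or neutral."},
                {"role": "user", "content": text},
            ],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip().lower()

if __name__ == "__main__":
    # Requires a reachable endpoint; otherwise requests raises a ConnectionError.
    print(classify_sentiment("The new platform rollout went smoothly."))
```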

That’s where we come in. In his recent TWIML interview, Capital One AVP of AI/ML Enterprise Platforms Abhijit Bose says, “Observability…becomes very, not just important, but also very complex in the LLM world.” I agree.
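
To give a sense of the surface area involved, here is a minimal, illustrative sketch of trace capture around a model call (not Distributional’s product or API). A real system would also record token counts, costs, model versions, and eval scores, and ship traces to a queryable store.

```python
# Illustrative sketch of trace capture around a model call. All names here
# are hypothetical; a real observability layer would capture far more.
import time
import uuid
from typing import Callable

TRACES: list[dict] = []  # stand-in for a real trace store

def traced(call: Callable[[str], str]) -> Callable[[str], str]:
    def wrapper(prompt: str) -> str:
        start = time.time()
        output = call(prompt)
        TRACES.append({
            "trace_id": str(uuid.uuid4()),
            "prompt": prompt,
            "output": output,
            "latency_s": round(time.time() - start, 3),
        })
        return output
    return wrapper

@traced
def my_llm(prompt: str) -> str:
    return f"echo: {prompt}"  # replace with a real model call

my_llm("What changed in this release?")
print(TRACES[-1]["latency_s"])
```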

It is never too early to get testing infrastructure in place—start small and scale it up as you grow. Reach out to start the conversation with us today.
