
Adaptive testing for AI confidence
While a lack of confidence may be hindering AI progress at your organization, there is hope. The answer lies in rethinking testing. You can no longer rely solely on the limited, static, performance-based tests that have worked for teams in the past. Instead, you need to move to adaptive testing to support production-grade AI applications at scale.
Adaptive testing leverages behavioral distributions to build a comprehensive definition of your app’s desired behavior that you refine over time. This lets you quantifiably test for deviations from that desired behavior and investigate them when they occur. Ultimately, you define a desired application state with these behavioral tests, understand how that state evolves, and adapt the tests to reflect it.
But what does it take to get there? Let’s take a closer look at the three steps necessary to bring adaptive testing to your AI apps.

Step one: Define desired behavior
First, you need to quantify, define, and test the behavior of your AI apps so that you can continuously understand how they behave in production, both today and in the future.
Comprehensively quantifying behavior means testing against a richer, more complete set of your application’s attributes. The more attributes you cover, the better your understanding of desired behavior, and the better that understanding, the more likely you are to notice when behavior changes. Given the breadth of coverage required, an adaptive testing solution that automates this is key.

Such a solution should leverage your app’s existing log data or golden datasets to automatically generate a robust set of statistical distributions over these attributes, resulting in a complete, unique fingerprint of your app’s behavior. With that fingerprint in place, you can define which behaviors or behavioral changes you want to be alerted to, explicitly test for them, and adapt those tests as your definition of desired behavior evolves.
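To make the idea concrete, here is a minimal sketch of such a fingerprint in Python: each attribute observed in your logs becomes an empirical distribution you can later test against. The attribute names and data structure are illustrative assumptions, not Distributional’s actual API.

```python
# A minimal, illustrative behavioral fingerprint: each logged attribute
# becomes an empirical distribution. Attribute names are assumptions.
import numpy as np

def build_fingerprint(logs: list[dict], attributes: list[str]) -> dict:
    """Collect the empirical distribution of each attribute across logs."""
    return {
        attr: np.array([record[attr] for record in logs])
        for attr in attributes
    }

# Example: numeric attributes derived from each logged request/response pair.
baseline_logs = [
    {"response_length": 212, "latency_ms": 840, "toxicity_score": 0.02},
    {"response_length": 198, "latency_ms": 910, "toxicity_score": 0.01},
    # ... the rest of your golden dataset or a sample of production logs
]
baseline = build_fingerprint(
    baseline_logs, ["response_length", "latency_ms", "toxicity_score"]
)
```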
Step two: Understand changes in behavior
With AI, change is inevitable. So once you can test whether your app is behaving within your definition of desired behavior, the next step is understanding the changes behind any deviation: what actually changed, what its impact on behavior was, and what caused it.
With adaptive testing, you get the depth necessary to quickly investigate any change in behavior and decide what action to take. When your app’s behavior shifts, you can dig into the exact attributes that changed and pinpoint the specific results that contributed most to that shift. Adaptive testing maintains this lineage of insights at every layer of your app, even as complexity and scale grow, giving you a deeper understanding of what’s changing and its cascading impact.

Armed with that, you shift from passive observation to insights you can act on: you can fully assess whether a change is acceptable or whether there’s an issue to resolve. For instance, if the behavioral change is due to a shift in production usage, you may choose to adjust your tests to represent this new definition of desired behavior. But if it stems from an issue such as model drift, you can share your root cause analysis with your development teams and work with them to resolve it with minimal impact to production.
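As an illustration of that attribute-level investigation, the sketch below compares each attribute’s current distribution against the baseline fingerprint from the previous sketch using a two-sample Kolmogorov-Smirnov test, ranking the attributes that shifted most. The choice of test and the alpha threshold are assumptions; any two-sample comparison could stand in.

```python
# An illustrative diff over the fingerprint built above: flag attributes
# whose production distribution departs from the baseline distribution.
from scipy.stats import ks_2samp

def behavioral_diff(baseline: dict, production: dict, alpha: float = 0.01):
    """Return shifted attributes, largest distributional change first."""
    shifted = []
    for attr, base_values in baseline.items():
        stat, p_value = ks_2samp(base_values, production[attr])
        if p_value < alpha:  # distribution changed beyond chance
            shifted.append((attr, stat, p_value))
    return sorted(shifted, key=lambda item: item[1], reverse=True)

# Usage (hypothetical): fingerprint today's logs, then rank what moved.
# for attr, stat, p in behavioral_diff(baseline, todays_fingerprint):
#     print(f"{attr} drifted (KS statistic {stat:.3f}, p={p:.2e})")
```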
Step three: Continuously improve with these changes in behavior
Finally, you’re able to continuously improve your app and your overall development lifecycle with these changes in behavior. For any newly discovered change, you either update your tests to encompass the behavior if it’s acceptable, or update your app to address the behavior if it’s undesired. Then you continue testing your app’s behavior, either running the new set of tests that now represents the desired state, or verifying that your updates to the app adhere to it. In this way, adaptive testing becomes how you take a snapshot of an AI app’s behavioral state and continuously confirm it matches what you want over time.
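A rough sketch of that loop, reusing the fingerprint and diff helpers above: each detected shift is either folded into the baseline (the test adapts) or flagged for a fix in the app. The accept-or-reject decision is modeled as a caller-supplied callback, since that judgment is yours to make.

```python
# An assumed shape for the step-three loop, reusing behavioral_diff above.
def adaptive_test_cycle(baseline: dict, production: dict, is_acceptable):
    """One pass of test -> understand -> adapt. Returns the updated
    baseline plus any deviations that still need a fix in the app."""
    needs_fix = []
    for attr, _stat, _p in behavioral_diff(baseline, production):
        if is_acceptable(attr, production[attr]):
            baseline[attr] = production[attr]  # acceptable: adapt the test
        else:
            needs_fix.append(attr)             # undesired: adapt the app
    return baseline, needs_fix
```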
By automating this process, you free up time to develop and ship new updates to your apps, confident they won’t degrade production behavior. You can test and assess the impact of swapping in new models or adding new components and functionality while leveraging the same suite of adaptive tests and the same definition of behavior. With this persistent, shared definition of behavior, your team can also collaborate more efficiently across the development lifecycle, keep pace with changing market needs, and spend its time creating the AI apps that will truly move the needle for your business.
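For example, vetting a model swap with that same suite might look like the following, again building on the sketches above; `candidate_logs` is a hypothetical stand-in for outputs collected from the candidate model on your golden inputs.

```python
# Hypothetical gate for a model swap: fingerprint the candidate's outputs
# and require zero unacceptable deviations before shipping.
def safe_to_ship(baseline: dict, candidate_logs: list[dict],
                 is_acceptable) -> bool:
    candidate = build_fingerprint(candidate_logs, list(baseline.keys()))
    # Shallow-copy the baseline so the real one isn't mutated by the cycle.
    _, needs_fix = adaptive_test_cycle(dict(baseline), candidate, is_acceptable)
    return not needs_fix
```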
Adaptive testing is the missing piece for AI
Adaptive testing can help your team mature, both in how you develop AI apps and in the types of applications that make it into production and stay there.
But as with any technology solution, it cannot become yet another silo. Adaptive testing is a critical backbone for your overall AI technology stack, and to deliver its full benefits it needs to integrate into your existing processes and technologies: the platforms that already house your application logs and eval metrics, and the observability and alerting tools already in place. These integrations help you get started faster and minimize disruption to your team.
The right solution must meet your team where you are today and act as a catalyst to help you safely and confidently scale development and usage. To learn more about Distributional’s adaptive testing solution and see if it’s the right fit for you, reach out to us here.