Switchback testing for decision models

Switchback testing is now available on the Nextmv platform! We’re proud to expand our decision model testing suite, and we’re excited by the response from customers and community members. See the real-world impact of a candidate model with a switchback test that randomly assigns production runs, grouped into time units, to each model.

What is switchback testing? We dove into the concept of switchback testing in detail in an earlier post. In short, it’s similar to A/B testing, but not quite the same. Switchback testing randomizes which model is applied to units of time and/or location to mitigate network effects (like pooled resources) so your team can compare a candidate model to a baseline model in a true production environment where the model is making operational decisions.

Nextmv makes it simple to kick off and analyze the results of switchback tests so you can quickly and confidently develop and deploy well-tuned models for your production use case.

To kick off a switchback test, all you need is:

  • Name of the experiment
  • Description (optional)
  • Baseline instance (likely the model that’s currently in production)
  • Candidate instance (the model you’d like to test against the production model)
  • Total number of units to use in the test (that will be randomly assigned to each model)
  • Length of unit (in minutes)
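Taken together, those inputs form a compact test definition. Here’s a minimal sketch of what one might look like; the field names are illustrative only, not the actual Nextmv API schema:

```python
# Hypothetical switchback test definition. Field names are illustrative,
# not the actual Nextmv API schema.
switchback_test = {
    "name": "routing-switchback-test",     # name of the experiment
    "description": "Candidate vs. prod",   # optional
    "baseline_instance": "prod",           # model currently in production
    "candidate_instance": "staging",       # model you'd like to test
    "units": 168,                          # total units, randomly assigned
    "unit_duration_minutes": 60,           # length of each unit
}

# Sanity check: 168 one-hour units cover exactly one week.
total_hours = (
    switchback_test["units"] * switchback_test["unit_duration_minutes"] / 60
)
print(total_hours / 24)  # -> 7.0 (days)
```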

Before diving into the metrics, note that the results page also displays a Plan Summary, which provides a unit index to easily identify the experimental units and the treatment applied to each. Here we can see that the unit duration is 60 minutes, a common time frame for a unit in a switchback test. The full plan summary shows that the test will run for a full week; these tests often run for a few days or weeks.
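To make the plan concrete, here’s a simplified sketch of how a switchback plan randomly assigns consecutive time units to the two treatments. This is an assumption-laden illustration of the concept; the platform generates the actual plan for you:

```python
import random


def make_plan(num_units: int, unit_minutes: int, seed: int = 42):
    """Randomly assign each time unit to the baseline or candidate model.

    A simplified sketch of a switchback plan, not the platform's
    actual assignment logic.
    """
    rng = random.Random(seed)
    return [
        {
            "unit_index": i,
            "start_minute": i * unit_minutes,
            "treatment": rng.choice(["baseline", "candidate"]),
        }
        for i in range(num_units)
    ]


# One week of 60-minute units, matching the plan summary above.
plan = make_plan(num_units=168, unit_minutes=60)
print(plan[0])
```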

After the test is complete, it’s time to check how the candidate model fared. Is it ready to be promoted to production? Let’s take a look at KPIs, including summary metrics like solution value and custom, use-case-specific metrics like unplanned stops for routing. In the results below, we can see that the candidate model (staging) had fewer unplanned stops in production. We expect fewer unplanned stops to improve broader metrics like customer satisfaction, which can only be observed when a model runs in an operational environment.
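The per-treatment comparison boils down to aggregating a KPI over the units each model served. Here’s a minimal sketch using made-up unplanned-stop counts (the numbers are illustrative, not actual experiment output):

```python
from statistics import mean

# Illustrative per-unit results -- values are made up for this sketch,
# not actual experiment output.
results = [
    {"treatment": "baseline", "unplanned_stops": 4},
    {"treatment": "candidate", "unplanned_stops": 2},
    {"treatment": "baseline", "unplanned_stops": 5},
    {"treatment": "candidate", "unplanned_stops": 3},
    {"treatment": "baseline", "unplanned_stops": 3},
    {"treatment": "candidate", "unplanned_stops": 2},
]

# Group KPI values by treatment, then compare the averages.
by_treatment: dict[str, list[int]] = {}
for r in results:
    by_treatment.setdefault(r["treatment"], []).append(r["unplanned_stops"])

averages = {t: mean(stops) for t, stops in by_treatment.items()}
print(averages)
```

With data like this, a lower candidate average supports promoting the new model; a higher one sends you back to tuning.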

Once you’ve reviewed the results of your switchback test, you may decide to make further changes to your candidate model. In that case, you’ll likely run it through another set of tests to see how the updates affect your KPIs.

If you’re happy with the results, you can push your new model to production directly from the Nextmv console. Simply switch your production instance to use the new version of your code. (This can also be done via the Nextmv CLI.)

Learn more in the blog post and demo video