A/B Testing Machine Learning Models in Production Using Amazon SageMaker (Discussion)
Kieran Kavanaugh, David Nigenda, and I recently wrote a post for the AWS Machine Learning Blog about A/B testing ML models in production using Amazon SageMaker. I recommend reading the post and checking out our accompanying Jupyter notebook (A/B Testing with Amazon SageMaker).
In this post, I want to add context to our AWS blog post by sharing a high-level design diagram of a potential real-time inference production machine learning workflow.
A Potential Real-time Inference ML Workflow
Notice that the ML inference service has multiple models available to service an inference request to its endpoint. The question, then, is: what logic does the service use to route an inference request to a specific ML model?
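To make the default routing behavior concrete, here is a minimal pure-Python simulation of weighted-random routing, the strategy SageMaker applies between ProductionVariants by default. The variant names and weights are hypothetical, and this is an illustrative sketch of the idea rather than SageMaker's actual implementation:

```python
import random
from collections import Counter

# Hypothetical variants and traffic weights; SageMaker's default routing
# distributes requests between ProductionVariants proportionally to weight.
VARIANT_WEIGHTS = {"variant-a": 0.9, "variant-b": 0.1}

def route_request(weights, rng):
    """Pick a variant at random, proportionally to its weight."""
    variants = list(weights)
    return rng.choices(variants, weights=[weights[v] for v in variants], k=1)[0]

# Simulate 10,000 inference requests: roughly 90% land on variant-a.
rng = random.Random(42)
counts = Counter(route_request(VARIANT_WEIGHTS, rng) for _ in range(10_000))
print(counts)
```

With weights of 0.9 and 0.1, about nine in ten requests reach variant-a, which is exactly the property that lets you send a small, controlled slice of production traffic to a new model.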
In the Amazon SageMaker context, a ProductionVariant “identifies a model that you want to host and the resources to deploy for hosting it” (straight from the docs!). In our post, we discuss how, with SageMaker endpoints hosting multiple ProductionVariants, users can (1) specify how traffic is distributed between ProductionVariants and their corresponding models using a weighted random approach, or (2) override this default traffic distribution behavior and explicitly specify which ProductionVariant, and corresponding model, should service a given inference request.
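Both options map directly onto the SageMaker APIs: variant weights are set in the endpoint configuration, and per-request targeting uses the TargetVariant parameter of InvokeEndpoint. The sketch below illustrates this with boto3; the endpoint, model, and variant names are assumptions, and the clients are assumed to be created elsewhere (e.g. `boto3.client("sagemaker")` and `boto3.client("sagemaker-runtime")` with valid AWS credentials):

```python
# Hypothetical endpoint name for illustration.
ENDPOINT_NAME = "my-ab-test-endpoint"

# (1) Default weighted-random routing: SageMaker splits traffic between
# variants in proportion to InitialVariantWeight (here 90% / 10%).
production_variants = [
    {
        "VariantName": "model-a",
        "ModelName": "model-a",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 0.9,
    },
    {
        "VariantName": "model-b",
        "ModelName": "model-b",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 0.1,
    },
]

def create_ab_endpoint_config(sm_client, config_name):
    """Register an endpoint configuration hosting both ProductionVariants."""
    return sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=production_variants,
    )

def invoke_specific_variant(runtime_client, payload):
    """(2) Override weighted routing by targeting one variant explicitly."""
    return runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=payload,
        TargetVariant="model-b",  # route this request to model-b only
    )
```

Omitting TargetVariant on an invocation falls back to the weighted-random distribution, so the same endpoint can serve blended production traffic while specific callers pin their requests to the candidate model.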
This flexibility opens up the possibility of A/B testing a new ML model with production traffic in various ways, thereby adding an effective final step in the validation process for a new model.