What are SageMaker Processing, SageMaker Training and SageMaker Inference?
Why is "just using ECS instead" not always a good idea?
Three of the most important services of Amazon SageMaker are actually pretty simple to understand, even if you're outside the ML bubble. If you have ever run a container, you're almost there.
My previous post established, among other things (such as “it’s not just Jupyter”), that Amazon SageMaker is essentially a remote, ML-aware toolkit that you send commands to.
It turns out that in the case of SageMaker Processing, SageMaker Training and SageMaker Inference, these commands tell AWS to run a container on your behalf that does a specific job.
It's really as simple as that!
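To make the "command" concrete, here is a minimal sketch of the request body that the SageMaker CreateProcessingJob API expects, written as a plain Python dictionary. It is only built, not sent: submitting it would take boto3's `create_processing_job(**request)` plus valid AWS credentials. The job name, image URI, role ARN, S3 paths, script path and instance type are hypothetical placeholders.

```python
def build_processing_job_request(job_name: str, image_uri: str, role_arn: str,
                                 input_s3: str, output_s3: str,
                                 instance_type: str = "ml.m5.xlarge") -> dict:
    """Build (but do not send) a CreateProcessingJob request body."""
    return {
        "ProcessingJobName": job_name,
        # Which container to run and what to execute inside it.
        "AppSpecification": {
            "ImageUri": image_uri,
            # Hypothetical script baked into the image.
            "ContainerEntrypoint": ["python3", "/app/process.py"],
        },
        # S3 data is copied into the container before the script starts.
        "ProcessingInputs": [{
            "InputName": "raw",
            "S3Input": {
                "S3Uri": input_s3,
                "LocalPath": "/opt/ml/processing/input",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }],
        # Whatever the script writes here is uploaded to S3 when the job ends.
        "ProcessingOutputConfig": {"Outputs": [{
            "OutputName": "processed",
            "S3Output": {
                "S3Uri": output_s3,
                "LocalPath": "/opt/ml/processing/output",
                "S3UploadMode": "EndOfJob",
            },
        }]},
        # The compute SageMaker provisions (and tears down) for you.
        "ProcessingResources": {"ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": instance_type,
            "VolumeSizeInGB": 30,
        }},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        "RoleArn": role_arn,
    }
```

Note that you never provision the instance yourself: the `ProcessingResources` block only describes it, and SageMaker starts it, copies the input, runs the container, uploads the output and shuts everything down.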
The TL;DR of SageMaker Processing is as follows:
runs a container equipped with data processing libraries (such as NumPy, pandas, Spark)
loads data from S3
processes it
saves processed data back to S3
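From the container's point of view, the four steps above collapse into working with local directories: SageMaker copies the S3 input to the configured `LocalPath` before the script starts and uploads the output directory to S3 when the job ends. The sketch below assumes the hypothetical `/opt/ml/processing/input` and `/opt/ml/processing/output` paths and uses dropping rows with empty fields as a stand-in for real processing, which would typically use pandas or Spark.

```python
import csv
from pathlib import Path


def process(input_dir: str, output_dir: str) -> int:
    """Drop rows with empty fields from every CSV in input_dir; return rows kept."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    kept = 0
    for src in sorted(Path(input_dir).glob("*.csv")):
        with src.open(newline="") as f_in, (out / src.name).open("w", newline="") as f_out:
            reader = csv.reader(f_in)
            writer = csv.writer(f_out)
            header = next(reader, None)
            if header is None:
                continue  # empty input file: the output file stays empty
            writer.writerow(header)
            for row in reader:
                # Keep only rows where every field has a value.
                if row and all(field.strip() for field in row):
                    writer.writerow(row)
                    kept += 1
    return kept
```

Inside the container, the entrypoint would simply call `process("/opt/ml/processing/input", "/opt/ml/processing/output")`; the script itself never talks to S3.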
SageMaker Training, on the other hand:
runs a container equipped with ML training libraries (such as PyTorch, Hugging Face Transformers, scikit-learn)
takes training data from S3
trains a model
saves the model back to S3
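Training follows the same pattern with different conventions: in File mode, SageMaker copies each input channel to `/opt/ml/input/data/<channel_name>`, and after the script exits it packages the contents of `/opt/ml/model` into `model.tar.gz` and uploads it to the job's S3 output path. The sketch below assumes a channel named `train` holding CSV files with hypothetical `x` and `y` columns, and uses a one-variable least-squares fit as a stand-in for a real PyTorch or scikit-learn training loop.

```python
import csv
import json
from pathlib import Path


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train(train_dir: str, model_dir: str) -> dict:
    """Read every CSV in train_dir, fit the model, write it to model_dir."""
    xs, ys = [], []
    for path in sorted(Path(train_dir).glob("*.csv")):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):  # assumes columns named x and y
                xs.append(float(row["x"]))
                ys.append(float(row["y"]))
    slope, intercept = fit_line(xs, ys)
    model = {"slope": slope, "intercept": intercept}
    out = Path(model_dir)
    out.mkdir(parents=True, exist_ok=True)
    # SageMaker archives everything in model_dir and uploads it to S3.
    (out / "model.json").write_text(json.dumps(model))
    return model
```

In the container this would run as `train("/opt/ml/input/data/train", "/opt/ml/model")`. The model format (`model.json`) is arbitrary: SageMaker does not inspect the files, it only archives and uploads them.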
Finally, SageMaker Inference:
launches a container running a model server, behind a managed endpoint with load balancing included
loads the model from S3
keeps your container running continuously to serve predictions
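For Inference, a custom container has to follow a small HTTP contract: listen on port 8080, answer `GET /ping` with 200 when healthy, and handle prediction requests on `POST /invocations`. Before starting the container, SageMaker extracts the model archive into `/opt/ml/model`. The stdlib sketch below assumes the model is a `model.json` file with `slope` and `intercept` fields and that requests carry an `instances` list; both formats are hypothetical. Load balancing across instances is handled by the endpoint, not by this code.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def load_model(model_dir: str = "/opt/ml/model") -> dict:
    # SageMaker extracts model.tar.gz from S3 into /opt/ml/model.
    return json.loads((Path(model_dir) / "model.json").read_text())


def make_handler(model: dict):
    class Handler(BaseHTTPRequestHandler):
        def _send(self, status: int, body: bytes) -> None:
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            # Health check: SageMaker pings this to decide if the container is ready.
            self._send(200 if self.path == "/ping" else 404, b"{}")

        def do_POST(self):
            if self.path != "/invocations":
                self._send(404, b"{}")
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # Hypothetical request format: {"instances": [x1, x2, ...]}
            preds = [model["slope"] * x + model["intercept"]
                     for x in payload.get("instances", [])]
            self._send(200, json.dumps({"predictions": preds}).encode())

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return Handler


def serve(model: dict, port: int = 8080) -> ThreadingHTTPServer:
    """Create (but do not start) the HTTP server for the given model."""
    return ThreadingHTTPServer(("0.0.0.0", port), make_handler(model))
```

The container's serving entrypoint would run `serve(load_model()).serve_forever()`. For production, a dedicated model server (such as the ones shipped in SageMaker's framework containers) is usually a better choice than `http.server`.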
However, the devil is in the details and in the more advanced capabilities.
Where’s the benefit? Shouldn’t I just use ECS or EKS instead of Amazon SageMaker?
If you compare the prices of bare EC2 instances (used with ECS or EKS) with SageMaker instances, you'll immediately see that, on paper, the EC2 instances are up to 25-30% cheaper. You may then ask: why would anyone use SageMaker if all of its core services are just ECS in disguise?
The answer is simple: while EC2 with ECS/EKS is a generic container service that provides only raw compute power, SageMaker is a fully managed, fully-fledged, specialised ML service.
Modern ML projects require more than just infrastructure to run training and inference on. SageMaker Processing, Training and Inference aren't the only services you'll be using; there's a plethora of other tools around them too.
You'll inevitably need a research environment for collaboration, an experiment registry, a model registry, an orchestration tool, ML-aware monitoring and more.
Thus, the comparison should never be:
SageMaker Processing and Training vs running processing and training on ECS/EKS
because it does not show the full picture of any production-grade ML project.
Instead, what you should compare is:
SageMaker ecosystem vs running my own MLOps ecosystem on ECS/EKS
A more realistic comparison.
In that case, the cost of building, configuring and maintaining that ecosystem has to be taken into account. Amazon SageMaker doesn't only provide compute power for your generic containers. It also provides this entire ecosystem of MLOps tools, often with convenient pricing and simple integration between the components.
The 25-30% price gap on SageMaker instances may suddenly look like a bargain when compared with cheaper instances that need a team of 5 engineers to maintain the ecosystem around them.
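A back-of-the-envelope version of this argument, with every number hypothetical: assume a workload costs $20,000 per month on SageMaker, the same compute on EC2 is 25% cheaper, and 5 engineers costing $10,000 per month each spend half their time running the self-managed MLOps stack. The sketch ignores the engineering effort SageMaker itself still requires.

```python
def self_managed_cost(sagemaker_compute: float, discount: float,
                      engineers: int, cost_per_engineer: float,
                      share_of_time: float) -> float:
    """Monthly cost of running the same workload on EC2 + ECS/EKS yourself."""
    ec2_compute = sagemaker_compute * (1 - discount)
    platform_team = engineers * cost_per_engineer * share_of_time
    return ec2_compute + platform_team


def break_even_spend(discount: float, platform_team_cost: float) -> float:
    """SageMaker compute spend at which the EC2 discount covers the platform team."""
    return platform_team_cost / discount


if __name__ == "__main__":
    # All figures are hypothetical, for illustration only.
    sagemaker = 20_000.0
    diy = self_managed_cost(sagemaker, discount=0.25, engineers=5,
                            cost_per_engineer=10_000.0, share_of_time=0.5)
    print(f"SageMaker: ${sagemaker:,.0f}/month, self-managed: ${diy:,.0f}/month")
    print(f"break-even: ${break_even_spend(0.25, 25_000.0):,.0f}/month of SageMaker compute")
```

With these assumptions, the self-managed setup costs $40,000 per month against $20,000 on SageMaker, and the EC2 discount only pays for the platform team once SageMaker compute spend reaches $100,000 per month. This is also why the balance can shift as your bills grow.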
This is what an alternative to Amazon SageMaker looks like in a larger project.
Where’s the catch?
Of course, as your project matures and your bills start growing, some pieces of SageMaker might become either too limited or too costly at very large scale. SageMaker is also generic (one size fits all), and for many reasons other tools might make more sense at some point.
In my opinion, the more mature you are in terms of ML and MLOps, the more sense it makes to move from managed services to running the infrastructure and MLOps apps on your own.
Thankfully, Amazon SageMaker is a modular platform. This means you don't need to use all of its services, and you can swap individual SageMaker components for other tools when the need arises.
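Part of what makes swapping feasible is that SageMaker artifacts are ordinary files. For example, a training job's output is a plain `model.tar.gz` in S3, so you could serve it from ECS, EKS or anywhere else. A minimal sketch, assuming the archive has already been downloaded locally (for example with `aws s3 cp`) and contains a `model.json` file in a hypothetical format:

```python
import json
import tarfile
from pathlib import Path


def load_sagemaker_artifact(archive: str, extract_to: str,
                            filename: str = "model.json") -> dict:
    """Unpack a SageMaker model.tar.gz and load the (hypothetical) JSON model inside."""
    Path(extract_to).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject members that escape extract_to.
            tar.extractall(extract_to, filter="data")
        else:
            tar.extractall(extract_to)
    return json.loads((Path(extract_to) / filename).read_text())
```

The `filter="data"` argument, available in recent Python versions, rejects archive members that would be written outside the target directory.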
However, for most people starting their journey, going all in with SageMaker spares them from managing the infrastructure themselves, while still letting them reap all the MLOps benefits and features.
This is the crucial benefit.