The modularity of Amazon SageMaker

or how you can slowly change services and migrate away if necessary.

Feb 15, 2023

My last post argued that, while the core SageMaker services are basically managed container services, their pricing should be compared with extra caution against general-purpose managed container services such as EKS or ECS.

Previous post: What is SageMaker Processing, SageMaker Training and SageMaker Inference? (Tomasz Dudek, MLOps and how you tame it)

This is because in real-world ML, you’ll inevitably build a system with components that not only train and deploy models but also store experiments, save models, monitor them and orchestrate the training process. Thus, any comparison has to take the entire ecosystem into account.

Additionally, I’ve mentioned that SageMaker is a highly modular platform.

What do I mean by that?

The modularity explained

Amazon SageMaker is not uniform. The platform has been built from the ground up and in 2017 there were only a couple of services available - Processing, Training, Inference and managed Jupyter notebooks. Over time, as the ML field matured and the term MLOps was coined, more and more capabilities were added to the platform.

Some of them are now mature and nearly feature-complete technologies, while others are merely rushed MVPs, rolled out to the public to get feedback and feature requests ASAP. Thus, the quality of SageMaker services varies a lot.

Thankfully, the platform makes it easy to build a system that consists of more than just SageMaker services, for two main reasons:

the extensive use of Amazon S3 between SageMaker services, where it holds the intermediate data passed from one step to the next
an ecosystem of plugins and libraries from third-party vendors and open source projects that integrate with SageMaker

Every SageMaker service could at some point be swapped for another technology.

Such a “connector” probably already exists and is maintained by both the SageMaker team and the third-party vendor. After a year or two, your system might have become a mix of SageMaker services and other tools.

The “vendor lock-in” is still there, but the migration process is actually quite simple.
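The S3 hand-off described above can be illustrated with a minimal sketch. It assumes each stage only needs an input location and an output location; the stage functions, step names and bucket paths are all hypothetical placeholders, and no AWS call is made.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A stage receives an input URI and an output URI and returns a label
# naming the backend that ran it. Everything here is hypothetical.
Stage = Callable[[str, str], str]


def sagemaker_processing(input_uri: str, output_uri: str) -> str:
    # Placeholder: a real version would start a SageMaker Processing job.
    return "sagemaker-processing"


def sagemaker_training(input_uri: str, output_uri: str) -> str:
    # Placeholder: a real version would start a SageMaker Training job.
    return "sagemaker-training"


def kubernetes_training(input_uri: str, output_uri: str) -> str:
    # Placeholder: a real version would submit a job to a Kubernetes cluster.
    return "kubernetes-job"


@dataclass
class Step:
    name: str
    run: Stage
    output_uri: str


def run_pipeline(raw_data_uri: str, steps: List[Step]) -> Dict[str, str]:
    """Chain steps so that each one reads the previous step's S3 output."""
    report: Dict[str, str] = {}
    current_input = raw_data_uri
    for step in steps:
        backend = step.run(current_input, step.output_uri)
        report[step.name] = f"{backend}: {current_input} -> {step.output_uri}"
        # The next step only needs to know where the data landed.
        current_input = step.output_uri
    return report


steps = [
    Step("process", sagemaker_processing, "s3://example-bucket/features/"),
    # Swapping the backend changes one line; the S3 contract stays the same.
    Step("train", kubernetes_training, "s3://example-bucket/models/"),
]
for line in run_pipeline("s3://example-bucket/raw/", steps).values():
    print(line)
```

Because the only contract between steps is a storage location, replacing `sagemaker_training` with `kubernetes_training` leaves the neighbouring steps untouched.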

Why would anyone migrate away?

While I usually praise SageMaker and genuinely believe that it is the right platform for most of us, there are many issues that might occur and may lead teams to partially move away from a given SageMaker service.

A non-exhaustive list of reasons:

some functionalities are missing → you waste time building costly workarounds
no room for heavy customisation → you can’t use a given service because it doesn’t meet your needs
a private deal struck with a third-party vendor → a given component becomes far cheaper than its SageMaker equivalent
a multi-cloud deployment attempt → SageMaker is partially unusable
the cost of bare machines is too high even when taking maintenance into account → you are going bankrupt
too much engineering involved when you expected a low-code or no-code, GUI-friendly solution → SageMaker is not that click-ops friendly and pushes you to write real code instead


All that, especially as your project and teams grow (as well as their bag of experiences with other tools), might force you to make a switch.

Example scenario

Let’s suppose you’ve built a project entirely on top of the SageMaker stack.

After several months, you’ve noticed a quiet rebellion among your data scientists. Turns out that SageMaker Experiments lacks functionalities they expected and doesn’t really meet their needs. What could you do about it?

Well, SageMaker doesn’t block you from using alternative solutions. You just give MLflow a shot, deploying it on your own EKS cluster.
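One way to keep such a swap cheap is to hide the tracker behind a small interface in the training script. The sketch below is only an illustration under that assumption: `ExperimentTracker`, `InMemoryTracker`, `MlflowTracker` and `train` are hypothetical names, and the training loop logs a dummy value rather than a real loss. The MLflow calls used (`mlflow.set_tracking_uri`, `mlflow.log_param`, `mlflow.log_metric`) are part of MLflow's tracking API; the import is lazy so the sketch also runs without MLflow installed.

```python
from typing import Dict, List, Protocol, Tuple


class ExperimentTracker(Protocol):
    def log_param(self, key: str, value: str) -> None: ...
    def log_metric(self, key: str, value: float) -> None: ...


class InMemoryTracker:
    """Stand-in tracker for local runs and tests."""

    def __init__(self) -> None:
        self.params: Dict[str, str] = {}
        self.metrics: List[Tuple[str, float]] = []

    def log_param(self, key: str, value: str) -> None:
        self.params[key] = value

    def log_metric(self, key: str, value: float) -> None:
        self.metrics.append((key, value))


class MlflowTracker:
    """Forwards calls to an MLflow tracking server (needs `pip install mlflow`)."""

    def __init__(self, tracking_uri: str) -> None:
        import mlflow  # lazy import: only required when this tracker is used

        mlflow.set_tracking_uri(tracking_uri)
        self._mlflow = mlflow

    def log_param(self, key: str, value: str) -> None:
        self._mlflow.log_param(key, value)

    def log_metric(self, key: str, value: float) -> None:
        self._mlflow.log_metric(key, value)


def train(tracker: ExperimentTracker, learning_rate: float, epochs: int) -> float:
    """Dummy training loop: the 'loss' is a made-up decreasing number."""
    tracker.log_param("learning_rate", str(learning_rate))
    loss = 1.0
    for epoch in range(epochs):
        loss = 1.0 / (1 + epoch)  # placeholder, not a real model
        tracker.log_metric("loss", loss)
    return loss


tracker = InMemoryTracker()
final_loss = train(tracker, learning_rate=0.01, epochs=3)
print(tracker.params, tracker.metrics)
```

Inside a real training job you would pass something like `MlflowTracker("http://mlflow.example.internal")` instead, where the URL is a placeholder for wherever your MLflow server is exposed.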

Some time has passed, your product has grown and you’ve deployed multiple LLMs on SageMaker Inference. Other teams became Kubernetes-heavy and started comparing the price of SageMaker Inference with that of running KServe. The math doesn’t add up!

What could you do about it? Once again, leverage your company’s Kubernetes skills and use Seldon Core instead.
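From the caller's side, such a switch mostly changes how a prediction request is built. With SageMaker Inference you would typically call `invoke_endpoint` on the boto3 `sagemaker-runtime` client; a Seldon Core deployment is reached over plain HTTP. The sketch below assumes the Seldon Core v1 REST protocol (path ending in `/api/v1.0/predictions`, payload under `data.ndarray`); the host, namespace and deployment names are hypothetical, and other protocol versions use different paths and payloads.

```python
import json
from typing import List, Tuple


def seldon_prediction_request(
    ingress_host: str,
    namespace: str,
    deployment: str,
    rows: List[List[float]],
) -> Tuple[str, bytes]:
    """Build the URL and JSON body for a Seldon Core v1 REST prediction call."""
    url = (
        f"http://{ingress_host}/seldon/{namespace}/{deployment}"
        "/api/v1.0/predictions"
    )
    body = json.dumps({"data": {"ndarray": rows}}).encode("utf-8")
    return url, body


# Hypothetical host, namespace and deployment name.
url, body = seldon_prediction_request(
    "ml.example.internal", "models", "churn", [[1.0, 2.0]]
)
print(url)
print(body.decode("utf-8"))

# Sending it is a plain HTTP POST, for example with the standard library:
#   import urllib.request
#   req = urllib.request.Request(
#       url, data=body, headers={"Content-Type": "application/json"}
#   )
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read()))
```

Keeping request construction in one function like this means the rest of the application does not need to know which serving backend sits behind it.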

Continuing that refactoring pattern, all of your SageMaker components may eventually get replaced.

Summary

You can freely replace SageMaker services with other components as your project grows and potentially outgrows SageMaker.

As usual, bear in mind all the implications that come with maintaining your own systems.

