Learning new concepts
Ideally, whenever you’re learning a new technology, you’d love to avoid shooting yourself in the foot. Start small, follow a tutorial, break things, see what happens.
If a concept can be learned locally on your laptop, the worst you could do is hang your OS, overheat your GPU and waste some time.
On the other hand, if the concept can only be learned on AWS… things could go wildly differently. I’m sure you’ve read many horror stories already.
One of the main concerns about the cloud in general is pricing. How do you make sure costs don’t creep up and you don’t wake up to a large bill? This is especially important for people learning the AWS cloud.
SageMaker Studio and its Running Instances tab
Setting the “generic” AWS cloud aside and focusing solely on Amazon SageMaker, your first steps will inevitably lead to SageMaker Studio, a handy JupyterLab fork that becomes your go-to tool for interacting with SageMaker.
You download some tutorial code, start a notebook instance, run that code and experiment with the platform. On the left, you’ll see a super-friendly “Running Instances” tab.
What many beginners (wrongly) expect is that this tab always shows all the instances you’re running in SageMaker.
It doesn’t work like that! Really.
What you see in the Running Instances tab are only the instances bound to SageMaker Studio. Most of the time these will consist of instances needed to run your Notebooks (and some SageMaker Studio services such as Data Wrangler or Debugger).
For example, the instances started by a Processing job (such as one launched with the sklearn_processor.run() method) won’t appear in the Running Instances tab, because these resources aren’t bound to SageMaker Studio. The same goes for Training Jobs and Inference Endpoints.
This is a consequence of the fact that SageMaker is not just a Jupyter notebook but a set of many services you can interact with. A SageMaker Studio notebook is only one of many ways to interact with the entire SageMaker platform. I’ve already written a short post on this matter.
Therefore, this tab is not the only place you need to look to determine “what stuff is currently running inside SageMaker”. It only shows you “what stuff is currently running inside SageMaker Studio”.
To be absolutely certain, browse the other services in the regular AWS console UI for any long-running resources that might have been left behind.
Usually, the culprit is an Inference Endpoint you forgot about, or a long-running Training or Processing job.
I agree that this is quite cumbersome (there’s no way to see them all in one place), but once you get a grasp of SageMaker, the chances that you “forget about something” drop drastically. You’re juggling a few services at most.
Checking Inference Endpoints usually does the trick.
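If you’d rather script this check than click through the console, it can be sketched with a boto3 SageMaker client. This is a minimal sketch under a few assumptions: the helper names `_list_all` and `find_running_sagemaker_resources` are hypothetical, only Processing jobs, Training jobs and Endpoints are covered (not, for example, Batch Transform jobs), and resources are per region, so you’d run it once for every region you’ve used.

```python
def _list_all(list_fn, result_key, **kwargs):
    """Call a SageMaker list_* method repeatedly, following NextToken pagination."""
    items = []
    while True:
        response = list_fn(**kwargs)
        items.extend(response.get(result_key, []))
        token = response.get("NextToken")
        if not token:
            return items
        kwargs["NextToken"] = token  # request the next page


def find_running_sagemaker_resources(sm_client):
    """Return names of in-progress Processing/Training jobs and all existing endpoints.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    """
    processing = _list_all(
        sm_client.list_processing_jobs, "ProcessingJobSummaries", StatusEquals="InProgress"
    )
    training = _list_all(
        sm_client.list_training_jobs, "TrainingJobSummaries", StatusEquals="InProgress"
    )
    # Endpoints are listed regardless of status: any endpoint that still exists deserves a look.
    endpoints = _list_all(sm_client.list_endpoints, "Endpoints")
    return {
        "processing_jobs": [job["ProcessingJobName"] for job in processing],
        "training_jobs": [job["TrainingJobName"] for job in training],
        "endpoints": [f'{ep["EndpointName"]} ({ep["EndpointStatus"]})' for ep in endpoints],
    }
```

To run it against your account, pass in something like `boto3.client("sagemaker", region_name="eu-west-1")`. Because the function only calls the client’s `list_*` methods, it is also easy to exercise with a stub client.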
Data Transfer IN for SageMaker Inference is not free
Another well-known-fact-that-might-not-be-that-well-known-at-all is the DATA IN pricing of SageMaker Inference.
When you’re learning AWS, you might hear that “data transfer in is free, data transfer out is not”. Yes, most of the services you’re using (such as S3) follow that rule. However, SageMaker Inference (Real-time and Asynchronous) does not!
Here’s its pricing at the time of writing in the eu-west-1 region:
As you can see, you’re billed not only for data coming out of SageMaker but also for data coming in. Take that into account.
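To see why this matters for payload-heavy workloads (images, audio), here is a back-of-the-envelope sketch. The function name and the rates used with it are hypothetical placeholders, not actual AWS prices; look up the current Data Transfer IN/OUT rates for your region on the SageMaker pricing page. It also assumes 1 GB = 1000 MB and ignores instance-hour charges.

```python
def estimate_transfer_cost(requests, payload_in_mb, payload_out_mb, rate_in_per_gb, rate_out_per_gb):
    """Rough data-transfer cost for a SageMaker inference endpoint (hypothetical helper).

    Assumes 1 GB = 1000 MB; instance hours and other charges are not included.
    """
    gb_in = requests * payload_in_mb / 1000    # total request payload sent to the endpoint
    gb_out = requests * payload_out_mb / 1000  # total response payload returned
    return {
        "gb_in": gb_in,
        "gb_out": gb_out,
        "cost_in": gb_in * rate_in_per_gb,
        "cost_out": gb_out * rate_out_per_gb,
    }


# Made-up workload: 1M requests, 0.5 MB image in, 1 KB JSON out, placeholder rates.
estimate = estimate_transfer_cost(1_000_000, 0.5, 0.001, rate_in_per_gb=0.02, rate_out_per_gb=0.02)
```

With these made-up numbers, 500 GB flows in versus only 1 GB out, so with equal per-GB rates the “free” direction would make up almost the entire transfer bill.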
Takeaways
SageMaker Studio doesn’t show Processing Jobs, Training Jobs or Inference Endpoints in its Running Instances tab.
SageMaker Inference (Real-time and Asynchronous) bills for data transfer in.
Also, this is your monthly reminder not to use your AWS root account and to set up billing alarms. 🤗
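For the billing-alarm part, a minimal sketch with a boto3 CloudWatch client could look like the one below. Assumptions: billing alerts are already enabled in your Billing preferences, the client points at us-east-1 (where AWS publishes the `AWS/Billing` metrics), your account is billed in USD, and the function name, alarm name and SNS topic are hypothetical placeholders.

```python
def create_billing_alarm(cloudwatch, threshold_usd, sns_topic_arn, alarm_name="monthly-estimated-charges"):
    """Create (or update, if the name exists) an alarm on the account's estimated charges.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch", region_name="us-east-1").
    """
    params = dict(
        AlarmName=alarm_name,  # hypothetical default name
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],  # assumes USD billing
        Statistic="Maximum",
        Period=21600,  # 6 hours; billing metrics are only updated a few times a day
        EvaluationPeriods=1,
        Threshold=float(threshold_usd),
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[sns_topic_arn],  # e.g. an SNS topic that emails you
    )
    cloudwatch.put_metric_alarm(**params)
    return params
```

Since `put_metric_alarm` overwrites an alarm with the same name, re-running the function with a new threshold simply updates the existing alarm.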