From overhauls to GA releases: A look at Amazon SageMaker's updates of 2022
As we wrap up the year, let's look back at the changes made to Amazon SageMaker in 2022.
Here is my non-exhaustive selection of SageMaker changes:
Q1 2022
Pipelines - pipeline definitions can be loaded from S3, EMR step introduced, concurrency control added
Training - the g5 instance family (the one with NVIDIA A10G GPUs) is supported
Autopilot - you can generate a performance report (confusion matrix, AUC-ROC); datasets as large as 100 GB are supported
Q2 2022
g5 instance family is supported in Notebook Instances and Studio
Autopilot - support for a custom validation set, validation ratio and manual feature selection; more metrics for candidate models
Feature Store - new features can be added to an existing feature group
Data Wrangler - Databricks is supported as a source
Ground Truth adds VPC support
RStudio on SageMaker allows custom Docker images
Inference - Serverless Inference is now generally available!
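To give a feel for what Serverless Inference changes in practice, here is a minimal sketch of the endpoint configuration request. It assumes the boto3 `create_endpoint_config` request shape, where a production variant carries a `ServerlessConfig` instead of an instance type and count. The helper name, the config and model names, and the memory and concurrency values are all hypothetical, picked only for illustration.

```python
def serverless_endpoint_config(config_name, model_name, memory_mb=2048, max_concurrency=5):
    """Build the keyword arguments for a serverless endpoint configuration.

    Hypothetical helper: it only assembles a dict and never calls AWS.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # Serverless variants declare memory and concurrency
                # instead of an instance type and instance count.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


# Placeholder names; replace them with your own config and model names.
request = serverless_endpoint_config("demo-serverless-config", "demo-model")

# With AWS credentials and an existing model, the request could be sent with:
# boto3.client("sagemaker").create_endpoint_config(**request)
```

The sketch stops at building the request, so it runs without an AWS account; the commented-out line shows where the real call would go.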
Q3 2022
Training - added a heterogeneous clusters option to run multiple instance types in the same job, for example CPU-heavy + GPU-heavy; Warm Pools let you retain provisioned infrastructure for a while
Hyperparameter Tuning - the maximum number of hyperparameters that can be searched was raised from 20 to 30; multiple alternate EC2 instance types can be selected; Hyperband search strategy introduced
Inference - g5, p4d (NVIDIA A100) and c6i instance families can be used for deployment; instance volume size can be customized; model download timeout and health check timeout settings were added
Edge Manager - ability to deploy a model on selected devices
Clarify - you can now explain real-time Inference endpoints
Pipelines - local mode supported; cross-account sharing supported
you can attribute user activity even when users share the same execution IAM role
Q4 2022
dozens of new re:Invent features and several new services, already covered here:
Training - Trainium instances (ml.trn1) are available
Model Monitor - now works with Batch Transform too
Inference - new instance types available (many Graviton families)
Experiments - a complete redesign!!
Hyperparameter Tuning - grid search strategy introduced; random seed can be set manually for better reproducibility
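To illustrate the idea behind the last item, here is a small plain-Python sketch of the difference between grid search and seeded random search. It is not SageMaker code: the search space, function names and seed value are hypothetical, and it only shows why enumerating a grid is deterministic and why fixing a seed makes random sampling repeatable.

```python
import itertools
import random

# Hypothetical search space, used only for illustration.
search_space = {"learning_rate": [0.01, 0.1], "max_depth": [3, 5, 7]}


def grid_candidates(space):
    """Enumerate every combination of the listed values (grid search)."""
    names = list(space)
    combos = itertools.product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def random_candidates(space, n_trials, seed):
    """Sample n_trials combinations; the same seed gives the same samples."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_trials)
    ]


grid = grid_candidates(search_space)
first_run = random_candidates(search_space, 4, seed=42)
second_run = random_candidates(search_space, 4, seed=42)

print(len(grid))                 # number of grid points: 2 * 3 combinations
print(first_run == second_run)   # identical seeds reproduce the same trials
```

Grid search visits every combination exactly once, so its cost grows with the product of the value counts; seeded random search keeps the trial budget fixed while still being repeatable.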
This list was non-exhaustive - where do I find all changes?
That’s simple - just go here and scroll down to the “What’s new” section. That one contains everything.
Which changes were the most important for you?
For me, two particularly important changes were the SageMaker Experiments overhaul and the SageMaker Serverless Inference GA.
Can’t wait for 2023!