Looking for a thorough guide on building rock-solid machine-learning-powered systems? The Machine Learning Lens of the AWS Well-Architected Framework has got you covered.
What is the AWS Well-Architected Framework?
The AWS Well-Architected Framework (WAF) is a comprehensive set of guidelines for designing systems on AWS. It consists of 6 pillars:
operational excellence - how to run and manage a cloud system
security - how to secure it
reliability - how to keep it running and recover from failures
performance efficiency - how to get the most out of the cloud
cost optimisation - how not to go bankrupt
sustainability - how not to kill Earth in the process
Basically - hundreds of pages filled with best practices in each of these areas.
What are the WAF Lenses?
The main framework and white paper are great, but also very generic. They do not dive too deep into the quirks of specific IT branches or technologies; they only provide general-purpose insights that can be applied to any cloud-based system.
That's why AWS introduced the WAF Lenses: white papers based on the original principles and pillars, but focusing on specialised areas such as game development, IoT, HPC, serverless or machine learning.
How is the ML Lens structured?
The white paper starts by identifying the usual machine learning lifecycle, from business goal definition, through data exploration and model building, to deployment and monitoring. Every phase is then explained in detail and examined in terms of the WAF pillars.
This, in turn, brings us hundreds of best practices such as:
MLPER-01: Determine key performance indicators, including acceptable errors
MLOE-06: Establish feedback loops across ML lifecycle phases
MLPER-05: Use a data lake house architecture
MLOE-09: Automate operations through IaC and CaC
MLREL-10: Automate endpoint changes through a pipeline
MLREL-13: Ensure a recoverable endpoint with a managed version control strategy
Every such point is then explained and contains links to complementary documents, videos, tutorials or blog posts if you wish to dig in even deeper.
In a way, this could also be seen as the introductory-but-detailed MLOps white paper. Or your “best practices checklist”.
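If you want to treat the list literally as a checklist, here is a minimal Python sketch of one way to do it. It parses the practice lines quoted above, groups them by pillar and prints them with a tick box. The names `PILLAR_BY_PREFIX`, `parse_practice` and `group_by_pillar` are made up for this post, and the prefix-to-pillar mapping is my reading of the ID abbreviations, not something taken from an AWS tool.

```python
from collections import defaultdict

# Assumption: pillar names inferred from the ID prefixes quoted in this post.
PILLAR_BY_PREFIX = {
    "MLOE": "operational excellence",
    "MLPER": "performance efficiency",
    "MLREL": "reliability",
}

# The six example practices listed above, copied verbatim.
PRACTICES = [
    "MLPER-01: Determine key performance indicators, including acceptable errors",
    "MLOE-06: Establish feedback loops across ML lifecycle phases",
    "MLPER-05: Use a data lake house architecture",
    "MLOE-09: Automate operations through IaC and CaC",
    "MLREL-10: Automate endpoint changes through a pipeline",
    "MLREL-13: Ensure a recoverable endpoint with a managed version control strategy",
]


def parse_practice(line):
    """Split 'MLPER-01: Title' into a checklist entry (hypothetical helper)."""
    identifier, title = line.split(": ", 1)
    prefix, number = identifier.split("-")
    return {
        "id": identifier,
        "pillar": PILLAR_BY_PREFIX.get(prefix, "unknown"),
        "number": int(number),
        "title": title,
        "done": False,  # flip to True once the practice is in place
    }


def group_by_pillar(entries):
    """Group checklist entries by pillar, keeping their original order."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["pillar"]].append(entry)
    return dict(grouped)


checklist = [parse_practice(line) for line in PRACTICES]
grouped = group_by_pillar(checklist)

for pillar, entries in grouped.items():
    print(pillar)
    for entry in entries:
        mark = "x" if entry["done"] else " "
        print(f"  [{mark}] {entry['id']} {entry['title']}")
```

Keeping the status next to each ID makes it easy to tick off practices one pillar at a time instead of tackling the whole list at once.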
Is it AWS only?
While all the implementation examples (obviously) point to various SageMaker services and AWS resources, most of these best practices are cloud and technology agnostic. You can apply them not only in AWS but in any environment.
Additionally, if you're an MLOps expert but your toolkit consists of different tools or clouds, this document provides a great overview of SageMaker capabilities.
Where can I find it?
The ML Lens was created and is maintained by AWS, and is available online free of charge here.
Beware!
Keep in mind that this ML Lens is a very comprehensive resource and a very long list. You don’t need to apply all of the best practices at once as you probably don’t need all of them. Find a balance and implement them gradually as various needs in your projects arise. Basically - perform MLOps at a reasonable scale (shout out to neptune.ai for coining that term), tailored to your use case.