Five Best Practices for Security in Amazon SageMaker

| Author , tagged in Security, machine learning
Cloudticity, L.L.C.

Machine learning is rapidly transforming the healthcare industry and the role that technology plays in patient care. But these innovative new models can’t just run on our laptops; they have to be hosted in an environment where they can be powered by cutting-edge processors and plenty of memory.

As luck would have it, this is exactly what Amazon Web Services’ SageMaker provides. SageMaker has emerged as a popular, powerful way to build, train, fine-tune, and update models in purpose-built environments that make managing the machine learning lifecycle much easier.

But there’s a catch: if you operate in healthcare or another highly regulated sector, you’ll need to know how to do this without exposing sensitive data or running afoul of data sharing laws.

This piece will dive into best practices for securing SageMaker, arming you with the facts you need to safely and successfully apply the latest in machine learning to improve the healthcare space.

What is SageMaker?

Amazon’s SageMaker is a comprehensive, managed machine learning (ML) offering that allows you to plug your models directly into an easily configurable host environment. This removes the need to build servers of your own or spend hours writing bespoke specifications.

For developers, this means the freedom to focus on thinking and writing code while SageMaker handles many of the tedious underlying details. Not only does this enhance their productivity, but it tends to be the kind of thing developers look for in new tools. That said, it’s always good to do your due diligence and ensure you’re taking the necessary security precautions when relying on any external solution, like AWS.

That will be the subject of the next few sections. 


Best Practices for Security in SageMaker


Network & Application Protection

Amazon offers what’s called a Virtual Private Cloud (VPC), a logically isolated virtual network in which you launch and access your AWS resources. As a loose analogy, think of it as a private network of your own inside AWS, in the same spirit as the Virtual Private Network (VPN) you might use at your job, at school, or when you’re in a foreign country, although a VPC isolates your cloud resources rather than tunneling traffic from a single device.

The VPC is essentially an outer shell that your entire project lives within. To properly protect your models and data, make sure that your notebooks, models, data, and all other work stay inside a VPC, where they’re safe.

Another aspect of network protection is domain management. When you set up a domain and provision a user, they are given their own account with their own workspace, and they receive an additional fraction of a shared space. By setting the permissions on these accounts, you can carefully manage who has access to what, and how data can be shared between parties. Even better, all of this is still happening within the VPC that you've designated for this particular project.

Whenever possible, reach SageMaker and other AWS services through VPC interface endpoints (powered by AWS PrivateLink) and disable direct internet access. If you allow the resources in your VPC to be reached from the internet, you expose yourself to additional security risks.
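As a rough illustration of the settings described above, the sketch below assembles request parameters for a notebook instance that sits in a VPC subnet with direct internet access disabled. The parameter names follow the SageMaker CreateNotebookInstance API; the helper name, the instance type, and every ID and ARN are placeholders assumed for this example. The code only builds a dictionary and makes no AWS calls.

```python
def build_private_notebook_params(name, role_arn, subnet_id, security_group_ids):
    """Assemble CreateNotebookInstance parameters for a VPC-only notebook.

    Hypothetical helper for illustration; it builds a dict and calls nothing.
    """
    if not subnet_id or not security_group_ids:
        raise ValueError("A subnet and at least one security group are required")
    return {
        "NotebookInstanceName": name,
        "InstanceType": "ml.t3.medium",      # assumed size, adjust as needed
        "RoleArn": role_arn,
        "SubnetId": subnet_id,               # places the notebook inside the VPC
        "SecurityGroupIds": list(security_group_ids),
        "DirectInternetAccess": "Disabled",  # traffic must go through the VPC
        "RootAccess": "Disabled",            # users cannot become root on the instance
    }


# All identifiers below are made-up placeholders.
params = build_private_notebook_params(
    name="phi-analysis-notebook",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    subnet_id="subnet-0example",
    security_group_ids=["sg-0example"],
)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("sagemaker").create_notebook_instance(**params)`. Keep in mind that once direct internet access is disabled, the notebook can only reach services such as S3 through VPC endpoints or another route you provide.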

Authentication and Authorization

There may be instances where you don’t want certain users or workloads to be able to access your SageMaker resources. AWS’s Identity and Access Management (IAM) service makes this manageable. Think of IAM as the “master key” that’s in your sole possession: you can hand out individual “door keys” to as many other individuals or third-party technology solutions as required. This is all part of a concept known as “least privilege”, the principle that an entity should have only the bare minimum access it needs to complete a task. Least privilege is a common and effective way of reducing your attack surface and your likelihood of data leaks.
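To make least privilege concrete, here is a minimal sketch of an IAM policy document that hands out a single “door key”: permission to invoke one SageMaker endpoint and nothing else. The policy structure and the `sagemaker:InvokeEndpoint` action are standard IAM; the function name, account ID, region, and endpoint name are placeholders assumed for this example.

```python
import json


def invoke_only_policy(region, account_id, endpoint_name):
    """Build a policy that allows invoking exactly one endpoint (illustrative)."""
    resource = f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{endpoint_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeSingleEndpoint",
                "Effect": "Allow",
                "Action": ["sagemaker:InvokeEndpoint"],  # one action only
                "Resource": [resource],                  # one resource only
            }
        ],
    }


# Placeholder values, not real resources.
policy = invoke_only_policy("us-east-1", "111122223333", "readmission-model")
policy_json = json.dumps(policy, indent=2)
```

The point of the design is what is missing: there are no wildcards in the action or the resource, so a caller holding this policy cannot create, delete, or read anything else in SageMaker.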

Another great security option is multi-factor authentication (MFA), of which two-factor authentication (2FA) is the most common form. It is quickly being embraced by everyone from nontechnical users to enterprises operating at massive scale, because it’s one of the simplest and most effective ways to protect access to your data in SageMaker.
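One common way to enforce MFA in AWS is an IAM statement that denies actions when the request was not authenticated with MFA, using the `aws:MultiFactorAuthPresent` condition key. The sketch below builds such a statement for SageMaker actions. It is an illustration under assumptions, not a drop-in policy: the function name is hypothetical, and a statement like this is meant for human users, since service roles and automated jobs do not sign in with MFA and would be blocked by it.

```python
import json


def deny_sagemaker_without_mfa():
    """Build a policy denying SageMaker actions to sessions without MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenySageMakerWithoutMFA",
                "Effect": "Deny",
                "Action": "sagemaker:*",
                "Resource": "*",
                # BoolIfExists also matches requests where the key is absent,
                # so sessions that never used MFA are denied as well.
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            }
        ],
    }


mfa_policy_json = json.dumps(deny_sagemaker_without_mfa(), indent=2)
```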

Data Protection

AWS offers encryption services, which are important when you’re dealing with notebooks and sensitive healthcare data. You can set up encryption through AWS’s Key Management Service (KMS), which can then be used by SageMaker to make sure data is secured. Additionally, you can ensure that certain notebooks only have access to certain data sets, which is especially helpful if you’re trying to compartmentalize patient data.
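As a sketch of how a KMS key can be wired into SageMaker, the fragment below builds the encryption-related portion of a training job request. The field names (`OutputDataConfig.KmsKeyId`, `ResourceConfig.VolumeKmsKeyId`, `EnableInterContainerTrafficEncryption`) come from the SageMaker CreateTrainingJob API; the helper name, instance type, volume size, key ARN, and bucket are assumptions made for illustration. A real request needs further fields, such as the job name, algorithm, role, and stopping condition.

```python
def encrypted_training_settings(kms_key_arn, output_s3_uri):
    """Return the encryption-related part of a CreateTrainingJob request."""
    if not output_s3_uri.startswith("s3://"):
        raise ValueError("Output location must be an s3:// URI")
    return {
        "OutputDataConfig": {
            "S3OutputPath": output_s3_uri,
            "KmsKeyId": kms_key_arn,        # encrypts model artifacts written to S3
        },
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",  # assumed instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,            # assumed volume size
            "VolumeKmsKeyId": kms_key_arn,   # encrypts the attached training volume
        },
        # Encrypts traffic between containers in distributed training.
        "EnableInterContainerTrafficEncryption": True,
    }


# Placeholder key ARN and bucket name.
settings = encrypted_training_settings(
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
    "s3://example-phi-bucket/model-output/",
)
```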

Pulling back a bit, it’s considered a best practice to keep sensitive information out of tags and free-form text fields, including when you work with Amazon SageMaker or other AWS services through the console, API, AWS CLI, or AWS SDKs. Any data that you enter into tags or free-form text fields may be used for billing or diagnostic logs. If you provide a URL to an external server, do not include credential information in that URL, as it too could be exposed.
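Checks like these are easy to automate before anything is sent to AWS. The following sketch, using only the Python standard library, flags URLs that carry embedded credentials and tags that look like they contain sensitive values. The function names and the two regular expressions are assumptions chosen for illustration; real screening for patient data would need far more thorough patterns.

```python
import re
from urllib.parse import urlsplit


def url_has_embedded_credentials(url):
    """True if the URL contains a user name or password (user:pass@host)."""
    parts = urlsplit(url)
    return parts.username is not None or parts.password is not None


# Deliberately simple, assumed patterns for demonstration only.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # shape of a US Social Security number
    re.compile(r"AKIA[0-9A-Z]{16}"),       # shape of an AWS access key ID
]


def find_risky_tags(tags):
    """Return the keys of tags whose key or value matches a sensitive pattern."""
    risky = []
    for key, value in tags.items():
        text = f"{key} {value}"
        if any(pattern.search(text) for pattern in SENSITIVE_PATTERNS):
            risky.append(key)
    return risky
```

For example, `url_has_embedded_credentials("https://user:secret@example.com/data.csv")` returns `True`, while the same URL without the `user:secret@` part returns `False`.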

If you’ve been using the internet for a long time you might remember the days when it was common to see the communication protocol “http://” before a web address. Today, the gold standard is the more secure HTTPS protocol, so make sure everything you connect to SageMaker uses it. HTTPS encrypts traffic with Transport Layer Security (TLS), the successor to the now-deprecated Secure Sockets Layer (SSL), so use a current TLS version (1.2 or later) rather than SSL whenever you’re connecting anything in SageMaker.
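On the client side, this can be enforced in a few lines. The sketch below uses Python’s standard `ssl` module to build a context that verifies certificates and refuses anything older than TLS 1.2, plus a small guard that rejects plain `http://` URLs. The function names are hypothetical, and AWS SDKs already use HTTPS by default; a context like this is mainly relevant for your own clients that talk to other services.

```python
import ssl
from urllib.parse import urlsplit


def strict_tls_context():
    """Create an SSL context that verifies certificates and requires TLS 1.2+."""
    context = ssl.create_default_context()  # certificate and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def require_https(url):
    """Return the URL unchanged, or raise if it does not use HTTPS."""
    if urlsplit(url).scheme != "https":
        raise ValueError(f"Refusing non-HTTPS URL: {url}")
    return url


context = strict_tls_context()
```

The context can then be passed to standard-library clients, for example `urllib.request.urlopen(url, context=context)`.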

Threat Detection & Incident Response

When bad actors attempt to strike, you want to be ready.

Monitoring is one of the best ways to keep an eye on any threats that might be knocking at your door, and you can do that through AWS’s CloudTrail. CloudTrail isn’t just your trusty recordkeeper; it’s also a great way to audit activity around your models and support compliance, because it records the API calls made by users, roles, and AWS services.

We all learned in grade school that practice makes perfect, so it’s a good idea to plan for threats by creating some scenarios for detection. These also afford you the opportunity to send out test alerts to make sure you’re prepared to catch any potential threats.
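To give a feel for what auditing CloudTrail data can look like, the sketch below counts SageMaker API calls per caller from a list of CloudTrail-style records. The `eventSource`, `eventName`, and `userIdentity` fields are part of the CloudTrail record format; the function name and the sample records are invented for this example and are not real log data.

```python
from collections import Counter


def sagemaker_activity(records):
    """Count SageMaker API calls per (caller ARN, event name)."""
    counts = Counter()
    for record in records:
        if record.get("eventSource") != "sagemaker.amazonaws.com":
            continue  # ignore activity from other services
        caller = record.get("userIdentity", {}).get("arn", "unknown")
        counts[(caller, record.get("eventName", "unknown"))] += 1
    return counts


# Hand-written sample records, for illustration only.
sample_records = [
    {"eventSource": "sagemaker.amazonaws.com", "eventName": "CreateTrainingJob",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
    {"eventSource": "sagemaker.amazonaws.com", "eventName": "CreateTrainingJob",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}},
]

activity = sagemaker_activity(sample_records)
```

A summary like this makes it easier to spot a caller who is suddenly performing actions they have never performed before.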
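A detection scenario can be rehearsed in code before it matters. The sketch below defines one simple rule, flagging attempts to stop or delete a CloudTrail trail, and a tiny drill that feeds it synthetic events to confirm it fires when it should and stays quiet when it shouldn’t. `StopLogging` and `DeleteTrail` are real CloudTrail API event names; the function names and the synthetic events are assumptions made for illustration.

```python
TAMPERING_EVENTS = {"StopLogging", "DeleteTrail"}


def detect_trail_tampering(record):
    """True if the record shows someone stopping or deleting a trail."""
    return (
        record.get("eventSource") == "cloudtrail.amazonaws.com"
        and record.get("eventName") in TAMPERING_EVENTS
    )


def run_drill(rule, should_alert, should_not_alert):
    """Run synthetic events through a rule and report any mistakes."""
    missed = [event for event in should_alert if not rule(event)]
    false_alarms = [event for event in should_not_alert if rule(event)]
    return {"missed": missed, "false_alarms": false_alarms}


# Synthetic events written for the drill, not real log data.
drill_result = run_drill(
    detect_trail_tampering,
    should_alert=[
        {"eventSource": "cloudtrail.amazonaws.com", "eventName": "StopLogging"},
    ],
    should_not_alert=[
        {"eventSource": "sagemaker.amazonaws.com", "eventName": "CreateTrainingJob"},
    ],
)
```

An empty `missed` list and an empty `false_alarms` list mean the rule behaved as expected for the scenarios you rehearsed.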

Amazon GuardDuty is one example of a robust service that can detect threats to your AWS systems. By using GuardDuty you can answer questions like “Are the automation events happening?” and “Are the threats being detected?” This way, you can identify weak points and shore them up before a real danger rears its head. 

Compliance Certifications

Every industry has regulatory standards, and cloud solutions are no different. For this reason, cloud compliance is a must-have, and AWS says it supports 143 security standards and compliance certifications, a number of which include SageMaker in their scope. When using SageMaker, it’s important to ensure you’re meeting all the necessary compliance standards, not just to play by the rules, but also to ensure you’re doing everything possible to keep your environment (and your healthcare data) secure.

By using AWS’s Compliance offering, you will be able to reduce the hassle of trying to check all the myriad accreditation criteria on your own. However, it’s worth noting that when using AWS Compliance, the responsibility for compliance is shared between AWS and the customer, so be sure to understand the shared responsibility model and uphold the compliance responsibilities that fall on you.

How Are SageMaker and Bedrock Different?

Though we’re focused on SageMaker security in this piece, it’s worth briefly talking about how SageMaker is different from a relatively new service, Amazon Bedrock.

Bedrock is another AWS solution that can work in tandem with SageMaker. As its name implies, the value of Bedrock lies in giving you access to so-called “foundation models”: large pretrained models, such as large language models, that you call through an API and can customize with your own data. A developer using Bedrock can build generative AI applications without managing any infrastructure, because the service is serverless. If you want to call Bedrock from your SageMaker notebook(s), you will need to grant the notebook’s role access to it, but AWS states that it won’t use the data you send to Bedrock to train its own models or the models of any other entity.

SageMaker, on the other hand, is a fully managed platform for building, training, and hosting your own models. Think of SageMaker as the full kit, and Bedrock as simply the foundation.

Even though Bedrock is serverless, security precautions still need to be put in place.

AWS operates on a shared responsibility model: AWS secures the underlying cloud infrastructure, while you remain responsible for how you configure access, encryption, and data handling in the services you use. Because that model and the same core tools apply to both services, you won’t need two separate security solutions if you use both Bedrock and SageMaker.

The other security measures we covered earlier, including multi-factor authentication, encryption, and activity logging through CloudTrail, can all apply to Bedrock as well. Where proprietary healthcare data is concerned, be aware that, according to AWS, Bedrock doesn’t store your prompts or responses, and doesn’t use them or the data you supply for fine-tuning to train the underlying base models.

Next Steps

Technologies like artificial intelligence and machine learning are changing healthcare for the better, but special effort has to be made to secure them and avoid problems down the line. Armed with these best practices, you have the resources to double down on security in your SageMaker environments.

Want to get started using machine learning in healthcare? Download the FREE eBook, Getting Started with Generative AI in Healthcare, or schedule a FREE consultation to learn how we can help.
