Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning (ML) that enables data scientists and developers to perform every step of the ML workflow, from preparing data to building, training, tuning, and deploying models.

To access SageMaker Studio, Amazon SageMaker Canvas, or other Amazon ML environments like RStudio on Amazon SageMaker, you must first provision a SageMaker domain. A SageMaker domain includes an associated Amazon Elastic File System (Amazon EFS) volume; a list of authorized users; and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations.

Administrators can now provision multiple SageMaker domains to separate different lines of business or teams within a single AWS account. This creates a logical separation between the users, file storage, and configuration settings of the various groups in your organization. For example, your organization may want to separate its financial line of business from its sustainability research division, as shown in the following screenshot of the multi-domain console.

[Screenshot: the SageMaker console listing multiple domains]

Creating multiple SageMaker domains also lets you set configurations, such as VPC settings, at the domain level. For example, one domain can permit public internet access for a group's research, while another enforces that all traffic goes through a specified VPC for business units with stricter requirements.
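As a minimal sketch of the restricted case, the following wraps aws sagemaker create-domain with --app-network-access-type VpcOnly so the domain's traffic goes through your own VPC. The function name and every argument value are hypothetical placeholders. A VpcOnly domain generally needs a security group in the default user settings, plus VPC endpoints or a NAT gateway so Studio can still reach SageMaker and other AWS services.

```shell
#!/usr/bin/env bash
# Minimal sketch: create a domain whose Studio traffic stays inside your own VPC.
# The function name and every argument value are hypothetical placeholders.
create_vpc_only_domain() {
  local domain_name="$1" role_arn="$2" security_group="$3" vpc_id="$4"
  shift 4
  # Remaining arguments are the subnet IDs the domain should use
  aws sagemaker create-domain \
    --domain-name "$domain_name" \
    --auth-mode IAM \
    --default-user-settings "ExecutionRole=${role_arn},SecurityGroups=${security_group}" \
    --vpc-id "$vpc_id" \
    --subnet-ids "$@" \
    --app-network-access-type VpcOnly
}
```

A research domain that should keep direct internet access would instead use --app-network-access-type PublicInternetOnly, which is the default.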

Automated tagging

In addition to separating users, file storage, and domain configurations, administrators can also separate the SageMaker resources created within each domain. By default, SageMaker now automatically tags new resources such as training jobs, processing jobs, experiments, pipelines, and model registry entries with a sagemaker:domain-arn tag identifying the domain they were created in. SageMaker also adds a sagemaker:user-profile-arn or sagemaker:space-arn tag to record, at an even more granular level, which user profile or shared space created the resource.
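Because new resources carry the sagemaker:domain-arn tag, you can look them up by domain. The following minimal sketch uses the Resource Groups Tagging API; it assumes that API covers the SageMaker resource types you care about, which you should confirm against its list of supported resources. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the ARNs of SageMaker resources tagged with a given domain ARN.
# Assumes the Resource Groups Tagging API supports the resource types you need.
# The function name is hypothetical.
list_resources_for_domain() {
  local domain_arn="$1" region="$2"
  aws resourcegroupstaggingapi get-resources \
    --region "$region" \
    --resource-type-filters sagemaker \
    --tag-filters "Key=sagemaker:domain-arn,Values=${domain_arn}" \
    --query 'ResourceTagMappingList[].ResourceARN' \
    --output text
}
```

For user-level or space-level views, the same call can filter on sagemaker:user-profile-arn or sagemaker:space-arn instead.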

Cost allocation

Administrators can use automated tagging to monitor the costs associated with their lines of business, teams, individual users, or individual business problems, using tools such as AWS Budgets and AWS Cost Explorer. For example, an administrator can activate sagemaker:domain-arn as a cost allocation tag in the AWS Billing console.

[Screenshot: activating the sagemaker:domain-arn cost allocation tag]
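If you prefer the CLI to the Billing console, the Cost Explorer API can activate tag keys as well. This is a minimal sketch; it assumes credentials for the management (payer) account and that the tag keys have already appeared in your billing data, which is generally required before a key can be activated. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: activate one or more tag keys as cost allocation tags.
# Assumes management-account credentials; the function name is hypothetical.
activate_cost_tags() {
  local key
  local -a entries=()
  # Build one "TagKey=...,Status=Active" entry per tag key argument
  for key in "$@"; do
    entries+=("TagKey=${key},Status=Active")
  done
  aws ce update-cost-allocation-tags-status \
    --cost-allocation-tags-status "${entries[@]}"
}
```

For example, activate_cost_tags sagemaker:domain-arn sagemaker:user-profile-arn activates both the domain-level and user-level keys in one request.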

They can then use Cost Explorer to visualize notebook spend for a given domain.

[Screenshot: AWS Cost Management view of spend for a domain]
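Once the tag is active, you can also query the underlying data programmatically. The following minimal sketch groups monthly SageMaker spend by sagemaker:domain-arn with aws ce get-cost-and-usage. The function name is hypothetical, and "Amazon SageMaker" as the value of the SERVICE dimension is an assumption to confirm in Cost Explorer; narrowing the result to notebook spend specifically would need an additional usage-type filter not shown here.

```shell
#!/usr/bin/env bash
# Minimal sketch: monthly SageMaker cost grouped by the sagemaker:domain-arn tag.
# Dates are YYYY-MM-DD and End is exclusive. The function name is hypothetical, and
# "Amazon SageMaker" as the SERVICE dimension value is an assumption to verify.
sagemaker_cost_by_domain() {
  local start="$1" end="$2"
  aws ce get-cost-and-usage \
    --time-period "Start=${start},End=${end}" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --filter '{"Dimensions": {"Key": "SERVICE", "Values": ["Amazon SageMaker"]}}' \
    --group-by "Type=TAG,Key=sagemaker:domain-arn"
}
```

In the response, each group key typically has the form sagemaker:domain-arn$<value>, with untagged spend grouped under an empty value.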

Domain-level resource isolation

Administrators can attach AWS Identity and Access Management (IAM) policies that ensure a domain's users can only create and access SageMaker resources that originate from their own domain. The following code is an example of such a policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CreateRequireDomainTag",
      "Effect": "Allow",
      "Action": [
        "SageMaker:Create*",
        "SageMaker:Update*"
      ],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringEquals": {
          "aws:TagKeys": [
            "sagemaker:domain-arn"
          ]
        }
      }
    },
    {
      "Sid": "ResourceAccessRequireDomainTag",
      "Effect": "Allow",
      "Action": [
        "SageMaker:Update*",
        "SageMaker:Delete*",
        "SageMaker:Describe*"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/sagemaker:domain-arn": "arn:aws:sagemaker:<REGION>:<ACCOUNT_ID>:domain/<DOMAIN_ID>"
        }
      }
    }
  ]
}
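To roll this policy out per domain, you can render it with the domain ARN filled in and attach it to the role that domain's users run under. The following is a minimal sketch that reproduces the policy above; the function, policy, and role names are hypothetical, and which role to attach it to (for example, the domain's default execution role) depends on your setup.

```shell
#!/usr/bin/env bash
# Minimal sketch: render the domain-isolation policy for one domain, create it as a
# managed policy, and attach it to a role. All names passed in are hypothetical.

# Print the policy JSON with the given domain ARN filled in
render_domain_policy() {
  local domain_arn="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CreateRequireDomainTag",
      "Effect": "Allow",
      "Action": ["SageMaker:Create*", "SageMaker:Update*"],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringEquals": {"aws:TagKeys": ["sagemaker:domain-arn"]}
      }
    },
    {
      "Sid": "ResourceAccessRequireDomainTag",
      "Effect": "Allow",
      "Action": ["SageMaker:Update*", "SageMaker:Delete*", "SageMaker:Describe*"],
      "Resource": "*",
      "Condition": {
        "StringEquals": {"aws:ResourceTag/sagemaker:domain-arn": "${domain_arn}"}
      }
    }
  ]
}
EOF
}

# Create the managed policy, then attach it to the named IAM role
create_and_attach_domain_policy() {
  local domain_arn="$1" policy_name="$2" role_name="$3" policy_arn
  policy_arn=$(aws iam create-policy \
    --policy-name "$policy_name" \
    --policy-document "$(render_domain_policy "$domain_arn")" \
    --query 'Policy.Arn' --output text) || return 1
  aws iam attach-role-policy --role-name "$role_name" --policy-arn "$policy_arn"
}
```

Generating the document from one template keeps every domain's policy identical apart from the ARN, which makes per-domain policies easier to review.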

For more information, see Multiple domains overview.

Backfilling existing resources with domain tags

Since the launch of the multi-domain capability, new resources are automatically tagged with sagemaker:domain-arn. However, if you want to tag existing resources to support resource isolation, administrators can call the SageMaker AddTags API (aws sagemaker add-tags) from a script. The following example tags all existing experiments with a domain:

#!/usr/bin/env bash
# Replace the <...> placeholders before running
REGION="<REGION>"
domain_arn="arn:aws:sagemaker:${REGION}:<ACCOUNT_ID>:domain/<DOMAIN_ID>"

# List all experiments in the Region (the AWS CLI paginates automatically)
experiments=$(aws sagemaker list-experiments --region "$REGION")

for row in $(echo "${experiments}" | jq -r '.ExperimentSummaries[] | @base64'); do
  # Decode one base64-encoded experiment summary and extract a field from it
  _jq() {
    echo "${row}" | base64 --decode | jq -r "${1}"
  }
  exp_name=$(_jq '.ExperimentName')
  exp_arn=$(_jq '.ExperimentArn')
  echo "Tagging resource name: $exp_name and arn: $exp_arn with \"sagemaker:domain-arn=$domain_arn\""
  aws sagemaker add-tags \
    --resource-arn "$exp_arn" \
    --tags "Key=sagemaker:domain-arn,Value=$domain_arn" \
    --region "$REGION"
  echo "Tagging done for: $exp_name"
  # Pause between calls to stay under API rate limits
  sleep 1
done
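The same pattern extends to other resource types. As a sketch, the following backfills training jobs, using the AWS CLI's --query option instead of jq. It tags every training job returned in the Region, so narrow the listing (for example, with --name-contains) when not all jobs belong to one domain. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: backfill the sagemaker:domain-arn tag onto existing training jobs.
# Tags every training job returned in the Region; the function name is hypothetical.
backfill_training_jobs() {
  local domain_arn="$1" region="$2" arns job_arn
  # --output text prints the ARNs separated by whitespace; the CLI paginates automatically
  arns=$(aws sagemaker list-training-jobs \
    --region "$region" \
    --query 'TrainingJobSummaries[].TrainingJobArn' \
    --output text) || return 1
  for job_arn in $arns; do
    # Skip anything that is not an ARN (for example, "None" when nothing is returned)
    [[ "$job_arn" == arn:* ]] || continue
    echo "Tagging $job_arn"
    aws sagemaker add-tags \
      --resource-arn "$job_arn" \
      --tags "Key=sagemaker:domain-arn,Value=$domain_arn" \
      --region "$region"
    sleep 1  # pause between calls to stay under API rate limits
  done
}
```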

You can verify that any individual resource was correctly tagged with the following code sample:

aws sagemaker \
list-tags \
--resource-arn <SAGEMAKER-RESOURCE-ARN> \
--region <REGION> 
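To check many resources from a script, you can wrap the same list-tags call in a helper that fails on a mismatch. This is a minimal sketch; the function name and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: return non-zero unless a resource's sagemaker:domain-arn tag
# equals the expected domain ARN. The function name is hypothetical.
verify_domain_tag() {
  local resource_arn="$1" expected="$2" region="$3" actual
  # Extract only the domain tag value; text output prints "None" if the tag is missing
  actual=$(aws sagemaker list-tags \
    --resource-arn "$resource_arn" \
    --region "$region" \
    --query "Tags[?Key=='sagemaker:domain-arn'].Value | [0]" \
    --output text)
  if [[ "$actual" == "$expected" ]]; then
    echo "OK: $resource_arn is tagged with $expected"
  else
    echo "MISMATCH: $resource_arn has sagemaker:domain-arn=$actual" >&2
    return 1
  fi
}
```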

Solution overview

In this section, we outline how you can set up multiple SageMaker domains in your own AWS account. You can either use the AWS Command Line Interface (AWS CLI) or the SageMaker console. Refer to Onboard to Amazon SageMaker Domain for the most up-to-date instructions on creating a domain.

Create a domain using the AWS CLI

The aws sagemaker create-domain CLI call is unchanged, but it now also accepts --default-space-settings if you intend to use shared spaces in SageMaker Studio. For more information, see Shared spaces in Amazon SageMaker Studio.

Create a new domain with your specified configurations using aws sagemaker create-domain, and then you’re ready to populate it with users.
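The following minimal sketch strings these steps together: it creates a domain with default shared-space settings, waits for it to become InService by polling describe-domain, and then adds a user profile. All function names and argument values are hypothetical placeholders, and the 30-second polling interval is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Minimal sketch: create a domain with default shared-space settings, wait for it,
# then add a user profile. Function names and argument values are hypothetical.

# Create the domain and print its ARN
create_domain_with_spaces() {
  local domain_name="$1" role_arn="$2" vpc_id="$3" subnet_id="$4"
  aws sagemaker create-domain \
    --domain-name "$domain_name" \
    --auth-mode IAM \
    --default-user-settings "ExecutionRole=$role_arn" \
    --default-space-settings "ExecutionRole=$role_arn" \
    --vpc-id "$vpc_id" \
    --subnet-ids "$subnet_id" \
    --query 'DomainArn' --output text
}

# The domain ID is the last path segment of the domain ARN (.../domain/<DOMAIN_ID>)
domain_id_from_arn() {
  echo "${1##*/}"
}

# Poll until the domain is InService (return 0), or fail on any *Failed status or an empty reply
wait_for_domain() {
  local domain_id="$1" status
  while true; do
    status=$(aws sagemaker describe-domain --domain-id "$domain_id" \
      --query 'Status' --output text)
    [[ -z "$status" ]] && return 1
    [[ "$status" == InService ]] && return 0
    [[ "$status" == *Failed ]] && return 1
    sleep 30
  done
}

# Add a single user profile to the domain
add_user() {
  local domain_id="$1" user_name="$2"
  aws sagemaker create-user-profile \
    --domain-id "$domain_id" \
    --user-profile-name "$user_name"
}
```

For example, capture the ARN from create_domain_with_spaces, derive the ID with domain_id_from_arn, and then run wait_for_domain on that ID before calling add_user, because a newly created domain starts out in a Pending state.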

Create a domain using the SageMaker console

On the updated SageMaker console, you can administer your domains via the new option called SageMaker Domains in the navigation pane.

Here you can open existing domains or create a new one using the graphical interface.

[Screenshot: creating a domain in the SageMaker console]

Conclusion

Using multiple SageMaker domains provides flexibility to meet your organizational needs. Whether you need to isolate users and their business groups or want to run separate domains because of configuration differences, we encourage you to stand up multiple SageMaker domains within a single AWS account!


About the Authors

Sean Morgan is an AI/ML Solutions Architect at AWS. He has experience in the semiconductor and academic research fields, and uses his experience to help customers reach their goals on AWS. In his free time, Sean is an active open-source contributor/maintainer and is the special interest group lead for TensorFlow Add-ons.

Arkaprava De is a Senior Software Engineer at AWS. He has been at Amazon for over 7 years and is currently working on improving the Amazon SageMaker Studio IDE experience. You can find him on LinkedIn.

Kunal Jha is a Senior Product Manager at AWS. He is focused on building Amazon SageMaker Studio as the IDE of choice for all ML development steps. In his spare time, Kunal enjoys skiing and exploring the Pacific Northwest. You can find him on LinkedIn.

Han Zhang is a Senior Software Engineer at Amazon Web Services. She is part of the launch team for Amazon SageMaker Notebooks and Amazon SageMaker Studio, and has been focusing on building secure machine learning environments for customers. In her spare time, she enjoys hiking and skiing in the Pacific Northwest.

Read more about this on: AWS