
Commit ec1624b

Merge branch 'nist-800-223' into 'develop'
NIST SP 800-223 Recipe See merge request mwvaughn/aws-hpc-recipes!142
2 parents a0333a0 + be5f2bd commit ec1624b

25 files changed, +2753 -0 lines changed

recipes/pcluster/nist_800_223/.gitkeep

Whitespace-only changes.
Lines changed: 17 additions & 0 deletions

```makefile
# Target rules
all: build
	@echo "Building nist_800_223"

build: assets

assets:
	@echo "Build assets for nist_800_223"

run: build
	@echo "Run assets for nist_800_223"

test: build

clean:

clobber: clean
```
Lines changed: 234 additions & 0 deletions
# Securing HPC with NIST SP 800-223 on AWS

This recipe guides users through building a cloud HPC environment aligned with the NIST SP 800-223 standard using AWS ParallelCluster.

The National Institute of Standards and Technology (NIST) has published [NIST SP 800-223: High-Performance Computing (HPC) Security: Architecture, Threat Analysis, and Security Posture](https://csrc.nist.gov/pubs/sp/800/223/final). This publication provides guidance on how to configure and secure an HPC cluster. This builder project instructs and guides users on how to build a cloud HPC environment that aligns with NIST SP 800-223 using AWS CloudFormation and AWS ParallelCluster.
- [Securing HPC with NIST SP 800-223 on AWS](#securing-hpc-with-nist-sp-800-223-on-aws)
  - [Overview](#overview)
  - [Architecture Overview](#architecture-overview)
    - [Architecture diagrams](#architecture-diagrams)
    - [Cost](#cost)
    - [Cost Table](#cost-table)
    - [Security](#security)
  - [Prerequisites](#prerequisites)
    - [SSH Access](#ssh-access)
    - [AWS account requirements (If applicable)](#aws-account-requirements-if-applicable)
  - [Deployment Steps](#deployment-steps)
    - [Default Stack Names](#default-stack-names)
  - [Deployment Validation](#deployment-validation)
  - [Next Steps](#next-steps)
    - [Login via SSM](#login-via-ssm)
    - [Login via SSH](#login-via-ssh)
  - [Cleanup](#cleanup)
## Overview

Amazon Web Services (AWS) provides elastic and scalable cloud infrastructure to run your HPC workloads. With virtually unlimited capacity, engineers, researchers, HPC system administrators, and organizations can innovate beyond the limitations of on-premises HPC infrastructure.

High Performance Computing (HPC) on AWS removes the long wait times and lost productivity often associated with on-premises HPC clusters. Flexible HPC cluster configurations and virtually unlimited scalability allow you to grow and shrink your infrastructure as your workloads dictate, not the other way around.

This guidance provides a comprehensive approach to deploying a secure, compliant, and high-performance HPC environment on AWS. It addresses the unique security challenges of HPC systems while maintaining the performance requirements critical for computationally intensive workloads.

We developed this guidance in response to the growing need for secure HPC environments in cloud settings. Many organizations, especially those in research, engineering, and data-intensive fields, require immense computational power but struggle to balance this with stringent security and compliance requirements. The NIST SP 800-223 publication provides an excellent framework for addressing these challenges, and we wanted to demonstrate how to implement these recommendations using AWS services.
## Architecture Overview

### Architecture diagrams

The architecture diagram below shows a sample NIST SP 800-223-based architecture: the provisioning and deployment process using CloudFormation, the HPC cluster deployment, and user interactions via AWS ParallelCluster. Depending on the region you deploy the recipe in, it automatically scales from 2 to 4 Availability Zones to maximize the availability and redundancy of your cluster.

![Infrastructure](images/ref-arch.png "Reference Architecture")
### Cost

You are responsible for the cost of the AWS services used while running this Guidance. As of November 2024, the cost for running this Guidance with the default settings in the US East (N. Virginia) region is approximately $1,156 per month.

We recommend creating a [Budget](https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-managing-costs.html) through [AWS Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance.

### Cost Table

The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the US East (N. Virginia) Region for one month.

| Stack Name | AWS Services | Cost [USD] |
| ----------- | ------------ | ------------ |
| Network | VPC, Subnets, NAT Gateway, VPC Endpoints | $596.85/month |
| Security | Security Groups | $0.00/month |
| Storage | S3, EFS, FSx, EBS | $172.19/month |
| Slurm Accounting | RDS Database | $73.84/month |
| Active Directory | Managed AD (Enterprise) | $288.00/month |
| Cluster | Head node, Login node | $25.00/month |

***Note: The focus of this Guidance is to provide an example of securing the underlying AWS services and infrastructure that an HPC cluster will eventually run on. It does not aim to include any costs related to running an actual HPC workload. Please use the [AWS Pricing Calculator](https://calculator.aws/) to estimate any additional costs related to your specific HPC workload use case.***
### Security

When you build systems on AWS infrastructure, security responsibilities are shared between you and AWS. This [shared responsibility model](https://aws.amazon.com/compliance/shared-responsibility-model/) reduces your operational burden because AWS operates, manages, and controls the components including the host operating system, the virtualization layer, and the physical security of the facilities in which the services operate. For more information about AWS security, visit [AWS Cloud Security](http://aws.amazon.com/security/).

[AWS ParallelCluster](https://aws.amazon.com/hpc/parallelcluster/) users can be securely authenticated and authorized using [Amazon Managed Microsoft Active Directory](https://aws.amazon.com/directoryservice/). HPC cluster EC2 components are deployed into a Virtual Private Cloud (VPC), which provides additional network security isolation for all contained components. The login node is deployed into a public subnet and is available for access via secure connections (SSH and SSM); the head node is deployed into a private subnet and is likewise available via SSH and SSM; compute nodes are deployed into a private subnet and managed from the head node via the Slurm workload manager; and the Slurm accounting database is deployed into a private subnet and managed from the head node using Slurm. Data stored in Amazon S3, Amazon EFS, and Amazon FSx for Lustre is [encrypted at rest and in transit](https://docs.aws.amazon.com/whitepapers/latest/logical-separation/encrypting-data-at-rest-and--in-transit.html). Access to other AWS services from AWS ParallelCluster components is secured over [VPC Endpoints](https://docs.aws.amazon.com/whitepapers/latest/aws-privatelink/what-are-vpc-endpoints.html) from a private management subnet.

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
## Prerequisites

### SSH Access

If you prefer to use SSH to access the login node or head node, you will need to create a new SSH key pair in your account ***before*** launching the ParallelCluster CloudFormation template.

To do that:
1. Log in to your AWS account
2. In the search bar at the top of the screen, type EC2
3. In the list of services, select EC2
4. In the left-hand menu, select Key Pairs under the Network & Security section
5. Click Create key pair
6. Enter a key pair name
7. Select your preferred key pair type and format and click Create key pair
8. This will automatically start a download of the private key for the key pair you just created
9. Save this key in a secure location (this key can act as your password to log in to the nodes launched by this template)
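The console steps above can also be done with the AWS CLI. A minimal sketch follows; the key pair name `nist-hpc-key` is a hypothetical example, not a name the templates require.

```shell
# Sketch: create an EC2 key pair and save the private key locally.
# "nist-hpc-key" is a hypothetical name; pick your own.
create_hpc_keypair() {
  local name=${1:-nist-hpc-key}
  aws ec2 create-key-pair \
    --key-name "$name" \
    --query 'KeyMaterial' --output text > "${name}.pem"
  chmod 400 "${name}.pem"   # private keys should be read-only
  echo "Saved private key to ${name}.pem"
}

# Usage (requires AWS credentials):
#   create_hpc_keypair nist-hpc-key
```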
### AWS account requirements (If applicable)

This deployment requires that you have access to Amazon CloudFormation in your AWS account with permissions to create the following resources:

**AWS Services Used:**

- [Amazon VPC](https://aws.amazon.com/vpc/)
- [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/)
- [AWS Identity and Access Management (IAM)](https://aws.amazon.com/iam/)
- [Amazon Elastic Compute Cloud (EC2)](https://aws.amazon.com/ec2/)
- [Amazon Elastic File System (EFS)](https://aws.amazon.com/efs/)
- [Amazon Elastic Block Store (EBS)](https://aws.amazon.com/ebs/)
- [Amazon FSx for Lustre (FSxL)](https://aws.amazon.com/fsx/lustre/)
- [Amazon Relational Database Service (RDS)](https://aws.amazon.com/rds/)
- [AWS Directory Service](https://aws.amazon.com/directoryservice/)
- [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/)
- [AWS Systems Manager](https://aws.amazon.com/systems-manager/)
- [All services used by AWS ParallelCluster](https://aws.amazon.com/hpc/parallelcluster/)
## Deployment Steps

1. Clone the repo:
   - ```git clone https://github.com/aws-samples/aws-hpc-recipes.git```
2. cd to the deployment folder inside the repo:
   - ```cd recipes/pcluster/nist-800-223/```
3. Locate the six Amazon CloudFormation templates and review them in order in a text editor of your choice or in the Amazon CloudFormation console
   - In most cases you will want to use the default settings; however, you can modify these templates to fit your specific needs
4. Open a browser and log in to your AWS account
5. Locate the search bar at the top of your screen and type in CloudFormation
6. When presented with a list of services, click CloudFormation to open the CloudFormation console

![CloudFormation](images/deployment_steps/0_deployment.png)

7. Click the Create Stack button

![Create Stack](images/deployment_steps/1_deployment.png)

8. In the "Prepare template" section, select "Choose an existing template"

![Prepare Template](images/deployment_steps/2_deployment.png)

9. In the "Specify template" section, select "Upload a template file"
10. Click the "Choose file" button

![Choose File](images/deployment_steps/3_deployment.png)

11. Navigate to the location on your local computer where you cloned the repo and go to the deployment folder. There you will find the CloudFormation templates, each prefaced with a number indicating the order in which to execute them.
12. Select the first template, titled "0_network.yaml"
13. For each template you will be asked to provide a Stack name; this name must be unique within the region you are deploying in.

***Important: Note each stack name for use in later templates. Downstream services will need the stack name in order to reference Amazon Resource Names (ARNs) or resource IDs that are exported/output by each template***

![Stack Name](images/deployment_steps/4_deployment.png)

14. For the network stack, review the parameters and adjust as needed based on your specific use case or requirements
15. Once you have reviewed and validated the parameters, click the Next button at the bottom of the page
16. Leave the default options on the "Configure stack options" page
17. Scroll to the bottom of this page and select the check box to allow CloudFormation to create IAM resources on your behalf
18. Click Next

![Choose File](images/deployment_steps/5_deployment.png)

19. On the "Review and create" screen, review your selections one last time and then click the Submit button at the bottom of the page.

![Submit](images/deployment_steps/6_deployment.png)

20. Your CloudFormation stack will begin deploying
21. You can monitor the progress of the deployment within the CloudFormation console

![Choose File](images/deployment_steps/7_deployment.png)

22. Wait until the stack status updates from "CREATE_IN_PROGRESS" to "CREATE_COMPLETE" before moving on to the next template
23. You can review the outputs generated by the stack on the Outputs tab for each stack, or on the Exports page in the left-hand menu
    - ***Note: The export values will be used by later templates to reference resources created in earlier templates***

Outputs View

![Outputs](images/deployment_steps/8_deployment.png)

Exports View

![Exports](images/deployment_steps/9_deployment.png)

24. Repeat the steps above, starting with step 7, for the next stack in the deployment folder

***Important: Stacks 1-5 have a parameter that asks for the previous stack names. If you modify the stack names from the default values, you will need to update the parameters in each subsequent stack with the appropriate names so that the relevant services can be referenced.***

***Note: The storage, Slurm database, Active Directory, and AWS ParallelCluster stacks are intended to be simple examples for testing the NIST SP 800-223 reference architecture. For more production-ready versions of these templates, see our [HPC Recipes](https://github.com/aws-samples/aws-hpc-recipes/tree/main/recipes) repo***
### Default Stack Names

| Template File Name | Stack Name |
| ----------- | ------------ |
| 0_network.yaml | nist-network |
| 1_security.yaml | nist-security |
| 2_storage.yaml | nist-storage |
| 3_slurm_db.yaml | nist-database |
| 4_active_directory.yaml | nist-ad |
| 5_pcluster.yaml | nist-hpc |
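For repeatable deployments, the console walkthrough can also be scripted with the AWS CLI. The sketch below uses the template file names and default stack names from the table above; the default region and the capability flags are assumptions, so adjust them for your environment.

```shell
# Sketch: deploy the six templates in order with the AWS CLI.
# Template/stack pairs come from the Default Stack Names table.
deploy_nist_stacks() {
  local region=${1:-us-east-1}   # assumed region; change as needed
  local pairs=(
    "0_network.yaml nist-network"
    "1_security.yaml nist-security"
    "2_storage.yaml nist-storage"
    "3_slurm_db.yaml nist-database"
    "4_active_directory.yaml nist-ad"
    "5_pcluster.yaml nist-hpc"
  )
  local pair template stack
  for pair in "${pairs[@]}"; do
    template=${pair%% *}
    stack=${pair##* }
    # "deploy" creates the stack and waits for completion before returning
    aws cloudformation deploy \
      --region "$region" \
      --template-file "$template" \
      --stack-name "$stack" \
      --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
  done
}

# Usage (requires AWS credentials; run from the deployment folder):
#   deploy_nist_stacks us-east-1
```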
## Deployment Validation

* Open the CloudFormation console and verify the status of each stack, using the stack names listed above.

<img src="images/deployment_steps/0_validate.png" alt="Validate" width="200" height="325">

* Make sure that all CloudFormation stacks have a status of "CREATE_COMPLETE"
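Stack status can also be checked from the command line. A minimal sketch, assuming the default stack names from the table above:

```shell
# Sketch: print the status of each stack created by this recipe.
check_nist_stacks() {
  local stack status
  for stack in nist-network nist-security nist-storage nist-database nist-ad nist-hpc; do
    status=$(aws cloudformation describe-stacks \
      --stack-name "$stack" \
      --query 'Stacks[0].StackStatus' --output text)
    echo "$stack: $status"
  done
}

# Usage (requires AWS credentials):
#   check_nist_stacks   # every line should end in CREATE_COMPLETE
```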
## Next Steps

You have now successfully deployed the infrastructure needed to comply with the guidelines and recommendations outlined in NIST SP 800-223.

You can begin using the cluster by logging in to either the login node to submit a job, or the head node to review or modify any of the Slurm settings. You can use SSM to securely open a terminal session to either the login node or the head node as follows:
### Login via SSM

1. In the search bar at the top of the screen, type EC2
2. In the list of services, select EC2
3. In the left-hand menu, select Instances
4. Locate either the head node or the login node and select one instance by checking the box to the left of the instance
5. Locate the Connect button near the top of the screen
6. In the window that opens, click the Session Manager tab
7. Click the Connect button to open a secure terminal session in your browser
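The same session can be opened from a local terminal using the AWS CLI with the Session Manager plugin installed. A sketch, assuming the node can be found by a `Name` tag; the tag value `HeadNode` is a hypothetical example, so substitute whatever Name tag your cluster's nodes actually carry.

```shell
# Sketch: look up a running instance by Name tag and start an SSM session.
# The tag value passed in is a hypothetical example.
ssm_login() {
  local name_tag=$1 instance_id
  instance_id=$(aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=${name_tag}" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[0].Instances[0].InstanceId' --output text)
  # Requires the Session Manager plugin for the AWS CLI
  aws ssm start-session --target "$instance_id"
}

# Usage (requires AWS credentials and the Session Manager plugin):
#   ssm_login HeadNode
```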
### Login via SSH

Alternatively, when you launch the 5_pcluster.yaml CloudFormation template, you can select an SSH key pair that exists in your AWS account. If you completed the prerequisite steps to create a key pair, you will see it populated in this list.

1. Locate your SSH key pair
2. Ensure you have the proper permissions set on the key pair (read-only access)
```chmod 400 /path/key/ssh_key.pem```
3. In the list of services, select EC2
4. In the left-hand menu, select Instances
5. Locate either the head node or the login node and select one instance by checking the box to the left of the instance
6. Locate the Connect button near the top of the screen
7. In the window that opens, click the SSH client tab
8. Follow the instructions on the screen to log in to your instance
## Cleanup

1. In the AWS Management Console, navigate to CloudFormation and locate the 6 stacks deployed
2. Starting with the most recent stack (not including any nested stacks), select the stack and click Delete
3. Repeat this for each of the 6 stacks deployed to remove all resources from your account
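Cleanup can likewise be scripted. A sketch assuming the default stack names, deleting in reverse creation order so that each stack's exports are no longer referenced when it is removed:

```shell
# Sketch: delete the six stacks in reverse creation order.
delete_nist_stacks() {
  local stack
  for stack in nist-hpc nist-ad nist-database nist-storage nist-security nist-network; do
    aws cloudformation delete-stack --stack-name "$stack"
    # Wait for each deletion so downstream exports are released first
    aws cloudformation wait stack-delete-complete --stack-name "$stack"
  done
}

# Usage (requires AWS credentials):
#   delete_nist_stacks
```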

recipes/pcluster/nist_800_223/assets/.gitkeep

Whitespace-only changes.
