A Machine Learning Specialist is using an Amazon SageMaker notebook instance in a private subnet of a corporate VPC. The ML Specialist has important data stored on the Amazon SageMaker notebook instance's Amazon EBS volume, and needs to take a snapshot of that EBS volume. However, the ML Specialist cannot find the Amazon SageMaker notebook instance's EBS volume or Amazon EC2 instance within the VPC.
Why is the ML Specialist unable to see the instance in the VPC?
Correct Answer: C
Amazon SageMaker notebook instances run on EC2 instances hosted in AWS-managed service accounts, not in the customer's account. The EC2 instances and EBS volumes backing a notebook instance are managed by AWS and are not directly visible or accessible to the customer through their own VPC, which is why the Machine Learning Specialist cannot find them.
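As a rough illustration (the notebook instance name "my-notebook" is a placeholder), the notebook instance is only addressable through the SageMaker API; listing EC2 instances in the customer account will not surface the underlying host:

import boto3

sagemaker = boto3.client("sagemaker")
ec2 = boto3.client("ec2")

# The SageMaker API exposes the notebook instance and its EBS volume size...
nb = sagemaker.describe_notebook_instance(NotebookInstanceName="my-notebook")
print(nb["NotebookInstanceStatus"], nb["VolumeSizeInGB"])

# ...but the backing EC2 instance lives in an AWS service account, so it
# does not appear among the EC2 instances visible in the customer account.
reservations = ec2.describe_instances()["Reservations"]
print(sum(len(r["Instances"]) for r in reservations), "EC2 instances visible in this account")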
A Machine Learning Specialist is building a model that will perform time series forecasting using Amazon SageMaker. The Specialist has finished training the model and is now planning to perform load testing on the endpoint so they can configure Auto Scaling for the model variant.
Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilization during the load test?
Correct Answer: B
To review latency, memory utilization, and CPU utilization during the load test of a SageMaker endpoint, the optimal approach is to generate an Amazon CloudWatch dashboard. Amazon CloudWatch natively supports monitoring these metrics and provides a unified view, which simplifies tracking and visualization. SageMaker automatically integrates with CloudWatch to report these metrics, facilitating real-time monitoring and management without the need for additional tools.
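As a hedged sketch (the endpoint name "my-endpoint" and variant name "AllTraffic" are assumptions), the same metrics a CloudWatch dashboard would chart can also be pulled programmatically during the load test:

import boto3
from datetime import datetime, timedelta

cw = boto3.client("cloudwatch")
dims = [{"Name": "EndpointName", "Value": "my-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"}]
window = dict(StartTime=datetime.utcnow() - timedelta(hours=1),
              EndTime=datetime.utcnow(), Period=60)

# Invocation latency is published in the AWS/SageMaker namespace...
latency = cw.get_metric_statistics(Namespace="AWS/SageMaker",
                                   MetricName="ModelLatency",
                                   Dimensions=dims, Statistics=["Average"], **window)
print("ModelLatency (microseconds):", [p["Average"] for p in latency["Datapoints"]])

# ...while per-instance CPU and memory utilization come from /aws/sagemaker/Endpoints.
for metric in ("CPUUtilization", "MemoryUtilization"):
    stats = cw.get_metric_statistics(Namespace="/aws/sagemaker/Endpoints",
                                     MetricName=metric,
                                     Dimensions=dims, Statistics=["Average"], **window)
    print(metric, [p["Average"] for p in stats["Datapoints"]])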
A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data.
Which solution requires the LEAST effort to be able to query this data?
Correct Answer: B
To query both structured and unstructured data stored in an Amazon S3 bucket with the least effort, you can use AWS Glue to catalog the data and Amazon Athena to run queries. AWS Glue is a fully managed ETL service that automatically crawls your data, infers the schemas and formats, and creates a data catalog that can be queried. Amazon Athena is an interactive query service that lets you analyze data directly in Amazon S3 using standard SQL without the need for complex ETL jobs. This combination allows you to catalog and immediately query your data without additional infrastructure or complex setup.
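For illustration (the Glue database "manufacturing_db", table "sensor_data", and results bucket are placeholders, not names from the question), once a Glue crawler has cataloged the S3 data, a standard SQL query can be submitted through Athena:

import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT * FROM sensor_data LIMIT 10",
    QueryExecutionContext={"Database": "manufacturing_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print("Query started:", resp["QueryExecutionId"])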
A Machine Learning Specialist is developing a custom video recommendation model for an application. The dataset used to train this model is very large with millions of data points and is hosted in an Amazon S3 bucket. The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and will exceed the attached 5 GB Amazon EBS volume on the notebook instance.
Which approach allows the Specialist to use all the data to train the model?
Correct Answer: A
To train a machine learning model with a very large dataset stored in an Amazon S3 bucket without loading all of the data onto an Amazon SageMaker notebook instance, the best approach is to load a smaller subset of the data into the notebook instance initially for verification and parameter tuning. Once the initial validation is done, a SageMaker training job can be launched against the full dataset in the S3 bucket using Pipe input mode. Pipe input mode streams data directly from the S3 bucket to the training instances without downloading the entire dataset, overcoming the limitations of local storage and speeding up the training process.
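A minimal sketch with the SageMaker Python SDK (the training image URI, execution role, and S3 path are placeholders) shows how Pipe input mode is requested when the training job is configured:

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

estimator = Estimator(
    image_uri="<training-image-uri>",
    role="<execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    input_mode="Pipe",  # stream data from S3 rather than copying it to local storage
    sagemaker_session=sagemaker.Session(),
)

# Point the training channel at the full dataset in S3; Pipe mode streams it in.
estimator.fit({"train": TrainingInput("s3://my-bucket/full-dataset/")})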
A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is stored in Amazon RDS.
Which approach should the Specialist use for training a model using that data?
Correct Answer: B
To train a model using Amazon SageMaker, the most suitable approach is to push the data to Amazon S3. Amazon S3 is a highly scalable and durable object storage service, and SageMaker is designed to utilize data stored in S3 for training models. Using an AWS Data Pipeline to move data from Amazon RDS to Amazon S3 ensures a reliable and scalable data transfer. This method also allows for the data to be readily available for training, irrespective of the state of the Amazon RDS instance.
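As an illustrative sketch (the job name, training image, role, and bucket paths are placeholders), once AWS Data Pipeline has exported the Amazon RDS data to S3, the training job simply points its input channel at that S3 prefix:

import boto3

sm = boto3.client("sagemaker")
sm.create_training_job(
    TrainingJobName="forecast-training-job",
    AlgorithmSpecification={"TrainingImage": "<training-image-uri>",
                            "TrainingInputMode": "File"},
    RoleArn="<execution-role-arn>",
    InputDataConfig=[{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/exported-rds-data/",
            "S3DataDistributionType": "FullyReplicated"}},
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/model-output/"},
    ResourceConfig={"InstanceType": "ml.m5.xlarge",
                    "InstanceCount": 1,
                    "VolumeSizeInGB": 50},
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)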