MLS-C01 Exam Questions

MLS-C01 Exam - Question 171


A machine learning (ML) specialist wants to create a data preparation job that uses a PySpark script with complex window aggregation operations to create data for training and testing. The ML specialist needs to evaluate the impact of the number of features and the sample count on model performance.

Which approach should the ML specialist use to determine the ideal data transformations for the model?

Correct Answer: D

To determine the ideal data transformations for the model, the ML specialist should run the script as a SageMaker processing job, because Amazon SageMaker Experiments is specifically designed to track and analyze experiments related to ML models. SageMaker Experiments lets the specialist capture key parameters, metrics, and artifacts from each run, making it possible to evaluate the impact of the number of features and the sample count on model performance. Running the script as an AWS Glue job would not provide the necessary integration with SageMaker Experiments.
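A minimal sketch of what this looks like with the SageMaker Python SDK: a `PySparkProcessor` runs the data-preparation script as a Processing job, and launching it inside a SageMaker Experiments `Run` context associates the job with that run. The script name `preprocess.py`, the S3 URIs, and the argument names are illustrative assumptions, not part of the question.

```python
# Hypothetical sketch: launch a PySpark data-preparation script as a
# SageMaker Processing job inside a SageMaker Experiments run.
# "preprocess.py", the S3 URIs, and the argument names are assumptions.

def build_spark_args(input_uri, output_uri, feature_count, sample_count):
    """CLI arguments handed to the PySpark script for one experiment trial."""
    return [
        "--input", input_uri,
        "--output", output_uri,
        "--feature-count", str(feature_count),
        "--sample-count", str(sample_count),
    ]

def run_trial(role, feature_count, sample_count):
    # Imported here so the pure helper above works without the SageMaker SDK.
    from sagemaker.experiments.run import Run
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(
        base_job_name="data-prep",
        framework_version="3.1",
        role=role,
        instance_count=2,
        instance_type="ml.m5.xlarge",
    )
    # Jobs started inside a Run context are tracked under that run,
    # so each feature-count/sample-count combination becomes a comparable trial.
    with Run(
        experiment_name="data-prep-experiments",
        run_name=f"features-{feature_count}-samples-{sample_count}",
    ):
        processor.run(
            submit_app="preprocess.py",  # the PySpark script with the window aggregations
            arguments=build_spark_args(
                "s3://my-bucket/raw/",
                "s3://my-bucket/prepared/",
                feature_count,
                sample_count,
            ),
        )
```

Varying `feature_count` and `sample_count` across calls to `run_trial` produces one Experiments run per configuration, which can then be compared side by side in SageMaker Studio.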

Discussion

14 comments
dolorez · Option: D
May 24, 2022

While I agree that SageMaker Experiments is the way to go, it only supports Training, Processing, and Transform jobs, so the right answer is to run the job as a processing job; hence D, not B. https://docs.aws.amazon.com/sagemaker/latest/dg/experiments-create.html#:~:text=CreateTrainingJob-,Processing,-Processor.run

Jerry84
Jan 16, 2023

“Generally, you use load_run with no arguments to track metrics, parameters, and artifacts within a SageMaker training or processing job script.” https://docs.aws.amazon.com/sagemaker/latest/dg/experiments-create.html
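The quoted pattern can be sketched as follows: inside the Processing job's script, `load_run()` with no arguments attaches to the run associated with the current job, so logged parameters and metrics land in SageMaker Experiments. The parameter/metric names and the `trial_run_name` helper are illustrative assumptions.

```python
# Hypothetical sketch of the tracking side, inside the job script itself.
# Parameter and metric names ("feature_count", "validation:auc", ...) are assumptions.

def trial_run_name(feature_count, sample_count):
    """Deterministic run name so trials are easy to compare in the Experiments UI."""
    return f"features-{feature_count}-samples-{sample_count}"

def log_trial(feature_count, sample_count, auc):
    # Imported here because the SDK is only needed when running inside the job.
    from sagemaker.experiments.run import load_run

    # load_run() with no arguments resolves the run tied to this
    # SageMaker training or processing job.
    with load_run() as run:
        run.log_parameter("feature_count", feature_count)
        run.log_parameter("sample_count", sample_count)
        run.log_metric(name="validation:auc", value=auc)
```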

Jerry84
Feb 21, 2023

Run PySpark script in SageMaker processing job https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html

bluer1
Apr 29, 2022

B - https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html

KlaudYu
Jun 18, 2022

But it doesn't describe a Glue job.

ovokpus · Option: B
Jun 25, 2022

here: https://aws.amazon.com/about-aws/whats-new/2018/10/aws-glue-now-supports-connecting-amazon-sagemaker-notebooks-to-development-endpoints/#:~:text=AWS%20Glue%20now%20supports%20connecting%20Amazon%20SageMaker%20notebooks%20to%20development%20endpoints,-Posted%20On%3A%20Oct&text=You%20can%20now%20create%20an,an%20AWS%20Glue%20development%20endpoint.

aScientist · Option: D
Nov 8, 2022

https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_processing/spark_distributed_data_processing/sagemaker-spark-processing.html

blanco750 · Option: D
Mar 19, 2023

D looks like the right answer.

Mickey321 · Option: D
Jul 31, 2023

A PySpark script can be run as a SageMaker processing job by using the SparkProcessor class. A SageMaker processing job can use Amazon SageMaker Experiments to track the input parameters, output metrics, and artifacts of each run. A SageMaker processing job can also use Amazon SageMaker Debugger to capture tensors and analyze the training behavior, but this is more useful for deep learning models than for data preparation tasks. Running the script as an AWS Glue job would not allow the ML specialist to use Amazon SageMaker Experiments or Amazon SageMaker Debugger, as these features are specific to SageMaker.

sanjosh
Nov 9, 2023

D https://sagemaker-experiments.readthedocs.io/en/latest/tracker.html

jhonivy
Jan 25, 2023

B: Glue job goes with window aggregation operations

SANDEEP_AWS · Option: B
Mar 13, 2023

https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html ---- Use SageMaker Experiments to view, manage, analyze, and compare both custom experiments that you programmatically create and experiments automatically created from SageMaker jobs.

ZSun
May 7, 2023

It says "SageMaker jobs," not "Glue jobs" — it is D!

Mllb · Option: B
Apr 3, 2023

"Key metrics" is the key. Then D is not the correct answer.

ZSun
Apr 19, 2023

What is the difference between key metrics and key parameters? Why do we care about key metrics? Because we can compare the key metrics of different parameters and then find the impact of the number of features. So the key is "Glue" vs. "SageMaker processing".

Anonymous · Option: D
May 14, 2023

The PySpark script defined above is passed via the submit_app parameter. https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_processing/spark_distributed_data_processing/sagemaker-spark-processing.ipynb

ADVIT
Jul 6, 2023

D: SageMaker Experiments automatically tracks the inputs, parameters, configurations, and results of your iterations as runs.

3eb0542 · Option: B
Apr 22, 2024

AWS Glue is a fully managed extract, transform, and load (ETL) service that is purpose-built for processing large datasets and executing PySpark scripts. It is more aligned with the task of running a PySpark script with complex window aggregation operations to prepare data for training and testing.

salim1905 · Option: B
Jun 13, 2024

PySpark -> AWS Glue

ef12052
Mar 23, 2025

https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html#pysparkprocessor -> D