MLS-C01 Exam Questions

MLS-C01 Exam - Question 171


A machine learning (ML) specialist wants to create a data preparation job that uses a PySpark script with complex window aggregation operations to create data for training and testing. The ML specialist needs to evaluate the impact of the number of features and the sample count on model performance.

Which approach should the ML specialist use to determine the ideal data transformations for the model?

Correct Answer: D

To determine the ideal data transformations for the model, the ML specialist should run the script as a SageMaker processing job, because Amazon SageMaker Experiments is specifically designed to track and analyze experiments related to ML models. SageMaker Experiments lets the specialist capture key parameters, metrics, and artifacts from each run, making it possible to evaluate the impact of the number of features and the sample count on model performance. Running the script as an AWS Glue job would not provide the necessary integration with SageMaker Experiments.
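A minimal sketch of what this looks like with the SageMaker Python SDK: a `PySparkProcessor` runs the data-preparation script as a Processing job, and launching it inside a SageMaker Experiments `Run` context associates the job with that run. The script name `preprocess.py`, the S3 URIs, and the argument names are illustrative assumptions, not part of the question.

```python
# Hypothetical sketch: launch a PySpark data-preparation script as a
# SageMaker Processing job inside a SageMaker Experiments run.
# "preprocess.py", the S3 URIs, and the argument names are assumptions.

def build_spark_args(input_uri, output_uri, feature_count, sample_count):
    """CLI arguments handed to the PySpark script for one experiment trial."""
    return [
        "--input", input_uri,
        "--output", output_uri,
        "--feature-count", str(feature_count),
        "--sample-count", str(sample_count),
    ]

def run_trial(role, feature_count, sample_count):
    # Imported here so the pure helper above works without the SageMaker SDK.
    from sagemaker.experiments.run import Run
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(
        base_job_name="data-prep",
        framework_version="3.1",
        role=role,
        instance_count=2,
        instance_type="ml.m5.xlarge",
    )
    # Jobs started inside a Run context are tracked under that run,
    # so each feature-count/sample-count combination becomes a comparable trial.
    with Run(
        experiment_name="data-prep-experiments",
        run_name=f"features-{feature_count}-samples-{sample_count}",
    ):
        processor.run(
            submit_app="preprocess.py",  # the PySpark script with the window aggregations
            arguments=build_spark_args(
                "s3://my-bucket/raw/",
                "s3://my-bucket/prepared/",
                feature_count,
                sample_count,
            ),
        )
```

Varying `feature_count` and `sample_count` across calls to `run_trial` produces one Experiments run per configuration, which can then be compared side by side in SageMaker Studio.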

Discussion

14 comments
dolorez · Option: D
May 24, 2022

While I agree that SageMaker Experiments is the way to go, it only supports Training, Processing, and Transform jobs, so the right answer is to run the job as a processing job; hence D, not B. https://docs.aws.amazon.com/sagemaker/latest/dg/experiments-create.html#:~:text=CreateTrainingJob-,Processing,-Processor.run

Jerry84
Jan 16, 2023

“Generally, you use load_run with no arguments to track metrics, parameters, and artifacts within a SageMaker training or processing job script.” https://docs.aws.amazon.com/sagemaker/latest/dg/experiments-create.html
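The quoted pattern can be sketched as follows: inside the Processing job's script, `load_run()` with no arguments attaches to the run associated with the current job, so logged parameters and metrics land in SageMaker Experiments. The parameter/metric names and the `trial_run_name` helper are illustrative assumptions.

```python
# Hypothetical sketch of the tracking side, inside the job script itself.
# Parameter and metric names ("feature_count", "validation:auc", ...) are assumptions.

def trial_run_name(feature_count, sample_count):
    """Deterministic run name so trials are easy to compare in the Experiments UI."""
    return f"features-{feature_count}-samples-{sample_count}"

def log_trial(feature_count, sample_count, auc):
    # Imported here because the SDK is only needed when running inside the job.
    from sagemaker.experiments.run import load_run

    # load_run() with no arguments resolves the run tied to this
    # SageMaker training or processing job.
    with load_run() as run:
        run.log_parameter("feature_count", feature_count)
        run.log_parameter("sample_count", sample_count)
        run.log_metric(name="validation:auc", value=auc)
```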

Jerry84
Feb 21, 2023

Run PySpark script in SageMaker processing job https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html

bluer1
Apr 29, 2022

B - https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html

KlaudYu
Jun 18, 2022

But it doesn't describe a Glue job.

ovokpus · Option: B
Jun 25, 2022

here: https://aws.amazon.com/about-aws/whats-new/2018/10/aws-glue-now-supports-connecting-amazon-sagemaker-notebooks-to-development-endpoints/#:~:text=AWS%20Glue%20now%20supports%20connecting%20Amazon%20SageMaker%20notebooks%20to%20development%20endpoints,-Posted%20On%3A%20Oct&text=You%20can%20now%20create%20an,an%20AWS%20Glue%20development%20endpoint.

aScientist · Option: D
Nov 8, 2022

https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_processing/spark_distributed_data_processing/sagemaker-spark-processing.html

blanco750 · Option: D
Mar 19, 2023

D looks like the right answer.

Mickey321 · Option: D
Jul 31, 2023

A PySpark script can be run as a SageMaker processing job by using the SparkProcessor class. A SageMaker processing job can use Amazon SageMaker Experiments to track the input parameters, output metrics, and artifacts of each run. A SageMaker processing job can also use Amazon SageMaker Debugger to capture tensors and analyze the training behavior, but this is more useful for deep learning models than for data preparation tasks. Running the script as an AWS Glue job would not allow the ML specialist to use Amazon SageMaker Experiments or Amazon SageMaker Debugger, as these features are specific to SageMaker.

sanjosh
Nov 9, 2023

D https://sagemaker-experiments.readthedocs.io/en/latest/tracker.html

jhonivy
Jan 25, 2023

B: Glue job goes with window aggregation operations

SANDEEP_AWS · Option: B
Mar 13, 2023

https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html ---- Use SageMaker Experiments to view, manage, analyze, and compare both custom experiments that you programmatically create and experiments automatically created from SageMaker jobs.

ZSun
May 7, 2023

It says "SageMaker jobs," not "Glue jobs" — it is D!

Mllb · Option: B
Apr 3, 2023

"Key metrics" is the key. Then D is not the correct answer.

ZSun
Apr 19, 2023

What is the difference between key metrics and key parameters? Why do we care about key metrics? Because we can compare the key metrics of different parameters and then find the impact of the number of features. So the key is "Glue" vs. "SageMaker processing".

Anonymous · Option: D
May 14, 2023

The PySpark script defined above is passed via the submit_app parameter. https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_processing/spark_distributed_data_processing/sagemaker-spark-processing.ipynb

ADVIT
Jul 6, 2023

D: SageMaker Experiments automatically tracks the inputs, parameters, configurations, and results of your iterations as runs.

3eb0542 · Option: B
Apr 22, 2024

AWS Glue is a fully managed extract, transform, and load (ETL) service that is purpose-built for processing large datasets and executing PySpark scripts. It is more aligned with the task of running a PySpark script with complex window aggregation operations to prepare data for training and testing.

salim1905 · Option: B
Jun 13, 2024

PySpark -> AWS Glue

ef12052
Mar 23, 2025

https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html#pysparkprocessor -> D