MLS-C01 Exam - Question 357

Question

A data scientist needs to create a model for predictive maintenance. The model will be based on historical data to identify rare anomalies in the data.

The historical data is stored in an Amazon S3 bucket. The data scientist needs to use Amazon SageMaker Data Wrangler to ingest the data. The data scientist also needs to perform exploratory data analysis (EDA) to understand the statistical properties of the data.

Which solution will meet these requirements with the LEAST amount of compute resources?

Examice · Accepted Answer

MultiCloudIronMan · Answer

Why Option C?
Efficiency: Importing a subset of the data using the First K option minimizes compute resources while still providing a representative sample for exploratory data analysis (EDA).

Domain Knowledge: Leveraging domain knowledge to determine the value of K ensures that the subset is relevant and sufficient for meaningful analysis.
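A caveat on the First K approach: it returns rows in file order, so the subset only looks "representative" if the rows are not ordered in a way that correlates with anomalies. The sketch below is plain Python, not the Data Wrangler API. It assumes a hypothetical, time-ordered sensor log in which all anomalies appear near the end of the file.

```python
def first_k(rows, k):
    """Return the first k rows, similar in spirit to a 'First K' import."""
    return rows[:k]

# Hypothetical, time-ordered sensor log: 10,000 rows, where the only
# 20 anomalies (anomaly == 1) occur in the last 20 rows of the file.
rows = [{"id": i, "anomaly": 1 if i >= 9_980 else 0} for i in range(10_000)]

sample = first_k(rows, 1_000)
anomalies_in_sample = sum(r["anomaly"] for r in sample)
print(anomalies_in_sample)  # → 0
```

In this hypothetical case, the 1,000-row subset contains no anomalies, so EDA on it would say nothing about the rare events the model is meant to detect.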

7f1fe73 · Answer

D. Import the data by using the Randomized option. Infer the random size from domain knowledge:

This option selects a random sample of the data.
Pros: It provides a representative sample of the entire dataset while using fewer compute resources than importing all data.
Cons: There's a small chance of missing some rare anomalies, but this risk can be mitigated by choosing an appropriate sample size based on domain knowledge.
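"Choosing an appropriate sample size based on domain knowledge" can be made concrete with a standard approximation. If anomalies occur at a known rate p, and n rows are drawn at random from a large dataset (so the draws behave roughly as independent), then the probability that the sample contains no anomalies is about (1 - p)^n. The minimal sketch below solves for the smallest n that keeps this miss probability under a chosen threshold. The function name and the example rates are hypothetical, and the anomaly rate is assumed to come from domain knowledge.

```python
import math

def min_sample_size(anomaly_rate: float, max_miss_prob: float) -> int:
    """Smallest n with (1 - anomaly_rate) ** n <= max_miss_prob.

    Assumes independent draws, a reasonable approximation when the sample
    is small relative to the full dataset.
    """
    if not (0 < anomaly_rate < 1 and 0 < max_miss_prob < 1):
        raise ValueError("anomaly_rate and max_miss_prob must be in (0, 1)")
    # Solve (1 - p)^n <= delta  =>  n >= ln(delta) / ln(1 - p)
    return math.ceil(math.log(max_miss_prob) / math.log(1 - anomaly_rate))

# Hypothetical: anomalies in 0.1% of rows, accept at most a 1% chance of seeing none.
print(min_sample_size(0.001, 0.01))  # → 4603
```

Seeing at least one anomaly is a low bar for EDA. To estimate the anomalies' statistical properties, the required sample is usually much larger.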

italiancloud2025 · Answer

A: No, because "None" imports the entire dataset, which consumes more resources.
B: Yes, because the stratified option ensures that rare cases are included in the sample while using fewer resources.
C: No, because "First K" can bias the sample and miss the anomalies if they are not among the first K records.
D: No, because random sampling can miss the rare anomalies and depends on an arbitrary sample size.
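The argument for B rests on how stratified sampling works: rows are grouped by a column, and each group is sampled separately, so a rare group cannot vanish from the sample by chance. The sketch below is plain Python illustrating the general technique, not Data Wrangler's implementation. It assumes the data already has a column to stratify on (here a hypothetical `anomaly` label), and it uses the same hypothetical time-ordered log as a First K example would.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Sample about `fraction` of each group defined by `key`,
    keeping at least one row from every non-empty group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical log: 20 anomalies, all in the last 20 of 10,000 rows.
rows = [{"id": i, "anomaly": 1 if i >= 9_980 else 0} for i in range(10_000)]
sample = stratified_sample(rows, "anomaly", 0.1)
print(len(sample), sum(r["anomaly"] for r in sample))  # → 1000 2
```

A 10% stratified sample keeps the anomalies' 0.2% share regardless of where they sit in the file. This guarantee depends on having a reliable column to stratify on, which a purely unlabeled anomaly-detection dataset may not provide.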

ef12052 · Answer

https://aws.amazon.com/it/about-aws/whats-new/2022/04/amazon-sagemaker-data-wrangler-supports-random-sampling-stratified-sampling/

Carpediem78 · Answer

D. Import the data by using the Randomized option. Infer the random size from domain knowledge.
