AWS Certified Machine Learning - Specialty

Here you have the best Amazon MLS-C01 practice exam questions

  • You have 332 total questions to study from
  • These questions were last updated on November 17, 2024
Question 1 of 332

A large mobile network operator is building a machine learning model to predict which customers are likely to unsubscribe from its service. The company plans to offer these customers an incentive because the cost of churn is far greater than the cost of the incentive.

The model produces the following confusion matrix after evaluating on a test dataset of 100 customers:

Based on the model evaluation results, why is this a viable model for production?

    Correct Answer: C

    The model should be judged by how well it minimizes the loss from customer churn. Losing a customer costs far more than giving an incentive to a customer who is predicted to churn but would have stayed, so reducing false negatives (customers predicted not to churn who actually do churn) matters most. The confusion matrix shows 10 false positives (customers incorrectly predicted to churn) and 4 false negatives (customers incorrectly predicted not to churn). The model therefore errs toward false positives, which are the cheaper error in this context. Hence, the correct justification is that the cost the company incurs from false positives is less than the cost it incurs from false negatives.
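    The cost argument can be made concrete with a little arithmetic. The sketch below uses the 10 false positives and 4 false negatives from the explanation; the per-customer costs (COST_FALSE_POSITIVE, COST_FALSE_NEGATIVE) are hypothetical placeholders, since the question only says churn costs far more than the incentive.

```python
# Hypothetical per-customer costs: the question states only that churn costs
# far more than the incentive, so these numbers are illustrative assumptions.
COST_FALSE_POSITIVE = 10.0   # incentive paid to a customer who would have stayed
COST_FALSE_NEGATIVE = 200.0  # revenue lost from a churner the model missed


def error_costs(false_positives, false_negatives, cost_fp, cost_fn):
    """Return the total cost of each error type as (fp_cost, fn_cost)."""
    return false_positives * cost_fp, false_negatives * cost_fn


def break_even_ratio(false_positives, false_negatives):
    """Cost ratio cost_fn / cost_fp at which both error types cost the same in total."""
    return false_positives / false_negatives


fp_cost, fn_cost = error_costs(10, 4, COST_FALSE_POSITIVE, COST_FALSE_NEGATIVE)
print(f"false positives: {fp_cost}, false negatives: {fn_cost}")
print(f"break-even cost ratio: {break_even_ratio(10, 4)}")
```

    Whatever the actual prices, the 10 false positives cost less in total than the 4 false negatives whenever one missed churner costs more than 2.5 times one incentive, which the question's "far greater" wording implies.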

Question 2 of 332

A Machine Learning Specialist is designing a system for improving sales for a company. The objective is to use the large amount of information the company has on users' behavior and product preferences to predict which products users would like based on the users' similarity to other users.

What should the Specialist do to meet this objective?

    Correct Answer: B

    The objective is to predict which products users will like based on their similarity to other users. Collaborative filtering recommendation systems do exactly this: they use users' ratings and behavior to find similar users and predict preferences from those users' interactions. Content-based filtering instead recommends items whose attributes resemble items a user already liked, without using similarity between users; model-based filtering is a broad term rather than a specific method; and combinative filtering is not a recognized method. Thus, a collaborative filtering recommendation engine is the appropriate choice for this objective.
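    User-based collaborative filtering can be sketched in a few lines. This minimal example assumes a small hypothetical ratings dictionary (RATINGS) and measures user similarity with cosine similarity, one common choice among several; it predicts a user's rating for a product as the similarity-weighted average of the ratings given by the k most similar users who rated it.

```python
from math import sqrt

# Hypothetical user -> {product: rating} data for illustration only.
RATINGS = {
    "alice": {"p1": 5, "p2": 3, "p3": 4},
    "bob": {"p1": 5, "p2": 3, "p3": 4, "p4": 5},
    "carol": {"p1": 1, "p2": 5, "p4": 1},
    "dave": {"p2": 4, "p3": 1, "p4": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity of two sparse rating vectors (dicts of product -> rating).

    The dot product uses co-rated products; each norm uses all of that user's ratings.
    """
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[p] * v[p] for p in common)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def predict_rating(ratings, user, product, k=2):
    """Similarity-weighted mean rating over the k most similar users who rated the product."""
    neighbors = sorted(
        (
            (cosine_similarity(ratings[user], ratings[other]), other)
            for other in ratings
            if other != user and product in ratings[other]
        ),
        reverse=True,
    )[:k]
    weight = sum(sim for sim, _ in neighbors)
    if weight == 0:
        return None  # no similar user has rated this product
    return sum(sim * ratings[other][product] for sim, other in neighbors) / weight


print(round(predict_rating(RATINGS, "alice", "p4"), 2))
```

    Here alice's closest neighbors for p4 are bob (who rated it 5) and carol (who rated it 1), and bob's higher similarity pulls the prediction toward his rating. Production systems usually rely on matrix factorization or a managed service rather than this pairwise loop, but the idea of ranking by user similarity is the same.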

Question 3 of 332

A Mobile Network Operator is building an analytics platform to analyze and optimize a company's operations using Amazon Athena and Amazon S3.

The source systems send data in .CSV format in real time. The Data Engineering team wants to transform the data to the Apache Parquet format before storing it on Amazon S3.

Which solution takes the LEAST effort to implement?

    Correct Answer: D

    The solution that takes the least effort to implement is to ingest the .CSV data with Amazon Kinesis Data Streams and use Amazon Kinesis Data Firehose to convert it to Parquet. Kinesis Data Firehose is a fully managed service that transforms streaming data and delivers it to destinations such as Amazon S3, so little custom development or infrastructure management is needed. Note that Firehose's built-in record format conversion expects JSON input, so CSV records must first be converted to JSON, for example with an AWS Lambda transformation function attached to the delivery stream. The other options require setting up and maintaining infrastructure or developing more complex ETL jobs, whereas Kinesis Data Firehose's serverless architecture and built-in conversion make it the most straightforward, low-effort solution.
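    Because Firehose's Parquet conversion reads JSON, a CSV stream typically needs a small transformation Lambda in front of it. The sketch below follows the Firehose data-transformation record contract (each input record has a recordId and base64 data; each output record returns the recordId, a result of Ok or ProcessingFailed, and base64 data). The column names in COLUMNS are hypothetical and must match the table schema the delivery stream uses for conversion.

```python
import base64
import csv
import io
import json

# Hypothetical column order of the incoming CSV rows; replace with the real schema.
COLUMNS = ["event_time", "cell_id", "bytes_used"]


def lambda_handler(event, context):
    """Convert each base64-encoded CSV line into a base64-encoded JSON object."""
    output = []
    for record in event["records"]:
        try:
            line = base64.b64decode(record["data"]).decode("utf-8")
            row = next(csv.reader(io.StringIO(line)))
            if len(row) != len(COLUMNS):
                raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
            # Values stay strings here; a real job may cast them to the schema's types.
            payload = json.dumps(dict(zip(COLUMNS, row))) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
            })
        except (StopIteration, ValueError):
            # Empty or malformed line: hand the original bytes back, marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

    Firehose then applies its built-in JSON-to-Parquet conversion to the Ok records before writing them to Amazon S3, so this function is the only custom code the pipeline needs.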

Question 4 of 332

A city wants to monitor its air quality to address the consequences of air pollution. A Machine Learning Specialist needs to forecast the air quality in parts per million of contaminants for the next 2 days in the city. As this is a prototype, only daily data from the last year is available.

Which model is MOST likely to provide the best results in Amazon SageMaker?

    Correct Answer: C

    To forecast air quality in parts per million of contaminants for the next 2 days with only one year of daily data, the most suitable choice is the Amazon SageMaker Linear Learner algorithm with a predictor_type of regressor. Forecasting here means predicting a continuous numeric value, which is a regression problem. To apply a regressor to a time series, each day's reading serves as the target and the readings from preceding days serve as input features; the fitted model predicts tomorrow's value, and that prediction can be fed back in to predict the day after. Linear Learner handles regression well and can capture linear relationships between consecutive readings. Classifiers and anomaly detection algorithms (such as Random Cut Forest) are not appropriate because they are not designed to produce numeric forecasts. k-Nearest-Neighbors (kNN) can also run as a regressor, but its predictions are averages of previously observed values, so it cannot extrapolate a trend and is likely to perform worse than Linear Learner when consecutive data points are linearly related.
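    One way to use a linear regressor for a 2-day forecast is to regress each day's reading on the previous day's reading and roll the prediction forward. The sketch below does this with plain least squares in the standard library; it is not the SageMaker Linear Learner API, only a local illustration of the same kind of linear model. The single lag feature and the readings list are hypothetical choices.

```python
def fit_lag1(series):
    """Least-squares fit of y[t] = intercept + slope * y[t-1]."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("lagged values are constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast(series, horizon=2):
    """Roll the fitted lag-1 model forward `horizon` days, feeding each prediction back in."""
    intercept, slope = fit_lag1(series)
    predictions, last = [], series[-1]
    for _ in range(horizon):
        last = intercept + slope * last
        predictions.append(last)
    return predictions


# Hypothetical daily readings in ppm; a real prototype would load the year of data.
readings = [0.42, 0.45, 0.47, 0.44, 0.50, 0.53, 0.51, 0.55, 0.58, 0.56]
print(forecast(readings))
```

    With a year of data, more lags (for example, the previous 7 days) could be added as extra features; Linear Learner would then fit one weight per feature in the same way.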

Question 5 of 332

A Data Engineer needs to build a model using a dataset containing customer credit card information.

How can the Data Engineer ensure the data remains encrypted and the credit card information is secure?

    Correct Answer: D

    Using AWS Key Management Service (AWS KMS) to encrypt the data protects it at rest in both Amazon S3 and Amazon SageMaker, and AWS KMS provides secure, centralized management of the encryption keys. Redacting the credit card numbers with AWS Glue keeps this sensitive information out of the dataset the model is trained on. Together, these measures address both the encryption requirement and the data privacy concern.
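    The redaction step could be expressed as a per-record function applied inside the AWS Glue job. The sketch below is a hypothetical helper, not a Glue API: it finds 13 to 19 digit sequences (optionally separated by spaces or hyphens) and replaces only those that pass the Luhn checksum used by payment card numbers, which reduces accidental redaction of other long numbers. The pattern and placeholder text are assumptions; 4111 1111 1111 1111 is a widely used test card number.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_card_numbers(text, placeholder="[REDACTED]"):
    """Replace Luhn-valid card numbers in free text with a placeholder."""
    def replace(match):
        digits = re.sub(r"\D", "", match.group())
        return placeholder if luhn_valid(digits) else match.group()

    return CARD_PATTERN.sub(replace, text)


print(redact_card_numbers("Card: 4111 1111 1111 1111, exp 12/26"))
```

    If card numbers sit in a dedicated column, dropping or masking that column outright is simpler and safer than pattern matching; the regex approach is mainly useful for free-text fields.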