To ensure that your machine learning model remains accurate and effective for both existing and new products, extend your test dataset with images of the newer products as they are introduced into the retraining process. The model can then be evaluated against a representative mix of old and new products, ensuring comprehensive coverage and keeping the evaluation metrics relevant.
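As a minimal sketch of the idea, the snippet below merges newly introduced product images into an existing test set; the file names and labels are hypothetical, and in practice the pairs would come from your labeling pipeline:

```python
import random

def extend_test_set(existing_test, new_product_images, seed=42):
    """Merge newly introduced product images into the existing test set.

    existing_test / new_product_images: lists of (image_path, label) pairs.
    Shuffling avoids ordering artifacts during batched evaluation.
    """
    combined = list(existing_test) + list(new_product_images)
    random.Random(seed).shuffle(combined)
    return combined

# Hypothetical file names and labels, for illustration only.
old = [("img/shirt_001.jpg", "shirt"), ("img/mug_002.jpg", "mug")]
new = [("img/scarf_001.jpg", "scarf")]

test_set = extend_test_set(old, new)
labels = {label for _, label in test_set}
# The combined set now covers both old and new product classes.
```

Evaluating on this combined set is what keeps the reported metrics representative of the full product catalog.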
To build classification workflows over structured datasets stored in BigQuery without writing code for tasks such as exploratory data analysis, feature selection, model building, training, hyperparameter tuning, and serving, AutoML Tables is the best option. It provides a user-friendly interface with automated machine learning capabilities that handle each of these steps without requiring the user to write any code.
Kubeflow Pipelines is an ideal choice for this scenario. It offers end-to-end orchestration, allowing you to automate the entire workflow from training to deployment, which is crucial for keeping predictive models relevant when they require frequent retraining. Kubeflow also integrates well with other Google Cloud services, providing a streamlined, scalable solution that follows best practices.
To minimize computation costs and manual intervention while keeping your code under version control, use Cloud Build linked with Cloud Source Repositories. With a build trigger on the repository, each new code push automatically starts retraining, meeting the requirements for automation, cost efficiency, and version control. By contrast, combining Cloud Functions with sensors in Cloud Composer adds unnecessary complexity and provides no built-in version control, while the gcloud command-line tool requires manual intervention.
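This setup can be sketched as a `cloudbuild.yaml` in the repository root; the bucket path, trainer package layout, region, and runtime versions below are assumptions you would replace with your own:

```yaml
# Hypothetical cloudbuild.yaml: a Cloud Build trigger on the linked
# Cloud Source Repository runs this on every push, submitting a
# retraining job with no manual intervention.
steps:
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'ai-platform'
      - 'jobs'
      - 'submit'
      - 'training'
      - 'retrain_$SHORT_SHA'            # unique job name per commit
      - '--region=us-central1'          # assumed region
      - '--module-name=trainer.task'    # assumed trainer package layout
      - '--package-path=./trainer'
      - '--job-dir=gs://my-bucket/retrain'  # assumed staging bucket
      - '--runtime-version=2.1'
      - '--python-version=3.7'
```

Because the trigger fires on repository pushes, the commit history itself serves as the version-control record for each retraining run.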
For this problem, where each image can belong to exactly one of three classes (driver's license, passport, or credit card), the appropriate loss function is categorical cross-entropy. This loss function is specifically designed for multi-class classification with mutually exclusive classes and measures the dissimilarity between the predicted class probabilities and the true class labels. Using categorical cross-entropy trains the model to produce probability distributions over the three classes, which helps achieve accurate predictions.
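The formula can be sketched in plain Python; framework losses such as Keras's `CategoricalCrossentropy` implement the same idea, and the probability values below are made up for illustration:

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities.

    y_true: one-hot vector, e.g. [1, 0, 0] for "driver's license".
    y_pred: the model's predicted probability distribution over the classes.
    eps guards against log(0) for zero-probability predictions.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# The document is a driver's license; the model assigns it probability 0.7.
loss = categorical_cross_entropy([1, 0, 0], [0.7, 0.2, 0.1])
# loss = -ln(0.7) ≈ 0.3567; it shrinks toward 0 as the model grows
# more confident in the correct class.
```

Note that the loss depends only on the probability assigned to the true class, which is exactly why the model is pushed to concentrate probability mass on the correct label.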