You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on Google Cloud to classify whether an image contains your company's product. Expecting the release of new products in the near future, you configured a retraining functionality in the pipeline so that new data can be fed into your ML models. You also want to use AI Platform's continuous evaluation service to ensure that the models have high accuracy on your test dataset. What should you do?
Correct Answer: B
To ensure that your machine learning model remains accurate and effective for both existing and new products, you should extend your test dataset with images of the newer products as they are introduced for retraining. This lets the continuous evaluation service measure the model against a representative set of both old and new products, ensuring comprehensive coverage and keeping the evaluation metrics relevant.
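As a rough illustration only (the bucket paths, file name, and labels below are hypothetical, not from the question), extending the evaluation set can be as simple as appending the new-product images to whatever file or table backs the test dataset:

```python
import csv

# Hypothetical (image URI, label) pairs for the newly released products.
NEW_PRODUCT_IMAGES = [
    ("gs://retail-images/test/new_product_001.jpg", "contains_product"),
    ("gs://retail-images/test/new_product_002.jpg", "no_product"),
]

# Append the new examples to the existing test dataset so continuous
# evaluation covers both old and new products.
with open("test_dataset.csv", "a", newline="") as f:
    csv.writer(f).writerows(NEW_PRODUCT_IMAGES)
```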
You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, and hyperparameter tuning and serving. What should you do?
Correct Answer: A
To build classification workflows over structured datasets stored in BigQuery without writing code for exploratory data analysis, feature selection, model building, training, hyperparameter tuning, or serving, AutoML Tables is the best option. AutoML Tables provides a user-friendly interface and automated machine learning capabilities that cover every one of these steps, so no code is required.
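The question calls for a no-code workflow through the AutoML Tables UI, but the same steps can also be scripted later if automation becomes useful. Below is a minimal sketch assuming the AutoML Tables Python client (google-cloud-automl); the project ID, BigQuery table, and display names are placeholders.

```python
from google.cloud import automl_v1beta1 as automl

# Placeholder project, region, and BigQuery source table.
client = automl.TablesClient(project="my-project", region="us-central1")

# Create a dataset and import the structured data from BigQuery.
dataset = client.create_dataset(dataset_display_name="classification_data")
client.import_data(
    dataset=dataset,
    bigquery_input_uri="bq://my-project.my_dataset.my_table",
).result()

# Choose the label column and start training; AutoML Tables handles feature
# engineering, model selection, and hyperparameter tuning automatically.
client.set_target_column(dataset=dataset, column_spec_display_name="label")
model = client.create_model(
    "classification_model",
    dataset=dataset,
    train_budget_milli_node_hours=1000,
).result()
```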
You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes. Predictions are served directly to users in an app in real time. Because different seasons and population increases impact the data relevance, you will retrain the model every month. You want to follow Google-recommended best practices. How should you configure the end-to-end architecture of the predictive model?
Correct Answer: A
Kubeflow Pipelines is an ideal choice for this scenario. It provides end-to-end orchestration, letting you automate the entire workflow from training through deployment, which is essential when the model must be retrained every month to stay relevant. Kubeflow Pipelines also integrates well with other Google Cloud services, giving you a streamlined, scalable solution that follows Google-recommended best practices.
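As a sketch of what such a pipeline might look like with the Kubeflow Pipelines v1 SDK (the container images, bucket paths, and pipeline name are hypothetical, and a recurring run or external scheduler would kick it off monthly):

```python
import kfp
from kfp import dsl


@dsl.pipeline(
    name="delay-prediction-retraining",
    description="Monthly retraining and deployment of the delay-estimation model.",
)
def retraining_pipeline(data_path: str = "gs://transit-data/delays/latest"):
    # Hypothetical training step: reads the latest data and writes a model
    # artifact to Cloud Storage.
    train = dsl.ContainerOp(
        name="train-model",
        image="gcr.io/my-project/delay-trainer:latest",
        arguments=["--data-path", data_path,
                   "--model-dir", "gs://transit-models/delay"],
    )

    # Hypothetical deployment step: pushes the trained model to the
    # real-time prediction endpoint used by the app.
    deploy = dsl.ContainerOp(
        name="deploy-model",
        image="gcr.io/my-project/delay-deployer:latest",
        arguments=["--model-dir", "gs://transit-models/delay"],
    )
    deploy.after(train)


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(retraining_pipeline, "retraining_pipeline.yaml")
```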
You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code. What should you do?
Correct Answer: C
To minimize computation costs and manual intervention while keeping your code under version control, you should use Cloud Build linked with Cloud Source Repositories. This setup automatically triggers retraining whenever new code is pushed to the repository, meeting the requirements for automation, cost efficiency, and version control. Using Cloud Functions and sensors in Cloud Composer adds unnecessary complexity and does not provide built-in version control, while the gcloud command-line tool requires manual intervention.
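As a hedged sketch, the repository could carry a cloudbuild.yaml like the one below, run by a Cloud Build trigger on every push; the region, bucket, runtime versions, and package layout are assumptions rather than values from the question.

```yaml
# cloudbuild.yaml (sketch): run by a Cloud Build trigger on each push to
# the Cloud Source Repositories repo that holds the training code.
steps:
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: 'gcloud'
    args:
      - 'ai-platform'
      - 'jobs'
      - 'submit'
      - 'training'
      - 'segmentation_${SHORT_SHA}'        # job name tied to the commit
      - '--region=us-central1'
      - '--module-name=trainer.task'       # assumed training package layout
      - '--package-path=./trainer'
      - '--runtime-version=2.3'
      - '--python-version=3.7'
      - '--job-dir=gs://my-bucket/jobs/segmentation_${SHORT_SHA}'
```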
Your team needs to build a model that predicts whether images contain a driver's license, passport, or credit card. The data engineering team already built the pipeline and generated a dataset composed of 10,000 images with driver's licenses, 1,000 images with passports, and 1,000 images with credit cards. You now have to train a model with the following label map: ['drivers_license', 'passport', 'credit_card']. Which loss function should you use?
Correct Answer: C
For this problem, where each image belongs to exactly one of three mutually exclusive classes (driver's license, passport, or credit card), the appropriate loss function is categorical cross-entropy. This loss function is designed specifically for multi-class classification and measures the dissimilarity between the predicted class probabilities and the true one-hot labels. Using categorical cross-entropy ensures that the model is trained to output a probability distribution over the three classes (typically via a softmax layer), which is what drives accurate class predictions.
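A minimal TensorFlow/Keras sketch of this setup is below; the backbone, input size, and optimizer are placeholder choices rather than requirements from the question.

```python
import tensorflow as tf

NUM_CLASSES = 3  # drivers_license, passport, credit_card

# Placeholder image classifier: a pretrained backbone plus a softmax head
# that outputs a probability distribution over the three classes.
model = tf.keras.Sequential([
    tf.keras.applications.MobileNetV2(
        include_top=False, pooling="avg", input_shape=(224, 224, 3)),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Categorical cross-entropy expects one-hot labels, e.g. [0, 1, 0] for 'passport'.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=["accuracy"],
)

# If the labels are stored as integer class indices (0, 1, 2) rather than
# one-hot vectors, tf.keras.losses.SparseCategoricalCrossentropy is the
# equivalent choice.
```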