Professional Machine Learning Engineer

Practice exam questions for the Google Professional Machine Learning Engineer certification

You have 285 total questions to study from
These questions were last updated on March 31, 2025

Question 1 of 285

You are building an ML model to detect anomalies in real-time sensor data. You will use Pub/Sub to handle incoming requests. You want to store the results for analytics and visualization. How should you configure the pipeline?

A. 1 = Dataflow, 2 = AI Platform, 3 = BigQuery

B. 1 = DataProc, 2 = AutoML, 3 = Cloud Bigtable

C. 1 = BigQuery, 2 = AutoML, 3 = Cloud Functions

D. 1 = BigQuery, 2 = AI Platform, 3 = Cloud Storage

Correct Answer: A

The three numbered slots correspond to the pipeline stages that follow Pub/Sub: stream processing, model prediction, and result storage. Dataflow, a fully managed service for running Apache Beam pipelines over both streaming and batch data, fits the first slot because it can read messages from Pub/Sub and process them in real time. AI Platform deploys and manages the anomaly detection model that scores each reading. BigQuery, an analytics data warehouse, stores the results for analysis and visualization. Therefore, the correct configuration is 1 = Dataflow, 2 = AI Platform, 3 = BigQuery.
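
As a rough illustration of the three stages, the per-message logic might look like the sketch below. It is plain Python so it can be run locally; in an Apache Beam pipeline on Dataflow, functions like these would typically be applied with `beam.Map` between a Pub/Sub read and a BigQuery write. The message fields (`sensor_id`, `value`), the baseline constants, and the z-score rule are all assumptions made for illustration; the rule only stands in for a call to a deployed model.

```python
import json
from datetime import datetime, timezone

# Hypothetical baseline statistics; a deployed model would replace this rule.
BASELINE_MEAN = 20.0
BASELINE_STD = 2.0
Z_THRESHOLD = 3.0


def parse_message(payload: bytes) -> dict:
    """Stage 1 (processing): decode one message body into a record."""
    record = json.loads(payload.decode("utf-8"))
    return {"sensor_id": str(record["sensor_id"]), "value": float(record["value"])}


def score(record: dict) -> dict:
    """Stage 2 (prediction): stand-in for a request to a deployed model."""
    z = abs(record["value"] - BASELINE_MEAN) / BASELINE_STD
    return {**record, "z_score": z, "is_anomaly": z > Z_THRESHOLD}


def to_row(scored: dict) -> dict:
    """Stage 3 (storage): shape the result as a row for a results table."""
    return {**scored, "scored_at": datetime.now(timezone.utc).isoformat()}


def process(payload: bytes) -> dict:
    """Run one message through all three stages."""
    return to_row(score(parse_message(payload)))
```

Keeping each stage a small pure function makes the logic easy to unit test before it is wired into a streaming pipeline.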

Question 2 of 285

Your organization wants to make its internal shuttle service route more efficient. The shuttles currently stop at all pick-up points across the city every 30 minutes between 7 am and 10 am. The development team has already built an application on Google Kubernetes Engine that requires users to confirm their presence and shuttle station one day in advance. What approach should you take?

A. 1. Build a tree-based regression model that predicts how many passengers will be picked up at each shuttle station. 2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the prediction.

B. 1. Build a tree-based classification model that predicts whether the shuttle should pick up passengers at each shuttle station. 2. Dispatch an available shuttle and provide the map with the required stops based on the prediction.

C. 1. Define the optimal route as the shortest route that passes by all shuttle stations with confirmed attendance at the given time under capacity constraints. 2. Dispatch an appropriately sized shuttle and indicate the required stops on the map.

D. 1. Build a reinforcement learning model with tree-based classification models that predict the presence of passengers at shuttle stops as agents and a reward function around a distance-based metric. 2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the simulated outcome.

Correct Answer: C

To make the internal shuttle service route more efficient, an optimization approach should be used rather than a prediction model. Since users are required to confirm their presence and shuttle station one day in advance, the presence of passengers is a known factor. The optimal route would be the shortest route that passes by all shuttle stations with confirmed attendance at the given time while considering capacity constraints. This ensures that the shuttle service is both time-efficient and resource-efficient.
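
The optimization described above can be sketched in a few lines. This is a minimal illustration, assuming straight-line distances between hypothetical station coordinates, a single shuttle that starts and ends at a depot, and a brute-force search that is only practical for a handful of stops; a real system would use road distances and a proper routing solver.

```python
from itertools import permutations
from math import dist


def shortest_route(depot, stops):
    """Return (order, length) of the shortest depot -> stops -> depot tour.

    `stops` maps station name -> (x, y). Brute force over all orderings.
    """
    best_order, best_len = (), float("inf")
    for order in permutations(stops):
        path = [depot, *(stops[name] for name in order), depot]
        length = sum(dist(a, b) for a, b in zip(path, path[1:]))
        if length < best_len:
            best_order, best_len = order, length
    return list(best_order), best_len


def plan(depot, coords, confirmed, shuttle_sizes):
    """Pick the stops with confirmed riders and the smallest shuttle that fits.

    `confirmed` maps station name -> number of riders confirmed for this run.
    """
    stops = {name: coords[name] for name, riders in confirmed.items() if riders > 0}
    total_riders = sum(confirmed.values())
    size = min((s for s in shuttle_sizes if s >= total_riders), default=None)
    if size is None:
        raise ValueError("no single shuttle can carry all confirmed riders")
    order, length = shortest_route(depot, stops)
    return order, length, size
```

No model is trained here: the confirmations are the input, and the only work left is a constrained shortest-route search.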

Question 3 of 285

You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge. How should you resolve the class imbalance problem?

A. Use the class distribution to generate 10% positive examples.

B. Use a convolutional neural network with max pooling and softmax activation.

C. Downsample the data with upweighting to create a sample with 10% positive examples.

D. Remove negative examples until the numbers of positive and negative examples are equal.

Correct Answer: C

With less than 1% positive examples, a training batch may contain almost no failures, so the model sees too little signal to converge. Downsampling with upweighting addresses this in two steps. First, train on a random subset of the majority class (negative examples) so that positives make up about 10% of the sample. Second, give each retained negative example a weight equal to the downsampling factor. The downsampling lets the model see positive examples more often, and the upweighting keeps its predicted probabilities calibrated to the original class distribution. Changing the architecture, for example to a convolutional neural network, does nothing to address the imbalance, and removing negative examples until the classes are equal discards far more data than necessary and applies no weight correction.
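
A minimal sketch of this technique follows, assuming each example is a `(features, label)` pair with label 1 for a failure. The function name and the 10% target are illustrative. It keeps every positive, samples just enough negatives to reach the target fraction, and attaches a per-example weight that most training APIs accept as a sample weight.

```python
import random


def downsample_and_upweight(examples, target_pos_fraction=0.10, seed=0):
    """Return (features, label, weight) triples with ~target_pos_fraction positives.

    Negatives are randomly downsampled; each kept negative is weighted by the
    downsampling factor so the weighted class balance matches the original data.
    """
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")

    # Number of negatives that makes positives the target fraction of the sample.
    keep = round(len(positives) * (1 - target_pos_fraction) / target_pos_fraction)
    keep = max(1, min(keep, len(negatives)))

    rng = random.Random(seed)
    sampled = rng.sample(negatives, keep)
    factor = len(negatives) / keep  # downsampling factor, used as the weight

    weighted = [(x, y, 1.0) for x, y in positives]
    weighted += [(x, y, factor) for x, y in sampled]
    rng.shuffle(weighted)
    return weighted
```

The weights of the kept negatives sum to the original number of negatives, which is what preserves calibration after training on the smaller sample.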

Question 4 of 285

You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and processing requirements?

A. Use Data Fusion's GUI to build the transformation pipelines, and then write the data into BigQuery.

B. Convert your PySpark into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.

C. Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.

D. Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.

Correct Answer: D

To optimize the pipeline on Google Cloud using a serverless tool with SQL syntax, you should ingest the data into BigQuery. BigQuery is serverless and supports SQL queries, which allows you to transform the data efficiently and at scale. After performing the transformations using BigQuery SQL queries, you can write the results to a new table. This approach meets both the speed and processing requirements.
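
The pattern of replacing a PySpark aggregation with a SQL statement that writes a new table can be tried locally. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in so that it runs without a cloud project; it is not BigQuery, and the table and column names are hypothetical. In BigQuery the same idea would be expressed as a `CREATE TABLE ... AS SELECT` statement over the loaded table.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Stand-in for raw data that has already been loaded into a table.
conn.execute("CREATE TABLE raw_events (user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("u1", 10.0), ("u1", 5.0), ("u2", 7.5)],
)

# SQL version of a PySpark transformation such as
# df.groupBy("user_id").agg(F.sum("amount"), F.count("*")),
# with the result written to a new table.
conn.execute(
    """
    CREATE TABLE user_features AS
    SELECT user_id,
           SUM(amount) AS total_amount,
           COUNT(*)    AS n_events
    FROM raw_events
    GROUP BY user_id
    """
)

rows = conn.execute(
    "SELECT user_id, total_amount, n_events FROM user_features ORDER BY user_id"
).fetchall()
```

Because the transformation is a single declarative statement, the engine decides how to parallelize it, which is where the speedup over a hand-tuned cluster job is expected to come from.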

Question 5 of 285

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano, Scikit-learn, and custom libraries. What should you do?

A. Use the AI Platform custom containers feature to receive training jobs using any framework.

B. Configure Kubeflow to run on Google Kubernetes Engine and receive training jobs through TF Job.

C. Create a library of VM images on Compute Engine, and publish these images on a centralized repository.

D. Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.

Correct Answer: A

The AI Platform custom containers feature lets teams run training jobs in any framework, including Keras, PyTorch, Theano, Scikit-learn, and custom libraries, because each job brings its dependencies in its own container image. As a managed service, it removes the burden of administering the training backend while still supporting every framework the team uses.