Professional Data Engineer on Google Cloud Platform

Here you will find Google Professional Data Engineer practice exam questions

  • You have 349 total questions across 70 pages (5 per page)
  • These questions were last updated on March 13, 2026
  • This site is not affiliated with or endorsed by Google.
Question 1 of 349

Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits well for the training data. However, when tested against new data, it performs poorly. What method can you employ to address this?
Answer

Suggested Answer

The suggested answer is C.

The poor performance of the model on new data despite fitting well on the training data indicates overfitting. Overfitting occurs when a model learns the details and noise in the training data to an extent that it negatively impacts the model's performance on new data. Dropout Methods are a regularization technique used to prevent overfitting in neural networks. By randomly dropping neurons during training, dropout helps to ensure that the model does not rely too heavily on any individual neurons, thus promoting generalization and improving the model's performance on new data.

Community Votes (25 votes)
C (Suggested): 100%
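The dropout mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration of "inverted" dropout, not the TensorFlow implementation itself: during training, a random fraction of activations is zeroed and the survivors are scaled up so the expected activation is unchanged; at inference time, activations pass through untouched.

```python
import numpy as np

def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: randomly silence a fraction `rate` of neurons
    during training and rescale the survivors so the expected activation
    is unchanged; at inference time, return activations untouched."""
    if not training or rate == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - rate
    # Boolean mask: True = neuron kept for this forward pass.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

# Because a different random subset of neurons is silenced on each pass,
# the network cannot rely too heavily on any individual neuron.
acts = np.ones((4, 8))
dropped = dropout(acts, rate=0.5, rng=np.random.default_rng(0))
```

With all-ones input and a 0.5 rate, every output entry is either 0 (dropped) or 2.0 (kept and rescaled by 1/0.5), which makes the expected activation equal to the original value of 1.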
Question 2 of 349

You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?
Answer

Suggested Answer

The suggested answer is B.

To maintain accuracy and relevance in a clothing recommendation model, it is crucial to continuously retrain the model using both existing data and new data. This approach leverages the historical data to provide context and stability while incorporating the latest trends to keep the model up-to-date. Simply retraining on new data might make the model overly reactive to recent trends and lose the broader perspective provided by historical data. Conversely, using new or old data exclusively for testing is not effective for continuous learning and adaptability. Therefore, integrating both data sources ensures the model remains balanced and effective in reflecting changing user preferences.

Community Votes (25 votes)
B (Suggested): 96%
C: 4%
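The retraining strategy described above can be sketched as follows. This is an illustrative pipeline step, not a Google API: the function name `retrain` and the use of a least-squares fit as a stand-in for the real recommender model are assumptions for the example. Each time a new batch streams in, it is folded into the existing training set and the model is refit on the union.

```python
import numpy as np

def retrain(historical_X, historical_y, new_X, new_y):
    """Refit on existing *and* new data, so the model keeps historical
    context while absorbing the latest preference signals."""
    X = np.vstack([historical_X, new_X])
    y = np.concatenate([historical_y, new_y])
    # A simple least-squares fit stands in for the real recommender model.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Carry the combined dataset forward for the next retraining cycle.
    return coef, X, y

# Historical data, then a freshly streamed batch:
hist_X, hist_y = np.array([[1.0], [2.0]]), np.array([1.0, 2.0])
new_X, new_y = np.array([[3.0]]), np.array([3.0])
coef, hist_X, hist_y = retrain(hist_X, hist_y, new_X, new_y)
```

Retraining on only `new_X` would fit the latest batch perfectly but discard the historical signal; combining both keeps the model stable while staying current.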
Question 3 of 349

You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?
Answer

Suggested Answer

The suggested answer is C.

Since the database must store significantly more patient records, it is important to improve the efficiency and scalability of the design. Normalizing the master patient-record table into separate tables for patients and visits will reduce data redundancy and improve query performance. This approach will help the database handle the increased data volume and allow for more efficient report generation by avoiding the performance issues associated with self-joins.

Community Votes (18 votes)
C (Suggested): 100%
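The normalization described above can be illustrated with a small SQLite sketch (the column names are hypothetical; the real design would target the actual patient-record schema). The single wide patient-record table is split into a `patients` table and a `visits` table, so reports join two narrow tables instead of self-joining one wide one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized design: patient identity lives in one table,
# and each visit is a separate row in another.
cur.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE visits (
    visit_id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patients(patient_id),
    visit_date TEXT)""")

cur.execute("INSERT INTO patients VALUES (1, 'Alice'), (2, 'Bob')")
cur.executemany("INSERT INTO visits VALUES (?, ?, ?)",
                [(10, 1, '2024-01-05'), (11, 1, '2024-02-09'),
                 (12, 2, '2024-01-20')])

# A report now joins two narrow tables rather than self-joining one wide one.
rows = cur.execute("""
    SELECT p.name, COUNT(v.visit_id)
    FROM patients p JOIN visits v ON p.patient_id = v.patient_id
    GROUP BY p.name ORDER BY p.name""").fetchall()
```

Because patient attributes are stored once instead of being repeated on every visit row, data redundancy drops and the join scans far less data than the original self-join as the record count grows.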
Question 4 of 349

You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?
Answer

Suggested Answer

The suggested answer is A.

In Google Data Studio 360, caching is the mechanism that causes visualizations to omit recent data (less than 1 hour old). By default, Data Studio caches results to improve performance by reducing the number of queries sent to the data source. To ensure the report always displays the most up-to-date data, disable caching by editing the report settings; Data Studio will then retrieve fresh data directly from the data source (in this case, Google BigQuery) each time the report is viewed. Disabling caching can impact performance, but it ensures data freshness.

Community Votes (18 votes)
A (Suggested): 78%
C: 17%
D: 6%
Question 5 of 349

An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?
Answer

Suggested Answer

The suggested answer is D.

Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery and push errors to another dead-letter table for analysis. Dataflow allows you to preprocess the data, making it possible to handle corrupted or incorrectly formatted rows effectively. By pushing problematic rows to a dead-letter table, you ensure only clean and correctly formatted data is loaded into BigQuery for accurate analysis while also retaining the problematic data for further inspection and resolution.

Community Votes (17 votes)
D (Suggested): 100%
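The dead-letter pattern can be sketched without Beam itself. The following is a minimal Python illustration of routing malformed CSV rows to a separate sink; the schema width and function name are assumptions for the example. In a real Dataflow pipeline this split would be expressed with tagged outputs, and the two sinks would be the main BigQuery table and the dead-letter table.

```python
import csv
import io

EXPECTED_COLUMNS = 3  # hypothetical schema width for the daily dump

def split_rows(raw_csv):
    """Route well-formed rows to the main output, and any row that fails
    parsing or the schema check to a dead-letter list for later analysis."""
    good, dead_letter = [], []
    for line in raw_csv.splitlines():
        try:
            row = next(csv.reader(io.StringIO(line)))
            if len(row) != EXPECTED_COLUMNS:
                raise ValueError(
                    f"expected {EXPECTED_COLUMNS} fields, got {len(row)}")
            good.append(row)
        except (csv.Error, ValueError, StopIteration) as err:
            # Keep the raw line and the error so the bad row can be inspected.
            dead_letter.append({"line": line, "error": str(err)})
    return good, dead_letter

dump = "1,alice,NY\n2,bob\n3,carol,SF"
good, dead = split_rows(dump)
```

Only clean rows reach the main output (and, in the real pipeline, BigQuery), while the malformed row is retained with its error message instead of failing the whole load.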

About the Google Professional Data Engineer Certification Exam

About the Exam

The Google Professional Data Engineer certification (Professional Data Engineer on Google Cloud Platform) validates your knowledge and skills. Passing the exam demonstrates proficiency and can boost your career prospects in the field.

How to Prepare

Work through all 349 practice questions across 70 pages. Focus on understanding the reasoning behind each answer rather than memorizing responses, so that you are prepared for any variation that appears on the real exam.

Why Practice Exams?

Practice exams help you familiarize yourself with the question format, manage your time, and reduce anxiety on the test day. Our Professional Data Engineer questions are regularly updated to reflect the latest exam objectives.