Professional Data Engineer Exam - Question 289


You have data located in BigQuery that is used to generate reports for your company. You have noticed that some fields in the weekly executive report do not conform to company formatting standards. For example, report errors include inconsistent telephone formats and inconsistent country code identifiers. This is a frequent issue, so you need to create a recurring job to normalize the data. You want a quick solution that requires no coding. What should you do?

Correct Answer: A

Given the requirement for a no-coding solution, Cloud Data Fusion and Wrangler provide a visual, no-code interface specifically designed for data transformation tasks. This allows users to design data workflows and normalization tasks without writing any code. Additionally, Cloud Data Fusion supports scheduling recurring jobs, which aligns perfectly with the need to automate the normalization process on a weekly basis.

Discussion

8 comments
Matt_108 Option: A
Jan 13, 2024

Definitely A: Cloud Data Fusion and Wrangler to set up the cleanup pipeline, with no coding required.

scaenruy Option: A
Jan 4, 2024

A. Use Cloud Data Fusion and Wrangler to normalize the data, and set up a recurring job.

Sofiia98 Option: A
Jan 10, 2024

Cloud Data Fusion and Wrangler

JyoGCP Option: A
Feb 21, 2024

Option A

SohiniV Option: D
Feb 25, 2024

As per ChatGPT, Option D allows you to utilize BigQuery's SQL capabilities to write queries that normalize the data according to company standards. You can then schedule these queries to run on a recurring basis using BigQuery's scheduled queries feature. This feature allows you to specify a schedule (e.g., weekly) for executing SQL queries automatically. This approach requires no additional setup or coding outside of BigQuery, making it a quick and straightforward solution to address the issue of data normalization.

SohiniV
Feb 25, 2024

Any views on this?

RenePetersen
Feb 26, 2024

Wouldn't writing the SQL transformation be considered coding? The question specifically states that a solution requiring no coding is needed.

jreale64
Mar 19, 2024

While Cloud Data Fusion with Wrangler offers a visual interface for data wrangling, it requires setting up the environment and potentially writing code for transformations. So it is not appropriate. I think D.

fitri001 Option: A
Jun 17, 2024

https://cloud.google.com/data-fusion/docs

carmltekai Option: D
Jul 16, 2024

The best solution here is D. Use BigQuery and GoogleSQL to normalize the data, and schedule recurring queries in BigQuery. Here's why:
* No-code solution: BigQuery's built-in capabilities and GoogleSQL offer a no-code way to transform and standardize data. You can leverage functions like REGEXP_REPLACE to normalize phone numbers and FORMAT to ensure consistent formatting across fields.
* Recurring jobs: BigQuery allows you to schedule queries to run regularly, which is perfect for maintaining data consistency over time.
* Quick and efficient: BigQuery is designed for large-scale data processing, making it fast and efficient for normalization tasks.
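For anyone wondering what "normalizing" means in practice here, below is a minimal Python sketch of the kind of cleanup being discussed. It is not the exam's answer and not a BigQuery query: `re.sub` simply plays the role that REGEXP_REPLACE would play in SQL. The target phone format (`+<country code><digits>`), the default country code, the alias table, and the function names are all assumptions made up for illustration.

```python
import re

# Hypothetical alias table: maps country identifiers that might appear in a
# report to a single two-letter code. The entries are illustrative only.
COUNTRY_ALIASES = {
    "usa": "US",
    "u.s.": "US",
    "united states": "US",
    "uk": "GB",
    "united kingdom": "GB",
}


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Return the phone number as '+<country code><digits>' (assumed target format)."""
    # Strip everything that is not a digit, much like
    # REGEXP_REPLACE(phone, r'\D', '') would in SQL.
    digits = re.sub(r"\D", "", raw)
    has_plus = raw.strip().startswith("+")
    # Assumption: a bare 10-digit number is missing its country code.
    if not has_plus and len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits


def normalize_country(raw: str) -> str:
    """Map a known alias to its two-letter code; leave unknown values unchanged."""
    cleaned = raw.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    for phone in ["(415) 555-0100", "+1 415-555-0100", "1.415.555.0100"]:
        print(phone, "->", normalize_phone(phone))
    for country in ["U.S.", " uk ", "France"]:
        print(repr(country), "->", normalize_country(country))
```

Whether writing rules like these counts as "coding" is exactly what the thread is arguing about: the same logic could be written as SQL in a scheduled query or built as directives in a visual tool.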

carmltekai
Jul 16, 2024

Why other options aren't as suitable:
A. Cloud Data Fusion and Wrangler: While powerful, these tools might be overkill for a simple normalization task and could involve a steeper learning curve.
B. Dataflow SQL: Dataflow is primarily for stream processing and might not be the most efficient for batch transformations on data already in BigQuery.
C. Dataproc Serverless: This involves using a Spark job, which requires coding and might be more complex than necessary for this task.

987af6b Option: A
Jul 21, 2024

A. Use Cloud Data Fusion and Wrangler to normalize the data, and set up a recurring job.
Explanation:
No Coding Required: Cloud Data Fusion's Wrangler offers a no-code interface for data transformation tasks. You can visually design data normalization workflows without writing any code.
Recurring Jobs: Cloud Data Fusion allows you to schedule these data normalization tasks to run on a recurring basis, meeting your need for automation.