
Certified Data Engineer Professional Exam - Question 17


A production workload incrementally applies updates from an external Change Data Capture (CDC) feed to a Delta Lake table as an always-on Structured Streaming job. When data was initially migrated for this table, OPTIMIZE was executed and most data files were resized to 1 GB. Auto Optimize and Auto Compaction were both turned on for the streaming production job. A recent review of data files shows that most data files are under 64 MB, although each partition in the table contains at least 1 GB of data and the total table size is over 10 TB.

Which of the following likely explains these smaller file sizes?

Correct Answer: A

Databricks has likely autotuned to a smaller target file size to reduce the duration of MERGE operations. With Auto Optimize and Auto Compaction enabled, Databricks applies workload-based autotuning to tables that are frequently rewritten by MERGE, targeting much smaller files because rewriting many small files on each update is far cheaper than rewriting 1 GB files. The observed file sizes under 64 MB, despite each partition containing at least 1 GB of data and the table exceeding 10 TB, point to this workload-based tuning: size-based autotuning alone would target 1 GB files for a table this large.
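Workload-based autotuning can also be requested explicitly via a table property. The following is a minimal PySpark sketch, assuming a hypothetical table named `cdc_target`; `delta.tuneFileSizesForRewrites` is the documented property for MERGE-heavy tables, and Databricks may also set it automatically when it detects frequent rewrites.

```python
# Minimal sketch (PySpark on Databricks); the table name `cdc_target`
# is hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Opt the table in to workload-based file-size autotuning explicitly;
# Databricks can also infer this from a history of MERGE/UPDATE/DELETE.
spark.sql("""
    ALTER TABLE cdc_target
    SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true')
""")

# Inspect the effective table properties.
spark.sql("SHOW TBLPROPERTIES cdc_target").show(truncate=False)
```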

Discussion

12 comments
cotardo2077 (Option: A)
Sep 5, 2023

https://docs.databricks.com/en/delta/tune-file-size.html#autotune-table 'Autotune file size based on workload'

Eertyy (Option: E)
Aug 30, 2023

E is the right answer.

Eertyy
Sep 21, 2023

Correction: option A is the correct answer, as it is the likely explanation for the smaller file sizes.

Jay_98_11 (Option: A)
Jan 13, 2024

A is correct

PrashantTiwari (Option: A)
Feb 9, 2024

The target file size is based on the current size of the Delta table. For tables smaller than 2.56 TB, the autotuned target file size is 256 MB. For tables with a size between 2.56 TB and 10 TB, the target size will grow linearly from 256 MB to 1 GB. For tables larger than 10 TB, the target file size is 1 GB. Correct answer is A
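The schedule quoted above amounts to a simple linear interpolation. The sketch below is illustrative only (not Databricks' implementation) and shows why size-based autotuning alone would target 1 GB files for this 10+ TB table, so the sub-64 MB files must come from workload-based tuning instead.

```python
# Illustrative only: the size-based autotune schedule quoted above,
# expressed as a linear interpolation. Not Databricks' actual code.
def autotuned_target_file_size_mb(table_size_tb: float) -> float:
    """Target file size in MB for a Delta table of the given size in TB."""
    if table_size_tb < 2.56:
        return 256.0
    if table_size_tb >= 10.0:
        return 1024.0  # 1 GB cap for tables of 10 TB and above
    # Grow linearly from 256 MB at 2.56 TB to 1024 MB at 10 TB.
    fraction = (table_size_tb - 2.56) / (10.0 - 2.56)
    return 256.0 + fraction * (1024.0 - 256.0)

print(autotuned_target_file_size_mb(5.0))    # ~508 MB
print(autotuned_target_file_size_mb(12.0))   # 1024 MB, i.e. 1 GB
```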

azurearch (Option: A)
Sep 9, 2023

A is correct answer

sturcu (Option: A)
Oct 11, 2023

Correct

sen411 (Option: E)
Oct 21, 2023

E is the right answer, because the question asks why there are small files.

BIKRAM063 (Option: A)
Nov 2, 2023

Auto Optimize targets smaller files (Auto Compaction's default output size is 128 MB) to facilitate quick MERGE operations.
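For context, the Auto Optimize features mentioned in the question are controlled by two table properties. A minimal sketch, reusing the hypothetical `cdc_target` table (128 MB is Auto Compaction's default output file size):

```python
# Minimal sketch: enable the two documented Auto Optimize features on a
# hypothetical table. Auto Compaction coalesces small files into files
# of roughly 128 MB by default.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    ALTER TABLE cdc_target
    SET TBLPROPERTIES (
      'delta.autoOptimize.optimizeWrite' = 'true',
      'delta.autoOptimize.autoCompact' = 'true'
    )
""")
```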

kz_data (Option: A)
Jan 10, 2024

correct answer is A

AziLa (Option: A)
Jan 21, 2024

correct ans is A

RiktRikt007 (Option: A)
Feb 10, 2024

How is A correct? While Databricks does have autotuning capabilities, it primarily considers the table size. In this case, the table is over 10 TB, which would typically lead to a target file size of 1 GB, not under 64 MB.

imatheushenrique (Option: A)
Jun 1, 2024

One of the purposes of an OPTIMIZE execution is the gain in MERGE operations, so: A. Databricks has autotuned to a smaller target file size to reduce duration of MERGE operations.
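To make the scenario concrete, the workload in the question is an always-on stream that MERGEs CDC changes into the table. A minimal sketch of that pattern follows; the table name, source path, and `id` join key are all hypothetical. It is exactly this steady stream of MERGEs that triggers workload-based autotuning toward small files.

```python
# Minimal sketch of the question's workload: a Structured Streaming job
# applying CDC updates to a Delta table via MERGE in foreachBatch.
# Table name, path, and join key are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def upsert_batch(microbatch_df, batch_id):
    # MERGE each micro-batch of CDC records into the target table.
    target = DeltaTable.forName(spark, "cdc_target")
    (target.alias("t")
           .merge(microbatch_df.alias("s"), "t.id = s.id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

(spark.readStream
      .format("delta")
      .load("/mnt/cdc_feed")          # hypothetical CDC feed location
      .writeStream
      .foreachBatch(upsert_batch)
      .start())
```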