Exam: Certified Data Engineer Professional
Question 17

A production workload incrementally applies updates from an external Change Data Capture feed to a Delta Lake table as an always-on Structured Streaming job. When data was initially migrated for this table, OPTIMIZE was executed and most data files were resized to 1 GB. Auto Optimize and Auto Compaction were both turned on for the streaming production job. A recent review of the data files shows that most are under 64 MB, although each partition in the table contains at least 1 GB of data and the total table size is over 10 TB.

Which of the following likely explains these smaller file sizes?

    Correct Answer: E

    Databricks has likely autotuned to a smaller target file size based on the amount of data in each partition. With Auto Optimize and Auto Compaction enabled, the system continuously manages file sizes to keep processing efficient. The fact that most files are under 64 MB, even though each partition holds at least 1 GB of data and the table exceeds 10 TB, suggests the target file size is being tuned around the per-partition data volumes so the incremental updates stay efficient.
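For context, Auto Optimize and Auto Compaction are typically enabled as Delta table properties. A minimal sketch, assuming a Databricks notebook where `spark` is available; the table name `cdc_target` is hypothetical:

```python
# Minimal sketch: enable optimized writes (Auto Optimize) and Auto Compaction
# on an existing Delta table. `cdc_target` is a hypothetical table name.
spark.sql("""
    ALTER TABLE cdc_target SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```

Note that Auto Compaction on its own typically targets files of roughly 128 MB, which is part of why the discussion below centers on file-size autotuning rather than compaction alone.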

Discussion
cotardo2077 (Option: A)

https://docs.databricks.com/en/delta/tune-file-size.html#autotune-table 'Autotune file size based on workload'
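For reference, the workload-based autotune described at that link is exposed as a Delta table property. A minimal sketch, again using the hypothetical `cdc_target` table; per the linked docs, Databricks may also enable this automatically for tables it detects as merge-heavy:

```python
# Sketch: opt a merge-heavy table into workload-based file-size autotuning,
# which prefers smaller files to speed up MERGE rewrites.
spark.sql("""
    ALTER TABLE cdc_target SET TBLPROPERTIES (
        'delta.tuneFileSizesForRewrites' = 'true'
    )
""")
```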

Eertyy (Option: E)

E is the right answer.

Eertyy

Option A is the correct answer, as it is the likely explanation for the smaller file sizes.

PrashantTiwari (Option: A)

The target file size is based on the current size of the Delta table. For tables smaller than 2.56 TB, the autotuned target file size is 256 MB. For tables between 2.56 TB and 10 TB, the target size grows linearly from 256 MB to 1 GB. For tables larger than 10 TB, the target file size is 1 GB. The correct answer is A.
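Purely as an illustration of the size-based thresholds quoted above (not a Databricks API), the mapping from table size to target file size can be sketched like this:

```python
def autotuned_target_file_size_mb(table_size_tb: float) -> int:
    """Illustrative only: target file size in MB following the thresholds above
    (256 MB below 2.56 TB, linear growth to 1 GB at 10 TB, 1 GB beyond)."""
    if table_size_tb < 2.56:
        return 256
    if table_size_tb >= 10:
        return 1024
    # Linear interpolation between 256 MB at 2.56 TB and 1024 MB at 10 TB.
    fraction = (table_size_tb - 2.56) / (10 - 2.56)
    return int(256 + fraction * (1024 - 256))

print(autotuned_target_file_size_mb(12))  # 1024 MB for a table over 10 TB, as in the question
```

This size-based rule alone would predict 1 GB files for this table, which is why the workload-based (merge-oriented) autotune is the more likely explanation for files under 64 MB.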

Jay_98_11 (Option: A)

A is correct

imatheushenrique (Option: A)

One of the purposes of an OPTIMIZE execution is to speed up MERGE operations, so: A. Databricks has autotuned to a smaller target file size to reduce the duration of MERGE operations.
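For context on the workload in the question, a CDC feed applied from Structured Streaming typically lands in the target Delta table through a MERGE inside foreachBatch. A minimal sketch, assuming a Databricks notebook where `spark` is available; the table names (`cdc_feed`, `cdc_target`), the key column `id`, and the checkpoint path are hypothetical:

```python
from delta.tables import DeltaTable

def upsert_cdc_batch(microbatch_df, batch_id):
    """Apply one micro-batch of CDC rows to the target Delta table via MERGE.
    Table and column names are hypothetical."""
    target = DeltaTable.forName(spark, "cdc_target")
    (target.alias("t")
        .merge(microbatch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .table("cdc_feed")                                            # hypothetical CDC source
    .writeStream
    .foreachBatch(upsert_cdc_batch)
    .option("checkpointLocation", "/tmp/checkpoints/cdc_target")  # hypothetical path
    .start())
```

It is this steady stream of MERGE rewrites that workload-based autotuning is designed to keep cheap by preferring smaller files.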

RiktRikt007 (Option: A)

How is A correct? While Databricks does have autotuning capabilities, it primarily considers the table size. In this case, the table is over 10 TB, which would typically lead to a target file size of 1 GB, not under 64 MB.

AziLa (Option: A)

The correct answer is A.

kz_data (Option: A)

The correct answer is A.

BIKRAM063 (Option: A)

Auto Optimize reduces file sizes to under 128 MB to facilitate quick merges.

sen411 (Option: E)

E is the right answer, because the question asks why there are small files.

sturcu (Option: A)

Correct

azurearch (Option: A)

A is the correct answer.