Exam: Certified Data Engineer Professional
Question 22

Which statement describes Delta Lake Auto Compaction?

    Correct Answer: E

    Delta Lake Auto Compaction involves checking if files can be further compacted after a write completes. If further compaction is possible, an asynchronous job runs to execute an OPTIMIZE command aiming for a default file size of 128 MB. This ensures better storage efficiency by minimizing the number of small files in a Delta table.
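A minimal PySpark sketch of turning the feature on, assuming a Databricks runtime; the table name `events` is a placeholder, and the config keys follow the Databricks file-size tuning docs linked later in this thread:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Enable auto compaction for every Delta write in this session
# (Databricks config key; the default target file size is 128 MB).
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# The target can be tuned; shown here set explicitly to the 128 MB default.
spark.conf.set("spark.databricks.delta.autoCompact.maxFileSize",
               str(128 * 1024 * 1024))

# A subsequent append can now trigger compaction once the write succeeds.
spark.range(1000).write.format("delta").mode("append").saveAsTable("events")
```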

Discussion
aragorn_brego (Option: A)

Delta Lake's Auto Compaction feature is designed to improve the efficiency of data storage by reducing the number of small files in a Delta table. After data is written to a Delta table, an asynchronous job can be triggered to evaluate the file sizes. If it determines that there are a significant number of small files, it will automatically run the OPTIMIZE command, which coalesces these small files into larger ones, typically aiming for files around 1 GB in size for optimal performance. E is incorrect because the statement is similar to A but with an incorrect default file size target.
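For contrast with auto compaction, a standard OPTIMIZE can be invoked manually; a hedged sketch against the same placeholder `events` table (`spark` is the ambient SparkSession on Databricks):

```python
# Manual compaction: a plain OPTIMIZE coalesces small files toward its own,
# larger default target (~1 GB), which is the figure this comment cites.
spark.sql("OPTIMIZE events")

# Optionally co-locate related data while compacting (Z-ordering).
spark.sql("OPTIMIZE events ZORDER BY (id)")
```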

Kill9

The table property delta.autoOptimize.autoCompact targets 128 MB. With the table property delta.tuneFileSizesForRewrites, for tables larger than 10 TB the target file size is 1 GB. https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size
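A sketch of setting the two table properties this comment names, on a placeholder table `sales`; semantics per the linked Databricks page (`spark` is the ambient SparkSession on Databricks):

```python
# Enable auto compaction (targets ~128 MB files) and let Databricks tune
# rewrite file sizes automatically based on table size.
spark.sql("""
    ALTER TABLE sales SET TBLPROPERTIES (
        'delta.autoOptimize.autoCompact' = 'true',
        'delta.tuneFileSizesForRewrites' = 'true'
    )
""")
```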

cotardo2077 (Option: E)

E fits best, but according to the docs it is a synchronous operation: "Auto compaction occurs after a write to a table has succeeded and runs synchronously on the cluster that has performed the write. Auto compaction only compacts files that haven't been compacted previously."

8605246 (Option: E)

The correct answer is E: auto compaction runs an asynchronous job to combine small files toward a default of 128 MB. https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size

BrianNguyen95

The 128 MB per-partition target is not a compressed size.

sturcu (Option: E)

E is the best fit, although Databricks says that auto compaction runs synchronously.

taif12340 (Option: E)

Correct answer is E. Auto optimize consists of two complementary operations (see the sketch below):
- Optimized writes: with this feature enabled, Databricks attempts to write out 128 MB files for each table partition.
- Auto compaction: after an individual write, this checks whether files can be compacted further; if so, it runs an OPTIMIZE job with a 128 MB target file size (instead of the 1 GB used by a standard OPTIMIZE).
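A sketch of enabling both auto optimize operations at table creation; the table name `web_logs` and its schema are placeholders (`spark` is the ambient SparkSession on Databricks):

```python
# Optimized writes aim for ~128 MB files per partition at write time;
# auto compaction follows up after the write with an OPTIMIZE at a 128 MB target.
spark.sql("""
    CREATE TABLE IF NOT EXISTS web_logs (id BIGINT, payload STRING)
    USING DELTA
    TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")
```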

ojudz08 (Option: E)

E is the answer. Enabling the setting uses 128 MB as the target file size. https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size

Shailly (Option: B)

A and E are wrong because auto compaction is a synchronous operation! I vote for B. As per the documentation: "Auto compaction occurs after a write to a table has succeeded and runs synchronously on the cluster that has performed the write. Auto compaction only compacts files that haven't been compacted previously." https://docs.delta.io/latest/optimizations-oss.html

imatheushenrique (Option: E)

E. An asynchronous job runs after the write completes to detect whether files could be compacted further; if so, an OPTIMIZE job is executed with a default target of 128 MB. https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-optimize-and-auto-optimize/td-p/21189

DAN_H (Option: E)

The default file size is 128 MB in auto compaction.

kz_data (Option: E)

E is correct, as the default file size in auto compaction is 128 MB, not the 1 GB of a normal OPTIMIZE statement.

IWantCerts (Option: E)

128 MB is the default.

Yogi05 (Option: E)

The question is specifically about auto compaction, hence the answer is E: the default size for auto compaction is 128 MB.

hamzaKhribi (Option: E)

OPTIMIZE's default target file size is 1 GB; however, this question is about auto compaction, which, when enabled, runs OPTIMIZE with a 128 MB target file size by default.

BIKRAM063 (Option: E)

E is correct. Auto compaction tries to optimize to a file size of 128 MB.

Eertyy (Option: E)

The correct answer is E.

BrianNguyen95 (Option: A)

The correct answer is A.