Certified Data Engineer Professional Exam Questions

Certified Data Engineer Professional Exam - Question 22


Which statement describes Delta Lake Auto Compaction?

Correct Answer: E

Delta Lake Auto Compaction involves checking if files can be further compacted after a write completes. If further compaction is possible, an asynchronous job runs to execute an OPTIMIZE command aiming for a default file size of 128 MB. This ensures better storage efficiency by minimizing the number of small files in a Delta table.
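Auto compaction can be turned on per table or per session. Below is a minimal PySpark sketch, assuming a Databricks environment; the table name sales_bronze is purely illustrative, and the property/config names are those documented for Databricks Delta Lake.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already provided in a Databricks notebook

# Enable auto compaction for a single (hypothetical) table via its Delta table property.
spark.sql("""
    ALTER TABLE sales_bronze
    SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true')
""")

# Or enable it for every Delta write in the current session.
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")
```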

Discussion

16 comments
8605246 (Option: E)
Aug 6, 2023

Correct answer is E; auto compaction runs an asynchronous job to combine small files toward a default of 128 MB. https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size

BrianNguyen95
Aug 17, 2023

128 MB is the per-partition target for optimized writes, not for compaction.

cotardo2077 (Option: E)
Sep 5, 2023

E fits best, but according to the docs it is a synchronous operation: "Auto compaction occurs after a write to a table has succeeded and runs synchronously on the cluster that has performed the write. Auto compaction only compacts files that haven't been compacted previously."
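One rough way to observe this behavior is to compare a table's file count around a small append; a sketch, assuming a Databricks cluster with auto compaction enabled and a hypothetical table named small_writes.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def num_files(table: str) -> int:
    # DESCRIBE DETAIL reports the current number of data files in a Delta table.
    return spark.sql(f"DESCRIBE DETAIL {table}").collect()[0]["numFiles"]

before = num_files("small_writes")
spark.range(1000).write.format("delta").mode("append").saveAsTable("small_writes")
after = num_files("small_writes")
print(f"files before: {before}, files after: {after}")  # compaction should keep the count from growing unchecked
```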

aragorn_brego (Option: A)
Nov 21, 2023

Delta Lake's Auto Compaction feature is designed to improve the efficiency of data storage by reducing the number of small files in a Delta table. After data is written to a Delta table, an asynchronous job can be triggered to evaluate the file sizes. If it determines that there are a significant number of small files, it will automatically run the OPTIMIZE command, which coalesces these small files into larger ones, typically aiming for files around 1 GB in size for optimal performance. E is incorrect because the statement is similar to A but with an incorrect default file size target.
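For comparison, a manually run OPTIMIZE (which by default targets roughly 1 GB files) looks like the sketch below; the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Manual OPTIMIZE pass; ZORDER is optional and co-locates data for better file skipping.
spark.sql("OPTIMIZE sales_bronze ZORDER BY (customer_id)").show(truncate=False)
```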

Kill9
Jun 21, 2024

The table property delta.autoOptimize.autoCompact targets 128 MB. With delta.tuneFileSizesForRewrites, tables larger than 10 TB get a target file size of 1 GB. https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size
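Both table properties mentioned here can be set in a single statement; a sketch, assuming a Databricks workspace and a hypothetical table name.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    ALTER TABLE sales_bronze SET TBLPROPERTIES (
        'delta.autoOptimize.autoCompact' = 'true',   -- post-write compaction toward ~128 MB files
        'delta.tuneFileSizesForRewrites' = 'true'    -- scale target file sizes up for large, frequently rewritten tables
    )
""")
```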

taif12340 (Option: E)
Aug 23, 2023

Correct answer is E. Auto optimize consists of two complementary operations:
- Optimized writes: with this feature enabled, Databricks attempts to write out 128 MB files for each table partition.
- Auto compaction: this checks, after an individual write, whether files can be compacted further. If so, it runs an OPTIMIZE job with a 128 MB file size (instead of the 1 GB file size used in the standard OPTIMIZE).
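The two operations described above also have session-level switches; a sketch, again assuming a Databricks cluster (the config names are the ones documented there).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Optimized writes: aim for ~128 MB files per partition at write time.
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")

# Auto compaction: after a write succeeds, compact leftover small files toward ~128 MB.
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")
```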

sturcu (Option: E)
Oct 11, 2023

E is the best fit, although Databricks says that auto compaction runs synchronously.

ojudz08 (Option: E)
Feb 14, 2024

E is the answer. Enabling the setting uses 128 MB as the target file size. https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size

BrianNguyen95 (Option: A)
Aug 17, 2023

correct answer is A

Eertyy (Option: E)
Sep 21, 2023

Correct answer is E.

BIKRAM063 (Option: E)
Nov 2, 2023

E is correct. Auto compaction tries to optimize to a file size of 128 MB.

hamzaKhribi (Option: E)
Dec 2, 2023

OPTIMIZE's default target file size is 1 GB; however, this question deals with auto compaction, which, when enabled, runs OPTIMIZE with a 128 MB target file size by default.
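The two defaults can be read back (or overridden) through their respective configs; a sketch, assuming the Databricks config names, with the documented default byte values (1 GB and 128 MB) supplied as fallbacks.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Target file size used by a manual OPTIMIZE (default ~1 GB).
print(spark.conf.get("spark.databricks.delta.optimize.maxFileSize", "1073741824"))

# Target file size used by auto compaction (default ~128 MB).
print(spark.conf.get("spark.databricks.delta.autoCompact.maxFileSize", "134217728"))
```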

Yogi05 (Option: E)
Dec 26, 2023

The question is about auto compaction, hence the answer is E; the default file size for auto compaction is 128 MB.

IWantCerts (Option: E)
Jan 9, 2024

128MB is the default.

kz_data (Option: E)
Jan 10, 2024

E is correct, as the default file size is 128 MB for auto compaction, not 1 GB as in a normal OPTIMIZE statement.

DAN_H (Option: E)
Jan 31, 2024

default file size is 128MB in auto compaction

imatheushenrique (Option: E)
Jun 1, 2024

E. An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an OPTIMIZE job is executed toward a default of 128 MB. https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-optimize-and-auto-optimize/td-p/21189

Shailly (Option: B)
Jul 21, 2024

A and E are wrong because auto compaction is a synchronous operation! I vote for B. As per the documentation: "Auto compaction occurs after a write to a table has succeeded and runs synchronously on the cluster that has performed the write. Auto compaction only compacts files that haven't been compacted previously." https://docs.delta.io/latest/optimizations-oss.html