Certified Data Engineer Associate Exam - Question 97


A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performance.

Which keyword can be used to compact the small files?

Correct Answer: A

The keyword for compacting small files in a Delta table is OPTIMIZE. The OPTIMIZE command rewrites many small data files into fewer, larger ones, which reduces the per-file overhead of listing and opening files and so improves query performance.
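As a rough illustration of how the keyword is used, the sketch below builds an OPTIMIZE statement as a string and, optionally, submits it through a SparkSession. The helper names `build_optimize_statement` and `compact_table` and the table name `sales.events` are hypothetical and not part of the exam question; the optional `WHERE` and `ZORDER BY` clauses follow the documented Databricks form `OPTIMIZE table_name [WHERE predicate] [ZORDER BY (col, ...)]`. Running the statement assumes an environment where Delta Lake is already configured.

```python
from typing import Optional, Sequence


def build_optimize_statement(
    table_name: str,
    where: Optional[str] = None,
    zorder_by: Optional[Sequence[str]] = None,
) -> str:
    """Build a Delta Lake OPTIMIZE statement (hypothetical helper)."""
    parts = [f"OPTIMIZE {table_name}"]
    if where:
        # Restrict compaction to a subset of the table, e.g. recent partitions.
        parts.append(f"WHERE {where}")
    if zorder_by:
        # Optionally co-locate related rows by the given columns.
        parts.append(f"ZORDER BY ({', '.join(zorder_by)})")
    return " ".join(parts)


def compact_table(spark, table_name: str):
    """Run OPTIMIZE via an existing SparkSession.

    Assumes `spark` is a SparkSession with Delta Lake enabled
    (for example, on a Databricks cluster).
    """
    return spark.sql(build_optimize_statement(table_name))


if __name__ == "__main__":
    # Only builds and prints the statement; no Spark session is needed here.
    print(build_optimize_statement("sales.events"))
```

In a notebook the same thing is usually written directly in SQL, for example `OPTIMIZE sales.events`, so the helper is only a convenience for building the statement programmatically.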

Discussion

2 comments
MDWPartners | Option: A
May 25, 2024

Repeated, correct.

kim32 | Option: A
Jun 18, 2024

The OPTIMIZE command compacts small files into fewer, larger ones, which reduces the overhead of having many small files and improves the performance of Delta Lake tables. This process is often referred to as "compaction", but the specific keyword in Databricks Delta Lake is OPTIMIZE.