Exam DP-600
Question 48

DRAG DROP -

You have a Fabric tenant that contains a lakehouse named Lakehouse1.

Readings from 100 IoT devices are appended to a Delta table in Lakehouse1. Each set of readings is approximately 25 KB. Approximately 10 GB of data is received daily.

All the table and SparkSession settings are set to the default.

You discover that queries are slow to execute. In addition, the lakehouse storage contains data and log files that are no longer used.

You need to remove the files that are no longer used and combine small files into larger files with a target size of 1 GB per file.

What should you do? To answer, drag the appropriate actions to the correct requirements. Each action may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

    Correct Answer:

Discussion
SamuComqi

VACUUM: removes old files that are no longer referenced. OPTIMIZE: creates fewer files with a larger size.
Sources:
* https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparksql
* VACUUM: https://docs.delta.io/latest/delta-utility.html#-delta-vacuum
* OPTIMIZE: https://docs.delta.io/latest/optimizations-oss.html
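As a rough illustration of the two commands named above, here is a minimal Python sketch that builds the Spark SQL statements and, given a Spark session, runs them. The table name `Lakehouse1.iot_readings` and the helper names are hypothetical, and the 168-hour retention is an assumption that mirrors the 7-day Delta Lake default rather than anything stated in the question.

```python
from typing import List


def maintenance_statements(table_name: str, retain_hours: int = 168) -> List[str]:
    """Build the Spark SQL maintenance statements for one Delta table.

    OPTIMIZE comes first so that the small files it replaces become
    unreferenced; VACUUM then deletes unreferenced files once they are
    older than the retention window.
    """
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return [
        f"OPTIMIZE {table_name}",
        f"VACUUM {table_name} RETAIN {retain_hours} HOURS",
    ]


def run_maintenance(spark, table_name: str, retain_hours: int = 168) -> None:
    """Run the statements through an existing SparkSession (e.g. in a notebook)."""
    for statement in maintenance_statements(table_name, retain_hours):
        spark.sql(statement)


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    for statement in maintenance_statements("Lakehouse1.iot_readings"):
        print(statement)
```

In a notebook, `run_maintenance(spark, "Lakehouse1.iot_readings")` could be called from a scheduled job, which is one way to read "on a schedule" in the answer options.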

Momoanwar

Correct:
OPTIMIZE: Improves query performance by optimizing file sizes. See "Compact data files with optimize on Delta Lake".
VACUUM: Reduces storage costs by deleting data files no longer referenced by the table. See "Remove unused data files with vacuum".

Valcon_doo_NoviSad

I agree that it is VACUUM and OPTIMIZE, but I would say "Set the optimizeWrite table setting" (B) and not "Run the OPTIMIZE command on a schedule" (E).

thuss

Isn't optimizeWrite set by default though? However that would only optimize the data as it is written, not over time.

Pegooli

answer is correct :)

282b85d

• Remove the files that are no longer used: Run the VACUUM command on a schedule. The VACUUM command cleans up old files and log files that are no longer needed by the Delta table, helping to free up storage and potentially improve performance by reducing the number of files the query engine needs to consider.
• Combine small files into larger files: Run the OPTIMIZE command on a schedule. The OPTIMIZE command compacts small files into larger ones, improving read performance by reducing the overhead associated with opening many small files. This can be particularly useful when you have a large number of small files due to frequent appends of small data sets.
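To put numbers on the small-file problem, here is a back-of-the-envelope calculation using the figures from the question. It assumes, purely for illustration, that each 25 KB set of readings lands as its own file and that sizes are binary (1 KB = 1024 bytes). Real file counts depend on how the appends are batched.

```python
KB = 1024
GB = 1024 ** 3

reading_size_bytes = 25 * KB    # one set of readings (from the question)
daily_volume_bytes = 10 * GB    # data received per day (from the question)
target_file_bytes = 1 * GB      # target file size after compaction

# Assumption: every set of readings is written as a separate small file.
small_files_per_day = daily_volume_bytes // reading_size_bytes

# Ceiling division: number of 1 GB files needed to hold one day of data.
compacted_files_per_day = -(-daily_volume_bytes // target_file_bytes)

print(small_files_per_day)      # 419430
print(compacted_files_per_day)  # 10
```

Under these assumptions, a day of appends yields on the order of 400,000 small files, which compaction could reduce to about 10.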

stilferx

IMHO, Vacuum & Optimize are good for optimizing Delta Lake :)