DP-600 Exam - Question 48


DRAG DROP -

You have a Fabric tenant that contains a lakehouse named Lakehouse1.

Readings from 100 IoT devices are appended to a Delta table in Lakehouse1. Each set of readings is approximately 25 KB. Approximately 10 GB of data is received daily.

All the table and SparkSession settings are set to the default.

You discover that queries are slow to execute. In addition, the lakehouse storage contains data and log files that are no longer used.

You need to remove the files that are no longer used and combine small files into larger files with a target size of 1 GB per file.

What should you do? To answer, drag the appropriate actions to the correct requirements. Each action may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Correct Answer:

Discussion

6 comments
SamuComqi
Feb 18, 2024

VACUUM: removes old files that are no longer referenced.
OPTIMIZE: creates fewer files with a larger size.

Sources:
* https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparksql
* VACUUM: https://docs.delta.io/latest/delta-utility.html#-delta-vacuum
* OPTIMIZE: https://docs.delta.io/latest/optimizations-oss.html

Momoanwar
Feb 17, 2024

Correct:
OPTIMIZE: improves query performance by optimizing file sizes. See "Compact data files with optimize on Delta Lake".
VACUUM: reduces storage costs by deleting data files no longer referenced by the table. See "Remove unused data files with vacuum".

Valcon_doo_NoviSad
Mar 5, 2024

I agree that it is VACUUM and OPTIMIZE, but I would say "Set the optimizeWrite table setting" (B) and not "Run the OPTIMIZE command on a schedule" (E).

thuss
Mar 12, 2024

Isn't optimizeWrite set by default though? However, that would only optimize the data as it is written, not over time.
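
A rough back-of-the-envelope sketch of the point above, using only the figures from the question. It assumes (this is not stated in the question) that each appended set of readings lands as its own data file, and it ignores Parquet compression and metadata overhead, so the counts are only indicative.

```python
# Figures taken from the question text.
KB = 1024
GB = 1024 ** 3

reading_set_bytes = 25 * KB   # size of one appended set of readings
daily_bytes = 10 * GB         # data received per day
target_file_bytes = 1 * GB    # target file size after compaction

# Assumption: one small data file per appended set of readings.
small_files_per_day = daily_bytes // reading_set_bytes

# Ceiling division: number of files needed to hold a day of data at the target size.
compacted_files_per_day = -(-daily_bytes // target_file_bytes)

print(small_files_per_day)      # 419430
print(compacted_files_per_day)  # 10
```

A write-time setting can only arrange the data inside a single write, and each write here is about 25 KB, so merging files across writes would have to happen afterwards.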

stilferx
May 9, 2024

IMHO, Vacuum & Optimize are good for optimizing Delta Lake :)

282b85d
May 28, 2024

• Remove the files that are no longer used: Run the VACUUM command on a schedule. The VACUUM command cleans up old files and log files that are no longer needed by the Delta table, helping to free up storage and potentially improve performance by reducing the number of files the query engine needs to consider.
• Combine small files into larger files: Run the OPTIMIZE command on a schedule. The OPTIMIZE command compacts small files into larger ones, improving read performance by reducing the overhead associated with opening many small files. This can be particularly useful when you have a large number of small files due to frequent appends of small data sets.
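
The two scheduled commands could be wrapped roughly as follows. This is a minimal sketch, not a confirmed Fabric recipe: the table name `readings` and both helper functions are hypothetical, the 168-hour retention is the usual Delta Lake default of 7 days written out explicitly, and the scheduling itself (for example, a scheduled notebook) is left out.

```python
from typing import List


def maintenance_statements(table_name: str, retain_hours: int = 168) -> List[str]:
    """Build the Spark SQL maintenance statements for one Delta table."""
    return [
        # Compact small files into larger ones. The file size target is not
        # set here; it comes from the session and table settings.
        f"OPTIMIZE {table_name}",
        # Delete files no longer referenced by the table that are older
        # than the retention window.
        f"VACUUM {table_name} RETAIN {retain_hours} HOURS",
    ]


def run_maintenance(spark, table_name: str, retain_hours: int = 168) -> None:
    """Run each statement through an existing SparkSession-like object."""
    for statement in maintenance_statements(table_name, retain_hours):
        spark.sql(statement)
```

In a notebook where a `spark` session already exists, this would be called as `run_maintenance(spark, "readings")`.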

Pegooli
Jul 18, 2024

answer is correct :)