
Certified Data Engineer Professional Exam - Question 10


A Delta table of weather records is partitioned by date and has the below schema: date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT

To find all the records from within the Arctic Circle, you execute a query with the below filter: latitude > 66.3

Which statement describes how the Delta engine identifies which files to load?

Correct Answer: D (site-listed answer: B)

The Delta engine identifies which files to load by scanning the Delta log for min and max statistics for the latitude column. Delta Lake captures statistics for each data file, including minimum and maximum values in each column, and uses these statistics to optimize query execution by determining which files may contain the data that matches the query filter.
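The pruning step described above can be sketched in plain Python. This is only an illustration, not the Delta Lake API: the file names and the per-file `min`/`max` values are invented, and the helper `files_to_load` is a hypothetical name.

```python
# Hypothetical per-file statistics, shaped like what the Delta log records
# for each data file. File names and values are invented for illustration.
file_stats = {
    "part-0000.parquet": {"min": {"latitude": -12.5}, "max": {"latitude": 40.1}},
    "part-0001.parquet": {"min": {"latitude": 35.0}, "max": {"latitude": 71.2}},
    "part-0002.parquet": {"min": {"latitude": 66.9}, "max": {"latitude": 89.0}},
}


def files_to_load(stats, column, lower_bound):
    """Return files that may contain rows where column > lower_bound.

    A file can be skipped only when its maximum value for the column is
    not greater than the bound; files without statistics must be kept.
    """
    selected = []
    for path, s in stats.items():
        max_value = s.get("max", {}).get(column)
        if max_value is None or max_value > lower_bound:
            selected.append(path)
    return selected


# Filter from the question: latitude > 66.3
print(files_to_load(file_stats, "latitude", 66.3))
```

With these made-up values, the first file is skipped because its maximum latitude (40.1) cannot satisfy `latitude > 66.3`. The other two are kept because they may contain matching rows.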

Discussion

17 comments
taif12340 Option: D
Aug 23, 2023

Answer D: In the transaction log, Delta Lake captures statistics for each data file of the table. These statistics indicate, per file:
- Total number of records
- Minimum value in each of the first 32 columns of the table
- Maximum value in each of the first 32 columns of the table
- Null value counts for each of the first 32 columns of the table

When a query with a selective filter is executed against the table, the query optimizer uses these statistics to identify data files that may contain records matching the filter condition. For the SELECT query in the question, the transaction log is scanned for min and max statistics for the latitude column.

RiktRikt007 Option: D
Feb 10, 2024

I checked the Delta log, and it does store stats: "stats":"{\"numRecords\":1,\"minValues\":{\"id\":1,\"name\":\"one\",\"age\":11},\"maxValues\":{\"id\":1,\"name\":\"one\",\"age\":11},\"nullCount\":{\"id\":0,\"name\":0,\"age\":0}}"
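To make the quoted entry easier to read, here is a minimal sketch of decoding it with Python's standard `json` module. The `stats` field of an `add` action is itself a JSON-encoded string, so it is decoded twice. The file path is hypothetical, and the log line is rebuilt in code rather than read from a real `_delta_log` directory.

```python
import json

# One "add" action line from a _delta_log JSON commit file, reduced to the
# fields used here. The path is hypothetical; the stats values mirror the
# ones quoted in the comment above.
log_line = json.dumps({
    "add": {
        "path": "part-0000.parquet",
        "stats": json.dumps({
            "numRecords": 1,
            "minValues": {"id": 1, "name": "one", "age": 11},
            "maxValues": {"id": 1, "name": "one", "age": 11},
            "nullCount": {"id": 0, "name": 0, "age": 0},
        }),
    }
})

action = json.loads(log_line)
# The stats field is a string, so it needs a second json.loads.
stats = json.loads(action["add"]["stats"])

print(stats["numRecords"], stats["minValues"]["age"], stats["maxValues"]["age"])
```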

lexaneon Option: D
Jan 1, 2024

D https://www.databricks.com/discover/pages/optimize-data-workloads-guide#:~:text=Delta%20data%20skipping%20automatically%20collects,to%20speed%20up%20the%20queries.

ranith Option: D
Jan 7, 2024

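The _delta_log contains the min and max of each column, for the first 32 columns of the table by default, for each data file (not each partition). Parquet files do have footers with their own statistics, but Delta data skipping reads the statistics in the log, so the data files do not need to be opened for this step. Correct answer is D.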

kz_data Option: D
Jan 10, 2024

I think the correct answer is D

Jay_98_11 Option: D
Jan 13, 2024

D for sure

AziLa Option: D
Jan 21, 2024

correct ans is D

kkravets Option: D
Feb 16, 2024

D is correct one

Curious76 Option: D
Feb 27, 2024

D is the answer

vikram12apr Option: D
Feb 29, 2024

D is the right answer

DavidRou Option: D
Mar 10, 2024

Statistics on first 32 columns of a table are computed and written in the Delta Log by default.
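A rough sketch of what "statistics on the first 32 columns" means in practice, in plain Python. The function `collect_stats` and its `num_indexed_cols` parameter are hypothetical names for illustration, not the engine's actual code. In Delta Lake the limit is configured through the table property `delta.dataSkippingNumIndexedCols`. The sample rows are invented and use the schema from the question.

```python
def collect_stats(rows, columns, num_indexed_cols=32):
    """Compute per-file statistics for the first num_indexed_cols columns."""
    indexed = columns[:num_indexed_cols]
    stats = {"numRecords": len(rows), "minValues": {}, "maxValues": {}, "nullCount": {}}
    for col in indexed:
        # Nulls are counted separately and excluded from min/max.
        values = [row[col] for row in rows if row.get(col) is not None]
        stats["nullCount"][col] = len(rows) - len(values)
        if values:
            stats["minValues"][col] = min(values)
            stats["maxValues"][col] = max(values)
    return stats


# Schema from the question; the rows are invented for illustration.
columns = ["date", "device_id", "temp", "latitude", "longitude"]
rows = [
    {"date": "2024-01-01", "device_id": 1, "temp": -20.5, "latitude": 70.0, "longitude": 25.0},
    {"date": "2024-01-01", "device_id": 2, "temp": None, "latitude": 10.0, "longitude": 30.0},
]

default_stats = collect_stats(rows, columns)
limited_stats = collect_stats(rows, columns, num_indexed_cols=3)

print(default_stats["maxValues"]["latitude"])
print("latitude" in limited_stats["maxValues"])
```

Because `latitude` is the fourth column of the schema, it falls well within the default limit, so min and max values are available for the filter in the question. With the limit lowered to 3 in this sketch, no `latitude` statistics would be collected.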

alexvno Option: D
Mar 13, 2024

Delta log first

arik90 Option: D
Mar 26, 2024

Based on the documentation, the answer is D. I don't know why B is shown here.

Tayari Option: D
Apr 30, 2024

D is the answer

coercion Option: D
May 19, 2024

The Delta log collects statistics such as min value, max value, and number of records for each data file written by a transaction on the table, for the first 32 columns (the default).

imatheushenrique Option: D
Jun 5, 2024

D. The Delta log is scanned for min and max statistics for the latitude column

03355a2 Option: D
Jun 26, 2024

No explanation needed, this is where the information is stored.