Exam Certified Data Engineer Professional
Question 36

A Delta Lake table representing metadata about content posts from users has the following schema: user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE

This table is partitioned by the date column. A query is run with the following filter: longitude < 20 & longitude > -20

Which statement describes how data will be filtered?

    Correct Answer: D

    Statistics in the Delta Log will be used to identify data files that might include records in the filtered range. The Delta Log keeps statistics such as min and max values for columns, which helps in identifying the relevant data files quickly without scanning all files. This file skipping mechanism improves the efficiency of the query execution by focusing only on the files that potentially contain the needed data.
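The file-skipping idea described above can be illustrated with a minimal sketch in plain Python. It does not use Spark or any Delta Lake library; it only mimics the shape of the per-file `stats` JSON (`numRecords`, `minValues`, `maxValues`) that `add` entries in the transaction log carry. The file paths, the values, and the helper names `may_contain` and `files_to_scan` are hypothetical and exist only for illustration.

```python
import json

# Hypothetical "add" entries shaped like those in a Delta transaction log.
# Paths and statistics are made up for illustration.
add_actions = [
    {"path": "date=2024-01-01/part-0000.parquet",
     "stats": json.dumps({"numRecords": 100,
                          "minValues": {"longitude": -75.0},
                          "maxValues": {"longitude": -30.5}})},
    {"path": "date=2024-01-01/part-0001.parquet",
     "stats": json.dumps({"numRecords": 80,
                          "minValues": {"longitude": -25.0},
                          "maxValues": {"longitude": 10.0}})},
    {"path": "date=2024-01-02/part-0000.parquet",
     "stats": json.dumps({"numRecords": 120,
                          "minValues": {"longitude": 35.0},
                          "maxValues": {"longitude": 120.0}})},
    {"path": "date=2024-01-02/part-0001.parquet",
     "stats": json.dumps({"numRecords": 60,
                          "minValues": {"longitude": -5.0},
                          "maxValues": {"longitude": 5.0}})},
    # No statistics recorded for longitude in this file.
    {"path": "date=2024-01-03/part-0000.parquet",
     "stats": json.dumps({"numRecords": 40,
                          "minValues": {},
                          "maxValues": {}})},
]


def may_contain(stats, column, lower, upper):
    """True if the file's [min, max] for `column` overlaps the open range (lower, upper)."""
    lo = stats["minValues"].get(column)
    hi = stats["maxValues"].get(column)
    if lo is None or hi is None:
        return True  # without statistics the file cannot be ruled out
    return lo < upper and hi > lower


def files_to_scan(actions, column, lower, upper):
    """Keep only files that might hold rows with lower < column < upper."""
    return [a["path"] for a in actions
            if may_contain(json.loads(a["stats"]), column, lower, upper)]


# Filter from the question: longitude < 20 and longitude > -20
candidates = files_to_scan(add_actions, "longitude", -20, 20)
for path in candidates:
    print(path)
```

Note that the date partition plays no role here: the filter is on longitude, so every partition is considered and files are kept or skipped one by one based on their own statistics. A kept file only might contain matching rows; the rows themselves are still filtered when the file is read.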

Discussion
Enduresoul (Option: D)

D is correct. A partition can include multiple files, and the statistics are collected for each file.

sturcu (Option: D)

D is correct.

AziLa (Option: D)

The correct answer is D.

Quadronoid (Option: C)

I guess option C is right: the transaction log contains the min/max values of the first 32 columns, which can be used to filter files.

Quadronoid

I reread the question and think I made a mistake. Option C mentions row-level statistics, but I believe the statistics in the Delta Log are more or less about columns. So D now looks right to me.