Exam: Certified Data Engineer Professional
Question 143

A Delta Lake table representing metadata about content posts from users has the following schema:

user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE

This table is partitioned by the date column. A query is run with the following filter:

longitude < 20 & longitude > -20

Which statement describes how data will be filtered?

    Correct Answer: D

    Because the table is partitioned by date and the filter is on longitude, partition pruning cannot narrow the scan. Instead, Delta Lake uses the per-file column statistics recorded in the Delta Log (such as minimum and maximum values) to identify data files that might contain records in the filtered range, and skips the rest. These statistics can rule a file out, but they cannot confirm that a file actually holds matching rows, so the remaining files are still read and filtered row by row. Therefore, the correct statement is that statistics in the Delta Log will be used to identify data files that might include records in the filtered range.
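The file-skipping idea described above can be illustrated with a small, self-contained sketch. This is not Delta Lake's actual implementation or API: `FileStats`, `might_match`, the file paths, and the min/max values are all hypothetical, chosen only to show how per-file minimum and maximum longitude values could decide which files need to be read for the filter `longitude < 20 AND longitude > -20`.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical stand-in for the per-file statistics kept in the Delta Log."""
    path: str
    min_longitude: float
    max_longitude: float


def might_match(stats: FileStats, lower: float = -20.0, upper: float = 20.0) -> bool:
    # A file can only contain a row with lower < longitude < upper if its
    # [min, max] range overlaps that open interval. Overlap means "maybe",
    # not "definitely": the rows themselves are still filtered after reading.
    return stats.min_longitude < upper and stats.max_longitude > lower


# Made-up statistics for four data files across two date partitions.
files = [
    FileStats("date=2024-01-01/part-0.parquet", -75.0, -30.0),
    FileStats("date=2024-01-01/part-1.parquet", -25.0, 5.0),
    FileStats("date=2024-01-02/part-0.parquet", 10.0, 40.0),
    FileStats("date=2024-01-02/part-1.parquet", 35.0, 120.0),
]

# Every date partition is considered, because the filter does not mention date.
files_to_scan = [f.path for f in files if might_match(f)]
print(files_to_scan)
```

In this sketch, two of the four files are skipped without being opened, and both date partitions still contribute a file to the scan, which is why partitioning by date alone does not help this query.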

Discussion
vexor3
Option: D

D is correct