Certified Data Engineer Professional Exam - Question 150

Question

All records from an Apache Kafka producer are being ingested into a single Delta Lake table with the following schema:

key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG

There are 5 unique topics being ingested. Only the "registration" topic contains Personally Identifiable Information (PII). The company wishes to restrict access to PII. The company also wishes to only retain records containing PII in this table for 14 days after initial ingestion. However, for non-PII information, it would like to retain these records indefinitely.

Which solution meets the requirements?

Examice · Accepted Answer

Data should be partitioned by the topic field, allowing ACLs and delete statements to leverage partition boundaries. Partitioning by topic enables efficient management of access control and retention policies specifically for each topic. This way, the sensitive 'registration' topic containing PII can be managed separately to comply with the retention requirement of 14 days, while other topics can be retained indefinitely without performance degradation.
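
To make the partition-based approach concrete, here is a minimal sketch in Python that only builds the SQL strings one might run against such a table. The table name `kafka_events` is hypothetical, and the sketch assumes the `timestamp` column holds epoch milliseconds and is close enough to ingestion time to drive the 14-day rule; neither is stated in the question. Column names that double as SQL keywords are backtick-quoted as a precaution.

```python
from datetime import datetime, timedelta, timezone

TABLE = "kafka_events"      # hypothetical table name, not from the question
PII_TOPIC = "registration"  # the only topic containing PII (from the question)
RETENTION_DAYS = 14         # PII retention window (from the question)


def create_table_sql(table: str = TABLE) -> str:
    """Build DDL for a Delta table partitioned by topic (schema from the question)."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "key BINARY, value BINARY, topic STRING, "
        "`partition` LONG, `offset` LONG, `timestamp` LONG"
        ") USING DELTA PARTITIONED BY (topic)"
    )


def pii_cutoff_ms(now: datetime, days: int = RETENTION_DAYS) -> int:
    """Return the retention cutoff as epoch milliseconds (assumed unit of `timestamp`)."""
    return int((now - timedelta(days=days)).timestamp() * 1000)


def purge_pii_sql(now: datetime, table: str = TABLE) -> str:
    """Build a DELETE that touches only the PII topic's partition."""
    cutoff = pii_cutoff_ms(now)
    return (
        f"DELETE FROM {table} "
        f"WHERE topic = '{PII_TOPIC}' AND `timestamp` < {cutoff}"
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(create_table_sql())
    print(purge_pii_sql(now))
```

Because the `DELETE` predicate filters on the partition column, only files under the `registration` partition need to be considered, and the other four topics are left untouched. Note that a Delta `DELETE` is logical: the underlying data files remain until `VACUUM` removes them, so a real retention job would likely need both steps.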

hpkr · Answer

C is correct

imatheushenrique · Answer

C.
Partitioning the data by the topic field allows the company to apply different access control and retention policies to different topics. There is also a performance gain, because reads and deletes that filter on a single topic only touch that topic's partition path.
