Exam: Certified Data Engineer Professional
Question 83

All records from an Apache Kafka producer are being ingested into a single Delta Lake table with the following schema:

key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG

There are 5 unique topics being ingested. Only the "registration" topic contains Personally Identifiable Information (PII). The company wishes to restrict access to PII. The company also wishes to retain records containing PII in this table for only 14 days after initial ingestion. Non-PII records, however, should be retained indefinitely.

Which of the following solutions meets the requirements?

    Correct Answer: E

    Data should be partitioned by the topic field because this allows for access control lists (ACLs) and delete statements to be applied effectively. By partitioning by the topic, you can isolate the 'registration' topic, which contains PII, and set retention policies to delete records after 14 days. Non-PII data can remain in other partitions indefinitely, thus meeting both the privacy and data retention requirements.
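As a rough illustration of the retention half of this explanation, the sketch below builds the kind of DELETE statement a scheduled job might run. The table name `kafka_raw` is hypothetical, and it is an assumption that the `timestamp` LONG column holds epoch milliseconds; note also that it is the Kafka record timestamp, which only approximates "initial ingestion" time.

```python
from datetime import datetime, timedelta, timezone

PII_TOPIC = "registration"   # the only topic with PII, per the question
RETENTION_DAYS = 14          # retention window, per the question


def cutoff_epoch_ms(now: datetime, retention_days: int = RETENTION_DAYS) -> int:
    """Return the epoch-millisecond instant `retention_days` before `now`."""
    cutoff = now - timedelta(days=retention_days)
    return int(cutoff.timestamp() * 1000)


def build_pii_delete(table: str, now: datetime) -> str:
    """Build a DELETE that targets only expired rows of the PII topic.

    Assumes `timestamp` stores epoch milliseconds (not stated in the source).
    Filtering on the partition column `topic` keeps the delete confined to
    the PII partition.
    """
    return (
        f"DELETE FROM {table} "
        f"WHERE topic = '{PII_TOPIC}' "
        f"AND `timestamp` < {cutoff_epoch_ms(now)}"
    )


if __name__ == "__main__":
    # `kafka_raw` is a placeholder table name.
    print(build_pii_delete("kafka_raw", datetime.now(timezone.utc)))
```

In a notebook the resulting string could be passed to `spark.sql(...)`. A Delta DELETE only removes rows from the current table version; the underlying data files remain reachable through older versions until they are vacuumed.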

Discussion
mouad_attaqi (Option: E)

I think answer E is correct: by default, partitioning by a column creates a separate folder for each subset of the data belonging to a partition value.
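To make the folder-per-partition point concrete, here is a minimal sketch of the default Hive-style directory naming (`column=value`). The table root `/mnt/bronze/kafka_raw` and the helper name are hypothetical, and the layout assumes the table does not use features that change the physical path scheme.

```python
from pathlib import PurePosixPath


def partition_dir(table_root: str, column: str, value: str) -> str:
    """Return the Hive-style partition directory for one partition value."""
    return str(PurePosixPath(table_root) / f"{column}={value}")


if __name__ == "__main__":
    # Placeholder root path; only the "registration" topic name comes from the question.
    print(partition_dir("/mnt/bronze/kafka_raw", "topic", "registration"))
```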

Dileepvikram (Option: E)

I think the answer is E.

ervinshang (Option: E)

E is correct

aragorn_brego (Option: E)

Partitioning data by the topic field would allow the data engineering team to apply access control lists (ACLs) to restrict access to the partition containing the "registration" topic, which holds PII. Furthermore, the team can set up automated deletion policies that specifically target the partition with PII data to delete records after 14 days, without affecting the data in other partitions. This approach meets both the privacy requirements for PII and the data retention goals for non-PII information.

ojudz08 (Option: D)

I think it's best to isolate the storage to avoid mistakenly deleting tables in the same storage, so I go with D.

spaceexplorer (Option: E)

E is correct

[Removed] (Option: B)

The solution that meets the requirements is: B. Data should be partitioned by the registration field, allowing ACLs and delete statements to be set for the PII directory. Partitioning the data by the registration field allows the directory containing PII records to be isolated and access restricted via ACLs. Additionally, the data retention requirements can be met by setting up a separate job or process to remove PII records that are 14 days old. For non-PII records, they can be retained indefinitely utilizing Delta Lake's time travel functionality.

mouad_attaqi

There is no such thing as a registration field; "registration" is a distinct topic.

sturcu

You cannot restrict privileges with ACLs on a partition. The documentation states that the securable objects in the Hive metastore are databases, tables, views, and functions: https://docs.databricks.com/en/data-governance/table-acls/object-privileges.html#securable-objects
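One way to read this point: since privileges attach to tables and views rather than partitions, access would have to be granted on a securable that already excludes the PII rows. The sketch below is only an illustration of that idea, not something proposed in the thread. The view name, table name, and `analysts` group are hypothetical, and the GRANT form is abbreviated, so the exact syntax should be checked against the linked documentation.

```python
def non_pii_view_sql(source_table: str, view_name: str,
                     pii_topic: str = "registration") -> str:
    """Build a view definition that filters out the PII topic."""
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f"SELECT * FROM {source_table} WHERE topic <> '{pii_topic}'"
    )


def grant_select_sql(securable: str, principal: str) -> str:
    """Build a SELECT grant on a table or view (abbreviated, assumed form)."""
    return f"GRANT SELECT ON {securable} TO `{principal}`"


if __name__ == "__main__":
    # All object and group names here are placeholders.
    print(non_pii_view_sql("kafka_raw", "kafka_raw_non_pii"))
    print(grant_select_sql("kafka_raw_non_pii", "analysts"))
```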

sturcu (Option: D)

Correct

sturcu

https://docs.databricks.com/en/data-governance/table-acls/object-privileges.html#securable-objects