
Professional Data Engineer Exam - Question 168


You work for a financial institution that lets customers register online. As new customers register, their user data is sent to Pub/Sub before being ingested into BigQuery. For security reasons, you decide to redact your customers' government-issued identification numbers while allowing customer service representatives to view the original values when necessary. What should you do?

Correct Answer: D

Before loading data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic format-preserving encryption token. This approach protects the sensitive data while keeping its original format, which can matter for business logic and analytics that expect that format. In addition, the tokens can be decrypted with the original cryptographic key to reveal the original values to authorized personnel, which satisfies the requirement that customer service representatives can access the unredacted values when necessary.
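To make the reversibility argument concrete, here is a toy format-preserving cipher over decimal digit strings. It is a simplified Feistel construction keyed with HMAC-SHA256, loosely modeled on the FFX idea; it is not the FFX-mode implementation behind Cloud DLP's format-preserving encryption, it has not been security-reviewed, and the key and round count are illustrative assumptions. It only demonstrates the two properties the answer relies on: the token keeps the input's length and digit alphabet, and whoever holds the key can reverse it.

```python
import hashlib
import hmac

# Illustrative round count for this toy cipher (an assumption, not a DLP setting).
# It must be even so the left/right halves line up again after the last round.
ROUNDS = 10


def _round_value(key: bytes, round_index: int, half: str) -> int:
    """Keyed round function: HMAC-SHA256 over the round index and one half."""
    message = round_index.to_bytes(1, "big") + half.encode("ascii")
    return int.from_bytes(hmac.new(key, message, hashlib.sha256).digest(), "big")


def _halves(length: int) -> tuple[int, int]:
    """Lengths u (left) and v (right) used to split a string of `length` digits."""
    u = length // 2
    return u, length - u


def _check(digits: str) -> None:
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected at least two ASCII decimal digits")


def fpe_encrypt(key: bytes, digits: str) -> str:
    """Encrypt a digit string into another digit string of the same length."""
    _check(digits)
    u, v = _halves(len(digits))
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # length of the half being updated this round
        c = (int(a) + _round_value(key, i, b)) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b


def fpe_decrypt(key: bytes, token: str) -> str:
    """Undo fpe_encrypt by running the same rounds in reverse order."""
    _check(token)
    u, v = _halves(len(token))
    a, b = token[:u], token[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _round_value(key, i, a)) % 10**m
        a, b = str(c).zfill(m), a
    return a + b


if __name__ == "__main__":
    key = b"example-key-held-by-the-reidentification-service"
    original = "123456789"
    token = fpe_encrypt(key, original)
    print(original, "->", token, "->", fpe_decrypt(key, token))
```

Because the token has the same shape as a real ID, columns typed or validated as digit strings keep working, while recovering the original value requires the key. That is where access for customer service representatives would be gated.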

Discussion

17 comments
AWSandeep · Option: B
Sep 2, 2022

B. While C and D are intriguing, they don't specify how to enable customer service representatives to receive access to the encryption token.

ffggrre
Oct 25, 2023

There is no SSN in the question; it can be any ID.

MaxNRG
Dec 19, 2023

B. BigQuery column-level security: Pros: Granular control over column access, ensures only authorized users see the SSN column. Cons: Doesn't truly redact the data. The SSN values are still stored in BigQuery, even if hidden from unauthorized users. A potential security breach could expose them.
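MaxNRG's point is that column-level security filters what a reader can see without changing what is stored. The following is a hypothetical in-memory model of that idea, not BigQuery's policy-tag API: each column can carry a policy tag, each tag lists the groups allowed to read it, and selecting a restricted column without an allowed group raises an error. The tag name, group names, and column names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


class AccessDenied(Exception):
    """Raised when a reader selects a column that none of their groups may read."""


@dataclass
class Table:
    # Hypothetical model: column name -> policy tag (None means unrestricted).
    column_tags: dict[str, Optional[str]]
    # Policy tag -> groups allowed to read columns that carry that tag.
    tag_readers: dict[str, set[str]]
    # Rows are stored exactly as ingested; nothing here is encrypted.
    rows: list[dict[str, str]] = field(default_factory=list)

    def select(self, columns: list[str], reader_groups: set[str]) -> list[dict[str, str]]:
        """Return the requested columns, refusing restricted ones for outsiders."""
        for col in columns:
            tag = self.column_tags[col]
            if tag is not None and not (self.tag_readers.get(tag, set()) & reader_groups):
                raise AccessDenied(f"column {col!r} is protected by policy tag {tag!r}")
        return [{col: row[col] for col in columns} for row in self.rows]


if __name__ == "__main__":
    table = Table(
        column_tags={"customer_id": None, "gov_id": "restricted_pii"},
        tag_readers={"restricted_pii": {"customer-service"}},
        rows=[{"customer_id": "c-001", "gov_id": "123456789"}],
    )
    print(table.select(["customer_id", "gov_id"], {"customer-service"}))
    print(table.select(["customer_id"], {"analysts"}))
    try:
        table.select(["gov_id"], {"analysts"})
    except AccessDenied as err:
        print("denied:", err)
    # The raw value is still present in storage; only the read path is gated.
    print(table.rows[0]["gov_id"])
```

The last line is the crux of the "doesn't truly redact" objection: the plaintext ID remains in the stored rows, and protection depends entirely on the read-time check.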

Lanro · Option: D
Jul 31, 2023

I don't see why we should use DLP, since we know exactly which column should be locked or encrypted. On the other hand, having a cryptographic representation of the SSN helps to aggregate/analyse entries. So I will vote for D, but B is much easier to implement. Garbage question indeed.

miall · Option: D
May 3, 2023

https://cloud.google.com/dlp/docs/classification-redaction

knith66 · Option: D
Jul 27, 2023

The question says that "user data is sent to Pub/Sub before being ingested" instead of just saying that data goes to BigQuery through Pub/Sub. So some alteration is expected before the data is ingested into BigQuery. So option D should work.
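knith66's reading implies a transformation step between the Pub/Sub subscription and the BigQuery write. A minimal sketch of that per-message step, assuming JSON-encoded messages and a hypothetical field name `gov_id`. The tokenizer is passed in, so it could wrap a DLP de-identify call, an FPE function, or anything else; Pub/Sub and BigQuery client calls are deliberately left out rather than guessed.

```python
import json
from typing import Any, Callable


def redact_message(
    payload: bytes,
    tokenize: Callable[[str], str],
    sensitive_field: str = "gov_id",  # hypothetical field name
) -> dict[str, Any]:
    """Decode one message body and replace the sensitive field with a token."""
    record = json.loads(payload.decode("utf-8"))
    value = record.get(sensitive_field)
    if value is not None:
        record[sensitive_field] = tokenize(str(value))
    # The returned dict is the row that would then be written to BigQuery.
    return record


if __name__ == "__main__":
    # Placeholder tokenizer for the demo only: it masks and is NOT reversible.
    mask = lambda s: "X" * len(s)
    print(redact_message(b'{"name": "Ana", "gov_id": "123456789"}', mask))
```

Keeping the tokenizer injectable separates the "where" of the pipeline (before ingestion) from the "how" (which de-identification method is used), which is exactly the point the thread is debating.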

ckanaar · Option: D
Sep 20, 2023

I believe the crux of the question is that the cryptographic format-preserving encryption token is re-identifiable, whereas the cryptographic hash is not (see https://cloud.google.com/dlp/docs/transformations-reference). Therefore, customer service can view the original values when necessary in the case of D.
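ckanaar's distinction can be sketched with a keyed hash. Below, HMAC-SHA256 stands in for cryptographic-hash pseudonymization (an assumption; it is not claimed to match Cloud DLP's exact output or encoding). The same input always yields the same token, so tokens can still be joined or counted, but there is no inverse function: the only way back is to guess candidate IDs and compare, which is why a hash alone does not let customer service recover the original value.

```python
import hashlib
import hmac


def hash_token(key: bytes, value: str) -> str:
    """Deterministic, one-way token: HMAC-SHA256 of the value, hex-encoded."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"example-hash-key"
    t1 = hash_token(key, "123456789")
    t2 = hash_token(key, "123456789")
    print(t1 == t2)  # True: stable tokens still support joins and GROUP BY
    print(len(t1))   # 64 hex characters, so the 9-digit format is not preserved
```

Note that, unlike the format-preserving case, the hashed token also changes the value's shape, so any downstream check that expects a digit string of a fixed length would no longer pass.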

ckanaar
Sep 21, 2023

Never mind, this can actually also be done with answer B. They are both correct, just different implementations. No idea.

spicebits · Option: D
Nov 10, 2023

Answer has to be D. Question says "you decide to redact your customers' Government issued Identification Number while allowing customer service representatives to view the original values when necessary"... Redact... view the original values... D is the only choice.

Aman47 · Option: D
Dec 14, 2023

Even if you provide column-level access control, the data owners and others higher in the hierarchy will still be able to view very sensitive data. Better to just use encryption and decryption, as this data is not going to be used for analytic workloads anyway.

muhusman · Option: B
Apr 18, 2023

The answer is B. If we select C, that approach would also prevent unauthorized access to sensitive data, but it would not allow customer service representatives to view the original values when necessary.

Oleksandr0501 · Option: B
Apr 30, 2023

gpt: Both options B and D can be used to redact sensitive data while still allowing authorized users to view the original values when necessary. However, the choice between them would depend on specific business requirements and security considerations. Option B uses BigQuery column-level security to set table permissions for users, allowing only members of the Customer Service user group to view the SSN column. This approach is straightforward and can be implemented easily. However, it requires creating a separate user group for customer service representatives and granting them access to only the required data columns.

Oleksandr0501
Apr 30, 2023

gpt: Option D uses Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic format-preserving encryption token before loading the data into BigQuery. This approach allows for more granular control over data access and can provide an added layer of security. However, it may require additional configuration and implementation effort, and it may also affect the performance of queries on the encrypted data. Google recommends using a combination of data protection techniques to safeguard sensitive data, such as encryption, data masking, and access controls. In this scenario, a possible best practice would be to use both options B and D together to provide multiple layers of protection for the sensitive data while still allowing authorized users to view the original values when necessary.

Oleksandr0501
Apr 30, 2023

I'll take D.

Oleksandr0501
May 3, 2023

Now I've read more, and I think A or B is the better choice... garbage question.

vaga1 · Option: D
May 14, 2023

The answer is between B and D, as well described in many comments. I personally do not see any reason to keep the information available using a token or a mask. It is not a PAN card number, it's just a personal ID. It should not be useful for analytical purposes. I'm gonna go for D, then.

vaga1
May 14, 2023

Sorry, B.

ZZHZZH · Option: B
Jul 10, 2023

One of the key requirements is to let authorized personnel see the ID. D doesn't specify that.

sr25 · Option: D
Jul 23, 2023

D. The question says CSRs get access to values "when necessary", not default access as in B. D is the better option, using the token.
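One way to read sr25's "when necessary" is that re-identification is a separate, logged request rather than standing read access. A hypothetical sketch: the group name, the reason requirement, and the audit-log shape are assumptions, and `decrypt` is any reversible detokenizer (for example a DLP re-identify call or an FPE decrypt) that is injected rather than implemented here.

```python
from datetime import datetime, timezone
from typing import Callable

ALLOWED_GROUP = "customer-service"  # hypothetical group name


def reidentify_on_request(
    user: str,
    user_groups: set[str],
    token: str,
    reason: str,
    decrypt: Callable[[str], str],
    audit_log: list[dict[str, str]],
) -> str:
    """Return the original value only for an authorized user with a stated reason."""
    allowed = ALLOWED_GROUP in user_groups and bool(reason.strip())
    # Every request is recorded, whether or not it is granted.
    audit_log.append({
        "user": user,
        "token": token,
        "reason": reason,
        "granted": str(allowed),
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if not allowed:
        raise PermissionError(f"{user} may not re-identify this value")
    return decrypt(token)


if __name__ == "__main__":
    log: list[dict[str, str]] = []
    fake_decrypt = lambda t: "123456789"  # placeholder for a real detokenizer
    print(reidentify_on_request("ana", {"customer-service"}, "tok-1",
                                "caller identity check", fake_decrypt, log))
    print(log[-1]["granted"])
```

Compared with a standing column grant, this shape keeps the stored value tokenized by default and makes each disclosure an explicit, auditable event.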

kcl10 · Option: B
Oct 4, 2023

Of course B.

ffggrre · Option: B
Oct 18, 2023

Customer service needs to see the original value, not possible with other options.

Nirca · Option: B
Nov 1, 2023

It might not be D, since only the format (the "frame") is kept; the data itself is changed. Format-preserving encryption (FPE), endorsed by NIST, is an advanced encryption technique that transforms data into an encrypted format while preserving its original structure. For instance, a 16-digit credit card number encrypted with FPE will still be a 16-digit number.

Helinia
Dec 30, 2023

No, the value using FPE can be decrypted with key. "Encrypted values can be re-identified using the original cryptographic key and the entire output value, including surrogate annotation." https://cloud.google.com/dlp/docs/pseudonymization#supported-methods

MaxNRG · Option: D
Dec 19, 2023

The best option is D - Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic format-preserving encryption token. The key reasons are: DLP allows redacting sensitive PII like SSNs before loading into BigQuery. This provides security by default for the raw SSN values. Using format-preserving encryption keeps the column format intact while still encrypting, allowing business logic relying on SSN format to continue functioning. The encrypted tokens can be reversed to view original SSNs when required, meeting the access requirement for customer service reps.

MaxNRG
Dec 19, 2023

Option A does encrypt SSN but requires managing keys separately. Option B relies on complex IAM policy changes instead of encrypting by default. Option C hashes irreversibly, preventing customer service reps from viewing original SSNs when required. Therefore, using DLP format-preserving encryption before BigQuery ingestion balances both security and analytics requirements for SSN data.

MaxNRG
Dec 19, 2023

Why not B. BigQuery column-level security: Doesn't truly redact the data. The SSN values are still stored in BigQuery, even if hidden from unauthorized users. A potential security breach could expose them.

Topg4u · Option: D
Jun 8, 2024

D: SSN is specific to the USA and isn't used in other countries; the question did not mention SSN.