DEA-C01 Exam - Question 21


A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII.

Which solution will meet this requirement with the LEAST operational effort?

Correct Answer: B

Option B requires the least operational effort. The Detect PII transform in AWS Glue Studio both identifies the PII and obfuscates it within the same job, so no custom code or additional service is needed for the obfuscation step. An AWS Step Functions state machine then orchestrates the pipeline that ingests the data into the S3 data lake.
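
As a rough illustration of the orchestration step, the sketch below builds an Amazon States Language definition with a single Task state that starts a Glue job and waits for it to finish. The job name `pii-obfuscation-job`, the state name, and the helper function are hypothetical placeholders that do not come from the question; the Glue job is assumed to already contain the Detect PII transform.

```python
import json


def build_state_machine_definition(glue_job_name: str) -> dict:
    """Return an Amazon States Language definition that runs one Glue job."""
    return {
        "Comment": "Run a Glue job that detects and obfuscates PII (illustrative)",
        "StartAt": "RunPiiJob",
        "States": {
            "RunPiiJob": {
                "Type": "Task",
                # Step Functions service integration for Glue;
                # the ".sync" suffix waits for the job run to complete.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "End": True,
            }
        },
    }


# Hypothetical job name, used only for this example.
definition = build_state_machine_definition("pii-obfuscation-job")
print(json.dumps(definition, indent=2))
```

The resulting JSON could be passed as the definition when creating a state machine; a real pipeline would likely add retry and error-handling states.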

Discussion

17 comments
milofficial (Option: B)
Mar 18, 2024

How does Data Quality obfuscate PII? You can do this directly in Glue Studio: https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html

jellybella (Option: B)
Mar 18, 2024

AWS Glue Data Quality is a feature that automatically validates the quality of the data during a Glue job run, but it's not typically used for data obfuscation.

kairosfc (Option: C)
May 4, 2024

The Detect PII transform in AWS Glue Studio is specifically used to identify personally identifiable information (PII) within the data. It can detect and flag this information, but on its own it does not obfuscate or remove these details. To obfuscate or alter the identified PII, an additional transformation would be necessary. This could be accomplished in several ways, such as:
Writing a custom script within the same AWS Glue job, using Python or Scala, to modify the PII data as needed.
Using AWS Glue Data Quality, if available, to create rules that automatically obfuscate or modify the data identified as PII.
AWS Glue Data Quality is a newer tool that helps improve data quality through rules and transformations, but whether it is needed depends on the availability of that functionality and on how specific the obfuscation requirements are.
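
The custom-script route mentioned in this comment could look roughly like the sketch below. It is plain Python with two deliberately simple regular expressions, not the Detect PII transform and not a Glue API; the pattern names, the `redact_record` helper, and the sample row are all hypothetical.

```python
import re

# Hypothetical, intentionally narrow patterns; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_pii(value: str, mask: str = "*******") -> str:
    """Replace email addresses and US SSN-like strings with a fixed mask."""
    value = EMAIL_RE.sub(mask, value)
    return US_SSN_RE.sub(mask, value)


def redact_record(record: dict, columns: list) -> dict:
    """Redact only the named string columns; leave every other field unchanged."""
    return {
        key: redact_pii(val) if key in columns and isinstance(val, str) else val
        for key, val in record.items()
    }


row = {"id": 7, "note": "Contact jane@example.com, SSN 123-45-6789"}
redacted = redact_record(row, ["note"])
print(redacted)
```

Maintaining patterns like these by hand is the kind of operational effort the question asks to minimize.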

Khooks (Option: B)
Jun 22, 2024

Option C involves additional steps and complexity with creating rules in AWS Glue Data Quality, which adds more operational effort compared to directly using AWS Glue Studio's capabilities.

rralucard_ (Option: C)
Feb 4, 2024

Option C seems to be the best solution to meet the requirement with the least operational effort. It leverages AWS Glue Studio for PII detection, AWS Glue Data Quality for obfuscation, and AWS Step Functions for orchestration, minimizing the need for custom coding and manual processes.

certplan
Mar 20, 2024

In Python:

import sys

import boto3
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName("Example Glue Job").getOrCreate()

# Initialize Glue context
glueContext = GlueContext(SparkContext.getOrCreate())

# Retrieve Glue job arguments
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

# Define your EMR step (defined only; submitting it would require an EMR
# cluster ID, which this snippet does not provide)
emr_step = [
    {
        "Name": "My EMR Step",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "s3://your-bucket/emr-scripts/your_script.jar",
            "Args": ["arg1", "arg2"]
        }
    }
]

# Start a Glue job run. GlueContext has no start_job_run method,
# so the boto3 Glue client is used instead.
glue = boto3.client("glue")
response = glue.start_job_run(
    JobName=args['JOB_NAME'],
    Arguments={'--extra-py-files': 'your_script.py'}
)
print(response)

arvehisa (Option: B)
Mar 30, 2024

B is correct. C: Glue Data Quality cannot obfuscate the PII. D: requires writing code, but the question asks for the "LEAST operational effort".

VerRi (Option: C)
May 19, 2024

We cannot directly handle PII with Glue Studio, and Glue Data Quality can be used to handle PII.

bigfoot1501
Jun 16, 2024

I don't think we need many more services to fulfill these requirements. AWS Glue alone is enough: it can already detect and obfuscate PII data. Source: https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html#choose-action-pii

TonyStark0122
Feb 1, 2024

C. Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake

BartoszGolebiowski24
Feb 11, 2024

I think this is A. We ingest data into S3 with a PII transformation. We do not need to use Glue or Step Functions in that case.

BartoszGolebiowski24
Feb 11, 2024

But in the other case, if this is a one-time operation, answer C should be better. The word "ingestion" makes me think that this is a stream of data. To sum up: for a one-time operation, answer C; for a stream, answer A.

GiorgioGss (Option: C)
Mar 7, 2024

https://dev.to/awscommunity-asean/validating-data-quality-with-aws-glue-databrew-4df4
https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html

certplan
Mar 20, 2024

B. Utilizes AWS Glue Studio for PII detection, AWS Step Functions for orchestration, and S3 for storage. Glue Studio simplifies PII detection, and Step Functions can streamline the data pipeline orchestration, potentially reducing operational effort compared to option A.
C. Similar to option B, but it additionally includes AWS Glue Data Quality for obfuscating PII. This might add a bit more complexity but can also streamline the process if Glue Data Quality offers convenient features for PII obfuscation.

okechi
Apr 13, 2024

Answer is option C. Period

Just_Ninja (Option: A)
May 8, 2024

A very easy way is to use the SDK to identify PII. https://docs.aws.amazon.com/code-library/latest/ug/comprehend_example_comprehend_DetectPiiEntities_section.html
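
A minimal sketch of the SDK approach, assuming boto3 and the Amazon Comprehend `detect_pii_entities` call, might look like the following. The helper names, the score threshold, and the sample entity list are hypothetical; the offsets in the sample were written by hand to mimic the shape of a Comprehend response rather than taken from a real call.

```python
def redact_with_entities(text: str, entities: list, min_score: float = 0.5) -> str:
    """Replace each detected span with its entity type, for example [EMAIL]."""
    # Work from right to left so earlier offsets remain valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent["Score"] >= min_score:
            text = text[:ent["BeginOffset"]] + f"[{ent['Type']}]" + text[ent["EndOffset"]:]
    return text


def detect_and_redact(text: str, language_code: str = "en") -> str:
    """Call Amazon Comprehend and redact whatever it reports (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above can be used without AWS access

    comprehend = boto3.client("comprehend")
    response = comprehend.detect_pii_entities(Text=text, LanguageCode=language_code)
    return redact_with_entities(text, response["Entities"])


# Hand-written entity list in the shape of a Comprehend response (assumed for illustration).
sample_text = "Email jane@example.com now"
sample_entities = [{"Type": "EMAIL", "Score": 0.99, "BeginOffset": 6, "EndOffset": 22}]
print(redact_with_entities(sample_text, sample_entities))
```

This route means writing and operating the redaction code yourself, which is relevant to the "LEAST operational effort" wording of the question.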

bakarys (Option: C)
Jul 1, 2024

Answer is C.

qwertyuio (Option: B)
Jul 12, 2024

https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html