AWS Certified Data Engineer - Associate DEA-C01 Exam Questions

AWS Certified Data Engineer - Associate DEA-C01 Exam - Question 22


A company maintains multiple extract, transform, and load (ETL) workflows that ingest data from the company's operational databases into an Amazon S3 based data lake. The ETL workflows use AWS Glue and Amazon EMR to process data.

The company wants to improve the existing architecture to provide automated orchestration and to require minimal manual effort.

Which solution will meet these requirements with the LEAST operational overhead?

Correct Answer: B

AWS Step Functions is well suited to orchestrating multiple AWS services, including AWS Glue and Amazon EMR. It is a state machine-based workflow orchestration service that sequences tasks and handles errors and retries, which reduces operational overhead. Step Functions integrates directly with AWS Glue to run ETL jobs and can submit work to Amazon EMR, so both parts of the existing pipeline can be automated in a single workflow with minimal manual effort.
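As a rough illustration of the explanation above, the sketch below builds an Amazon States Language definition that runs a Glue job and then an EMR step, with retries and a catch-all failure state. The state names, job name, cluster ID, S3 path, and retry values are all hypothetical placeholders; the two `Resource` values are the Step Functions service integrations for Glue and EMR documented in the links cited later in this thread. It only prints JSON and does not call AWS.

```python
import json


def build_etl_state_machine(glue_job_name: str, emr_cluster_id: str,
                            spark_script_s3_uri: str) -> dict:
    """Return a state machine definition: Glue job first, then an EMR step."""
    # Retry any error twice with exponential backoff (values are illustrative).
    retry_policy = [{
        "ErrorEquals": ["States.ALL"],
        "IntervalSeconds": 30,
        "MaxAttempts": 2,
        "BackoffRate": 2.0,
    }]
    # After retries are exhausted, route every error to one Fail state.
    catch_all = [{"ErrorEquals": ["States.ALL"], "Next": "PipelineFailed"}]

    return {
        "Comment": "Hypothetical ETL pipeline: Glue job followed by an EMR step",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # ".sync" makes Step Functions wait for the job run to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Retry": retry_policy,
                "Catch": catch_all,
                "Next": "RunEmrStep",
            },
            "RunEmrStep": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId": emr_cluster_id,
                    "Step": {
                        "Name": "spark-transform",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", spark_script_s3_uri],
                        },
                    },
                },
                "Retry": retry_policy,
                "Catch": catch_all,
                "End": True,
            },
            "PipelineFailed": {
                "Type": "Fail",
                "Error": "EtlPipelineFailed",
                "Cause": "A Glue or EMR task failed after retries",
            },
        },
    }


# All argument values below are placeholders, not real resources.
definition = build_etl_state_machine(
    "nightly-ingest", "j-EXAMPLE12345",
    "s3://example-bucket/scripts/transform.py",
)
print(json.dumps(definition, indent=2))
```

The resulting JSON could be passed as the definition when creating a state machine; IAM roles, scheduling, and input handling are left out of this sketch.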

Discussion

22 comments
valuedate - Option: B
May 22, 2024

Glue workflows only orchestrate crawlers and Glue jobs.

DevoteamAnalytix - Option: B
May 3, 2024

For me it's B because I did not find any way for Glue to trigger or orchestrate EMR processes out of the box. With Step Functions there is a way: https://aws.amazon.com/blogs/big-data/orchestrate-amazon-emr-serverless-jobs-with-aws-step-functions/

lucas_rfsb - Option: A
Apr 1, 2024

Since it seems to me that this pipeline is complex, with multiple workflows, I would go for Glue workflows.

GiorgioGss - Option: B
Mar 11, 2024

Orchestrating = Step Functions

[Removed] - Option: B
Jan 21, 2024

Orchestrating different AWS services is a typical use case for Step Functions: https://docs.aws.amazon.com/step-functions/latest/dg/connect-emr.html https://docs.aws.amazon.com/step-functions/latest/dg/connect-glue.html

VerRi - Option: B
May 19, 2024

There is no way for Glue Workflow to trigger EMR

rralucard_ - Option: A
Feb 4, 2024

Option A, AWS Glue Workflows, seems to be the best solution to meet the requirements with the least operational overhead. It offers a seamless integration with the company's existing AWS Glue and Amazon EMR setup, providing a managed and straightforward way to orchestrate their ETL workflows without extensive additional setup or manual intervention.

ottarg
Mar 6, 2024

Can you provide an example of Glue initiating an EMR job? Or somewhere in the documents? AFAIK, Glue workflows are only to be used for Glue related things e.g. pull data, transform it, and store it somewhere else (ETL). Executing commands on behalf of other services can be done using boto in glue, but it feels weird using Glue like that when you have step functions which are designed for orchestrating different services.

jasango
Mar 28, 2024

I'm going with D) Amazon MWAA, because Glue Workflows only supports Glue jobs, and Step Functions could work but is not a data workflow tool. Amazon MWAA is built for data workflows and is integrated with both Glue and EMR: https://aws.amazon.com/blogs/big-data/simplify-aws-glue-job-orchestration-and-monitoring-with-amazon-mwaa/

FunkyFresco - Option: B
May 26, 2024

EMR in workflows? I don't think so.

TonyStark0122
Feb 2, 2024

Glue workflows

certplan
Mar 20, 2024

Here's an example of how you can use AWS Glue to initiate an EMR (Elastic MapReduce) job:

Let's assume you have an AWS Glue job that performs ETL tasks on data stored in Amazon S3. You want to leverage EMR for a specific task within this job, such as running a complex Spark job.

1. Define a Glue Job: Create an AWS Glue job using the AWS Glue console, SDK, or CLI. Define the input and output data sources, as well as the transformations you want to apply.
2. Incorporate EMR Step: Within the Glue job script, include a section where you define an EMR step. An EMR step is a unit of work that performs a specific task on an EMR cluster.

Code follows in the next entry...
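The commenter's code does not appear in this thread. Purely as a sketch of what step 2 might look like, and not the commenter's actual code, the snippet below builds an EMR step and submits it through the boto3 EMR client method `add_job_flow_steps`. The helper names, step name, cluster ID, and S3 path are hypothetical, and a stub client stands in for `boto3.client("emr")` so the example runs without AWS credentials.

```python
def build_spark_step(name: str, script_s3_uri: str) -> dict:
    """Describe one EMR step that runs a Spark script via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
        },
    }


def submit_step(emr_client, cluster_id: str, step: dict) -> str:
    """Submit the step to a running cluster and return the new step ID."""
    # add_job_flow_steps is the boto3 EMR client call for adding steps.
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


class FakeEmrClient:
    """Stand-in for boto3.client("emr"), used only to run this sketch offline."""

    def __init__(self):
        self.calls = []

    def add_job_flow_steps(self, JobFlowId, Steps):
        self.calls.append((JobFlowId, Steps))
        return {"StepIds": [f"s-FAKE{i}" for i, _ in enumerate(Steps)]}


# Inside a real Glue job script this would be: emr = boto3.client("emr")
emr = FakeEmrClient()
step = build_spark_step("complex-spark-task",
                        "s3://example-bucket/scripts/job.py")
step_id = submit_step(emr, "j-EXAMPLE12345", step)
print(step_id)
```

Note that this is the "boto in Glue" approach questioned earlier in the thread: the Glue job's role would need EMR permissions, and the job itself would have to poll for step completion, which a Glue workflow does not do for it.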

acoshi - Option: A
Apr 29, 2024

https://aws.amazon.com/blogs/big-data/orchestrate-an-etl-pipeline-using-aws-glue-workflows-triggers-and-crawlers-with-custom-classifiers/

ttpro1995 - Option: B
Dec 24, 2024

We have both Glue jobs and EMR jobs, so we need Step Functions to connect them. Airflow can do it too, but it requires more dev work.

HunkyBunky - Option: B
Jul 4, 2024

B - because AWS Glue can't trigger EMR

V0811 - Option: A
Aug 5, 2024

AWS Glue Workflows are specifically designed for orchestrating ETL jobs in AWS Glue. They allow you to define and manage complex workflows that include multiple jobs and triggers, all within the AWS Glue environment.

Integration: AWS Glue workflows seamlessly integrate with other AWS Glue components, making it easier to manage ETL processes without the need for external orchestration tools.

Minimal Operational Overhead: Since AWS Glue is a fully managed service, using Glue workflows will reduce the operational overhead compared to managing separate orchestrators or building custom solutions.

While D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is also a good choice for more complex orchestration, it may involve more management overhead compared to the more straightforward AWS Glue workflows. Thus, AWS Glue workflows provide the least operational overhead given the context of this scenario.
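For readers weighing this argument, here is a minimal sketch of how the jobs and triggers in a Glue workflow fit together. Each dictionary is shaped as keyword arguments for the boto3 Glue client's `create_trigger` call (after a `create_workflow` call with the same workflow name); the workflow and job names are hypothetical, and nothing here contacts AWS. Trigger actions name a Glue job (`JobName`) or crawler (`CrawlerName`), which is the limitation other commenters point to regarding EMR.

```python
def build_workflow_triggers(workflow_name: str, first_job: str,
                            second_job: str) -> list:
    """Return create_trigger arguments chaining two Glue jobs in a workflow."""
    # Entry point: started manually or by an API call.
    start_trigger = {
        "Name": f"{workflow_name}-start",
        "WorkflowName": workflow_name,
        "Type": "ON_DEMAND",
        "Actions": [{"JobName": first_job}],
    }
    # Runs the second job only after the first one succeeds.
    chain_trigger = {
        "Name": f"{workflow_name}-after-{first_job}",
        "WorkflowName": workflow_name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            "Conditions": [{
                "LogicalOperator": "EQUALS",
                "JobName": first_job,
                "State": "SUCCEEDED",
            }]
        },
        "Actions": [{"JobName": second_job}],
    }
    return [start_trigger, chain_trigger]


# Placeholder names; with a real client each dict would be passed as
# glue.create_trigger(**trigger) where glue = boto3.client("glue").
triggers = build_workflow_triggers("example-etl", "extract-job", "transform-job")
for trigger in triggers:
    print(trigger["Name"], trigger["Type"],
          [action["JobName"] for action in trigger["Actions"]])
```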

Shanmahi - Option: A
Aug 24, 2024

Glue workflows are a managed service and the best choice when the goal is the least operational overhead.

Shatheesh
Oct 2, 2024

Answer A, Glue workflows

Adrifersilva - Option: A
Oct 2, 2024

Glue workflows are part of the Glue ecosystem, so they provide seamless integration with minimal changes.

plutonash - Option: B
Jan 12, 2025

It is tempting to choose A for minimum effort, but only Step Functions can trigger work on both EMR and Glue jobs.

Palee - Option: B
Mar 18, 2025

The company wants to improve the existing architecture so A cannot be the right choice

Rpathak4 - Option: A
Mar 23, 2025

Why Not the Other Options?

B. AWS Step Functions: More flexible but requires manual setup of states and transitions for Glue & EMR. Higher operational overhead than Glue Workflows.
C. AWS Lambda: Lambda is not ideal for long-running ETL workflows. Best suited for lightweight data transformations or event-driven tasks.
D. Amazon MWAA (Apache Airflow): More control but requires cluster management and custom DAGs. Higher maintenance than Glue Workflows.