SAA-C03 Exam - Question 603

Question

A company recently migrated to the AWS Cloud. The company wants a serverless solution for large-scale parallel on-demand processing of a semistructured dataset. The data consists of logs, media files, sales transactions, and IoT sensor data that is stored in Amazon S3. The company wants the solution to process thousands of items in the dataset in parallel.

Which solution will meet these requirements with the MOST operational efficiency?

Examice · Accepted Answer

The most operationally efficient solution for large-scale parallel on-demand processing of a semistructured dataset stored in Amazon S3 is to use the AWS Step Functions Map state in Distributed mode. Distributed mode can handle a higher concurrency level compared to Inline mode, allowing up to 10,000 parallel branches, which is suitable for processing thousands of items simultaneously. This serverless solution automatically scales and manages the infrastructure, ensuring efficient processing with minimal operational overhead.

Guru4Cloud · Answer

AWS Step Functions allows you to orchestrate and scale distributed processing using the Map state. The Map state can process items in a large dataset in parallel by distributing the work across multiple resources.
Using the Map state in Distributed mode will automatically handle the parallel processing and scaling. Step Functions will add more workers to process the data as needed.
Step Functions is serverless so there are no servers to manage. It will scale up and down automatically based on demand.
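
For anyone who wants to see what this looks like in practice, here is a minimal sketch of a state machine definition with a Map state in Distributed mode that lists objects in an S3 bucket and hands each one to a Lambda function. The bucket name, prefix, function name, and the `MaxConcurrency` value of 1000 are hypothetical placeholders and not from the question; the definition is only built and printed here, not deployed.

```python
import json

# Hypothetical names (assumptions): replace with your own bucket, prefix, and function.
BUCKET = "example-dataset-bucket"
PREFIX = "incoming/"
FUNCTION_NAME = "example-process-item"


def build_definition(bucket: str, prefix: str, function_name: str,
                     max_concurrency: int = 1000) -> dict:
    """Return an Amazon States Language definition with one Distributed Map state."""
    return {
        "Comment": "Sketch: process S3 objects in parallel with a Distributed Map",
        "StartAt": "ProcessObjects",
        "States": {
            "ProcessObjects": {
                "Type": "Map",
                # ItemReader makes the Map state iterate over objects listed from S3.
                "ItemReader": {
                    "Resource": "arn:aws:states:::s3:listObjectsV2",
                    "Parameters": {"Bucket": bucket, "Prefix": prefix},
                },
                "ItemProcessor": {
                    # Mode DISTRIBUTED runs each iteration as a child workflow execution.
                    "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
                    "StartAt": "ProcessOne",
                    "States": {
                        "ProcessOne": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::lambda:invoke",
                            # Pass the S3 object metadata for this item to the function.
                            "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
                            "End": True,
                        }
                    },
                },
                # Upper bound on child workflows running at the same time (assumed value).
                "MaxConcurrency": max_concurrency,
                "End": True,
            }
        },
    }


definition = build_definition(BUCKET, PREFIX, FUNCTION_NAME)
print(json.dumps(definition, indent=2))
```

The printed JSON could then be supplied as the definition when creating a state machine; IAM permissions, error handling, and result writing are left out of this sketch.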

Lx016 · Answer

A Map in Inline mode can support a concurrency of 40 parallel branches and an execution history limit of 25,000 events, or approximately 6,500 state transitions, in a workflow. With Distributed mode, you can run at a concurrency of up to 10,000 parallel branches. So if the solution has to process thousands of items in parallel, Distributed mode is the more appropriate choice.
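
To put rough numbers on that difference, here is a back-of-the-envelope sketch. It uses the two concurrency limits quoted above and a hypothetical dataset of 5,000 items, and it assumes an idealized model in which every item takes the same time, so work proceeds in equal "rounds". Real executions do not run in strict rounds, so treat this only as an illustration of scale.

```python
import math

INLINE_MAX_CONCURRENCY = 40           # Inline mode limit quoted above
DISTRIBUTED_MAX_CONCURRENCY = 10_000  # Distributed mode limit quoted above


def min_rounds(items: int, concurrency: int) -> int:
    """Minimum number of sequential rounds if at most `concurrency` items run at once."""
    return math.ceil(items / concurrency)


items = 5_000  # hypothetical dataset size (assumption)
print(min_rounds(items, INLINE_MAX_CONCURRENCY))       # 125
print(min_rounds(items, DISTRIBUTED_MAX_CONCURRENCY))  # 1
```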

taustin2 · Answer

With Step Functions, you can orchestrate large-scale parallel workloads to perform tasks, such as on-demand processing of semi-structured data. These parallel workloads let you concurrently process large-scale data sources stored in Amazon S3. https://docs.aws.amazon.com/step-functions/latest/dg/concepts-orchestrate-large-scale-parallel-workloads.html

TariqKipkemei · Answer

The Distributed Map has been optimized for Amazon S3, helping you more easily iterate over objects in an S3 bucket. With Distributed mode, you can run at a concurrency of up to 10,000 parallel branches.

https://aws.amazon.com/step-functions/faqs/#:~:text=A%20Map%20in%20Inline%20mode,up%20to%2010%2C000%20parallel%20branches.

awsgeek75 · Answer

https://aws.amazon.com/blogs/aws/step-functions-distributed-map-a-serverless-solution-for-large-scale-parallel-data-processing/
https://docs.aws.amazon.com/step-functions/latest/dg/sample-dist-map-s3data-process.html

[Removed] · Answer

Large scale + parallel = Step Functions Distributed Map

https://docs.aws.amazon.com/step-functions/latest/dg/concepts-inline-vs-distributed-map.html

Sugarbear_01 · Answer

https://docs.aws.amazon.com/step-functions/latest/dg/concepts-orchestrate-large-scale-parallel-workloads.html

Lin878 · Answer

Simple: use Lambda. Complex: use Step Functions.

bogdannb · Answer

Using Step Functions would be overkill, from my point of view. I would use AWS Glue; it's serverless and purpose-built for such a use case.

Sandy1254 · Answer

https://docs.aws.amazon.com/step-functions/latest/dg/use-dist-map-orchestrate-large-scale-parallel-workloads.html
