Examice

Question 6 of 207

A company created an extract, transform, and load (ETL) data pipeline in AWS Glue. A data engineer must crawl a table that is in Microsoft SQL Server. The data engineer needs to extract, transform, and load the output of the crawl to an Amazon S3 bucket. The data engineer also must orchestrate the data pipeline.

Which AWS service or feature will meet these requirements MOST cost-effectively?

A. AWS Step Functions

B. AWS Glue workflows

C. AWS Glue Studio

D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

Correct Answer: B

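AWS Glue workflows orchestrate crawlers, jobs, and triggers as a single pipeline inside AWS Glue. A crawler can reach the Microsoft SQL Server table through a JDBC connection, a Glue job can transform the data and write it to Amazon S3, and the workflow chains the two with triggers. Because the pipeline already runs in AWS Glue, workflows are the most cost-effective choice: they are a native feature, so there is no separate orchestration service such as AWS Step Functions or Amazon MWAA to provision and pay for. AWS Glue Studio is a visual tool for authoring and monitoring jobs, not for orchestrating a crawler and a job together.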
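The crawler-then-job chain described above can be sketched as a set of request parameters for the Glue `create_workflow` and `create_trigger` API calls. This is a minimal sketch, not a definitive setup: the helper name `build_workflow_requests` and the workflow, crawler, and job names are hypothetical, and the crawler and job are assumed to exist already.

```python
def build_workflow_requests(workflow, crawler, job, cron):
    """Build kwargs for Glue create_workflow and create_trigger (names are illustrative)."""
    create_workflow = {"Name": workflow}

    # First trigger: start the crawler on a schedule.
    start_trigger = {
        "Name": f"{workflow}-start",
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": cron,
        "Actions": [{"CrawlerName": crawler}],
        "StartOnCreation": True,
    }

    # Second trigger: run the ETL job only after the crawl succeeds.
    job_trigger = {
        "Name": f"{workflow}-after-crawl",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler,
                    "CrawlState": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": job}],
        "StartOnCreation": True,
    }
    return create_workflow, [start_trigger, job_trigger]


# Assumed usage with boto3 (not executed here):
#   glue = boto3.client("glue")
#   wf, triggers = build_workflow_requests(
#       "sqlserver-to-s3", "sqlserver-crawler", "sqlserver-etl", "cron(0 2 * * ? *)")
#   glue.create_workflow(**wf)
#   for trigger in triggers:
#       glue.create_trigger(**trigger)
```

Keeping the parameters in plain dictionaries makes the dependency between the crawl and the job easy to review before anything is created in an account.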

Question 7 of 207

A financial services company stores financial data in Amazon Redshift. A data engineer wants to run real-time queries on the financial data to support a web-based trading application. The data engineer wants to run the queries from within the trading application.

Which solution will meet these requirements with the LEAST operational overhead?

A. Establish WebSocket connections to Amazon Redshift.

B. Use the Amazon Redshift Data API.

C. Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.

D. Store frequently accessed data in Amazon S3. Use Amazon S3 Select to run the queries.

Correct Answer: B

The Amazon Redshift Data API lets the trading application submit SQL statements over HTTPS through the AWS SDK. The application does not need database drivers, persistent connections, or connection pools, all of which a JDBC setup would require the team to configure and maintain. Calls are asynchronous: the application submits a statement, checks its status, and then fetches the results. Custom WebSocket connections would add connection-handling code to build and operate, and copying data to Amazon S3 for S3 Select would introduce a second data store to keep in sync. The Data API therefore meets the requirement with the least operational overhead.
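The submit, poll, and fetch pattern of the Data API might look like the sketch below. It assumes a boto3 `redshift-data` client is passed in and that credentials come from an AWS Secrets Manager secret; the helper name `run_query` and all identifiers in the usage comment are hypothetical.

```python
import time


def run_query(client, sql, cluster_id, database, secret_arn, poll_seconds=0.5):
    """Submit SQL through the Redshift Data API and return the first page of rows."""
    statement_id = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        SecretArn=secret_arn,
        Sql=sql,
    )["Id"]

    # The Data API is asynchronous, so poll until the statement finishes.
    while True:
        description = client.describe_statement(Id=statement_id)
        status = description["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(description.get("Error", status))
        time.sleep(poll_seconds)

    # Only the first page is returned here; larger results carry a NextToken.
    return client.get_statement_result(Id=statement_id)["Records"]


# Assumed usage with boto3 (not executed here):
#   client = boto3.client("redshift-data")
#   rows = run_query(client, "SELECT symbol, price FROM quotes LIMIT 10",
#                    "trading-cluster", "dev", "arn:aws:secretsmanager:...")
```

Passing the client in as a parameter keeps the function easy to test with a stub and avoids any connection state inside the application.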

Question 8 of 207

A company uses Amazon Athena for one-time queries against data that is in Amazon S3. The company has several use cases. The company must implement permission controls to separate query processes and access to query history among users, teams, and applications that are in the same AWS account.

Which solution will meet these requirements?

A. Create an S3 bucket for each use case. Create an S3 bucket policy that grants permissions to appropriate individual IAM users. Apply the S3 bucket policy to the S3 bucket.

B. Create an Athena workgroup for each use case. Apply tags to the workgroup. Create an IAM policy that uses the tags to apply appropriate permissions to the workgroup.

C. Create an IAM role for each use case. Assign appropriate permissions to the role for each use case. Associate the role with Athena.

D. Create an AWS Glue Data Catalog resource policy that grants permissions to appropriate individual IAM users for each use case. Apply the resource policy to the specific tables that Athena uses.

Correct Answer: B

Athena workgroups isolate query execution for different users, teams, and applications in the same AWS account, and each workgroup keeps its own query history. Creating one workgroup per use case therefore separates both the query processes and the access to query history. Tagging each workgroup and writing IAM policies that match on those tags lets administrators grant access by tag value instead of maintaining a separate policy for every workgroup. The other options control access to the underlying data or catalog, but they do not separate query execution or query history.
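A tag-based IAM policy for workgroups could be sketched as follows. The helper name `workgroup_policy` and the `team` tag are hypothetical, and the action list is one plausible minimum for running and reviewing queries rather than a complete policy; users would still need permissions on the underlying S3 data and Data Catalog.

```python
import json


def workgroup_policy(tag_key, tag_value, region="*", account_id="*"):
    """Allow query actions only on workgroups carrying the given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:StopQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:ListQueryExecutions",
                    "athena:GetWorkGroup",
                ],
                "Resource": f"arn:aws:athena:{region}:{account_id}:workgroup/*",
                # Match on the tag attached to the workgroup, not on its name.
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Print a policy for workgroups tagged team=finance (illustrative values).
    print(json.dumps(workgroup_policy("team", "finance"), indent=2))
```

Because the condition matches a tag rather than a workgroup name, a new workgroup for the same team only needs the right tag to be covered by the existing policy.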

Question 9 of 207

A data engineer needs to schedule a workflow that runs a set of AWS Glue jobs every day. The data engineer does not require the Glue jobs to run or finish at a specific time.

Which solution will run the Glue jobs in the MOST cost-effective way?

A. Choose the FLEX execution class in the Glue job properties.

B. Use the Spot Instance type in Glue job properties.

C. Choose the STANDARD execution class in the Glue job properties.

D. Choose the latest version in the GlueVersion field in the Glue job properties.

Correct Answer: A

The FLEX execution class runs Glue jobs on spare compute capacity at a lower rate per DPU-hour than the STANDARD execution class. The trade-off is that start times and run times can vary, which is acceptable here because the jobs do not need to run or finish at a specific time. Glue job properties do not offer a Spot Instance type, and choosing the latest GlueVersion does not by itself lower the price, so FLEX is the most cost-effective option.
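In the Glue API, the execution class is a single job property. The sketch below builds parameters for a `create_job` call with `ExecutionClass` set to `FLEX`; the helper name `flex_job_definition`, the Glue version, the worker type, and every value in the usage comment are illustrative assumptions.

```python
def flex_job_definition(name, role_arn, script_s3_uri, workers=2):
    """Build kwargs for Glue create_job with the FLEX execution class."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # assumed version for this sketch
        "WorkerType": "G.1X",
        "NumberOfWorkers": workers,
        "ExecutionClass": "FLEX",  # "STANDARD" is the alternative
    }


# Assumed usage with boto3 (not executed here):
#   glue = boto3.client("glue")
#   glue.create_job(**flex_job_definition(
#       "daily-etl", "arn:aws:iam::123456789012:role/GlueJobRole",
#       "s3://example-bucket/scripts/daily_etl.py"))
```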

Question 10 of 207

A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket.

Which solution will meet these requirements with the LEAST operational overhead?

A. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.

B. Create an S3 event notification that has an event type of s3:ObjectTagging:* for objects that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.

C. Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.

D. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple Notification Service (Amazon SNS) topic as the destination for the event notification. Subscribe the Lambda function to the SNS topic.

Correct Answer: A

An S3 event notification with the event type s3:ObjectCreated:* fires whenever an object is created in the bucket. A suffix filter rule of .csv limits the notification to object keys that end in .csv, and setting the Lambda function's ARN as the destination makes Amazon S3 invoke the function directly. This needs no intermediate service: routing through an Amazon SNS topic would add a resource to create and maintain without any benefit here. The s3:ObjectTagging:* event type reacts to tag changes rather than uploads, so it would not trigger on a new .csv file.
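The notification setup described above can be sketched as the configuration passed to the S3 `put_bucket_notification_configuration` call. The helper name `csv_notification_config`, the bucket name, and the function ARN are hypothetical. The sketch assumes the Lambda function already has a resource-based permission that allows Amazon S3 to invoke it.

```python
def csv_notification_config(lambda_arn):
    """Build an S3 notification configuration that targets only .csv uploads."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                # Only object keys ending in .csv generate a notification.
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": ".csv"}]
                    }
                },
            }
        ]
    }


# Assumed usage with boto3 (not executed here). Note that this call replaces
# the bucket's whole notification configuration, so merge any existing rules first.
#   s3 = boto3.client("s3")
#   s3.put_bucket_notification_configuration(
#       Bucket="example-upload-bucket",
#       NotificationConfiguration=csv_notification_config(
#           "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet"))
```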