DEA-C01 Exam - Question 85

Question

An online retail company stores Application Load Balancer (ALB) access logs in an Amazon S3 bucket. The company wants to use Amazon Athena to query the logs to analyze traffic patterns.

A data engineer creates an unpartitioned table in Athena. As the amount of data gradually increases, the response time for queries also increases. The data engineer wants to improve the query performance in Athena.

Which solution will meet these requirements with the LEAST operational effort?

Examice · Accepted Answer

Improving Athena query performance as the data grows requires two things: converting the data to a more efficient format and partitioning it. An AWS Lambda function that transforms all ALB access logs and saves the results in Apache Parquet format addresses the first. Parquet is a columnar storage format, so Athena reads only the columns a query references, which reduces scan time. Partitioning the data further reduces the amount of data scanned per query. Therefore, using a Lambda function for the transformation and partitioning, and Athena to query the optimized data, meets the requirements with the least operational effort.
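
To make the "less data scanned per query" point concrete, here is a rough Python sketch of what partition pruning amounts to. It is only an illustration, not how Athena is implemented. The object keys follow the date-based layout ALB uses for access logs (`.../elasticloadbalancing/<region>/<yyyy>/<mm>/<dd>/<file>`), but the account ID, region, and file names are made up, and `partition_of` and `keys_to_scan` are hypothetical helpers.

```python
from datetime import date

# Hypothetical object keys in the ALB access log layout; all values are made up.
KEYS = [
    "AWSLogs/111122223333/elasticloadbalancing/us-east-1/2024/05/01/a.log.gz",
    "AWSLogs/111122223333/elasticloadbalancing/us-east-1/2024/05/01/b.log.gz",
    "AWSLogs/111122223333/elasticloadbalancing/us-east-1/2024/05/02/c.log.gz",
    "AWSLogs/111122223333/elasticloadbalancing/us-east-1/2024/06/15/d.log.gz",
]


def partition_of(key: str) -> date:
    """Read the yyyy/mm/dd path segments that precede the file name."""
    year, month, day = key.split("/")[-4:-1]
    return date(int(year), int(month), int(day))


def keys_to_scan(keys, day=None):
    """Return the objects a query would have to read.

    With no partition filter (the unpartitioned case) every object is read;
    with a date filter only the objects under that day's prefix are read.
    """
    if day is None:
        return list(keys)
    return [k for k in keys if partition_of(k) == day]


all_objects = keys_to_scan(KEYS)                    # unpartitioned: everything
one_day = keys_to_scan(KEYS, date(2024, 5, 1))      # partitioned: one day only
print(len(all_objects), len(one_day))
```

The gap between the two counts grows with every day of logs added, which is why the unpartitioned table in the question keeps getting slower.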

tgv · Answer

An AWS Glue crawler can automatically determine the schema of the logs, infer partitions, and update the Glue Data Catalog. Crawlers can be scheduled to run at intervals, minimizing manual intervention.
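
For anyone who wants to see what a scheduled crawler looks like in practice, below is a minimal sketch using the boto3 Glue client's `create_crawler` call. The crawler name, IAM role ARN, database name, bucket path, and schedule are all placeholders that I assumed for illustration; none of them come from the question. `build_crawler_request` and `create_crawler` are hypothetical helper names.

```python
def build_crawler_request(name, role_arn, database, s3_path, schedule):
    """Assemble keyword arguments for the Glue CreateCrawler API."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": schedule,
    }


def create_crawler(glue_client, request):
    """Create the crawler; glue_client is expected to be boto3.client("glue")."""
    return glue_client.create_crawler(**request)


# All values below are placeholders, not taken from the question.
request = build_crawler_request(
    name="alb-access-logs-crawler",
    role_arn="arn:aws:iam::111122223333:role/ExampleGlueCrawlerRole",
    database="alb_logs_db",
    s3_path="s3://example-alb-logs-bucket/AWSLogs/",
    schedule="cron(0 1 * * ? *)",  # Glue cron syntax: daily at 01:00 UTC
)

# With AWS credentials configured, this would be run as:
#   import boto3
#   create_crawler(boto3.client("glue"), request)
```

After this one-time setup, the crawler picks up new date prefixes on each scheduled run, so there is no transformation code to maintain. One caveat: ALB log paths are not in Hive `key=value` form, so the crawler will most likely name the partition columns generically (`partition_0`, `partition_1`, and so on).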

PGGuy · Answer

Creating an AWS Glue crawler (Option B) is the most straightforward and least operationally intensive approach to automatically determine the schema, partition the data, and keep the AWS Glue Data Catalog updated. This ensures Athena queries are optimized without requiring extensive manual management or additional processing steps.

andrologin · Answer

An AWS Glue crawler with classifiers lets you determine the schema pattern of the files/data, which can then be used to partition the data for Athena query optimization.

Discussion