Snowflake SnowPro Advanced Data Engineer Exam Questions

Question 6 of 65

A large table with 200 columns contains two years of historical data. When queried, the table is filtered on a single day. Below is the Query Profile:

Exam SnowPro Advanced Data Engineer Question 6

Using a size 2XL virtual warehouse, this query took over an hour to complete.

What will improve the query performance the MOST?

Increase the size of the virtual warehouse.

Increase the number of clusters in the virtual warehouse.

Implement the search optimization service on the table.

Add a date column as a cluster key on the table.

Correct Answer: D

To improve the query performance the most, adding a date column as a cluster key on the table would be the best option. This approach helps to reduce the number of partitions scanned when filtering by specific dates because clustering organizes data based on the values in the key column, leading to more efficient querying. Increasing the size of the virtual warehouse or the number of clusters might provide some improvement in processing power but would not address the underlying issue of scanning too many partitions, as seen in the high number of partitions scanned in the Query Profile.

Question 7 of 65

A Data Engineer is working on a Snowflake deployment in AWS eu-west-1 (Ireland). The Engineer is planning to load data from staged files into target tables using the COPY INTO command.

Which sources are valid? (Choose three.)

Internal stage on GCP us-central1 (Iowa)

Internal stage on AWS eu-central-1 (Frankfurt)

External stage on GCP us-central1 (Iowa)

External stage in an Amazon S3 bucket on AWS eu-west-1 (Ireland)

External stage in an Amazon S3 bucket on AWS eu-central-1 (Frankfurt)

SSD attached to an Amazon EC2 instance on AWS eu-west-1 (Ireland)

Correct Answer: B, D, E

When working with Snowflake in AWS eu-west-1 (Ireland), an internal stage must reside in the same region as the Snowflake account. Thus, internal stages in other regions or cloud providers, such as GCP, are invalid. However, external stages are not bound to this restriction and can be located in different regions or even different cloud providers. Therefore, using internal stages in fremagne locations or cloud providers is invalid. In this context, a valid source would be an external stage on AWS or another cloud provider in various regions. Hence, the valid sources are 'Internal stage on AWS eu-central-1 (Frankfurt)', 'External stage in an Amazon S3 bucket on AWS eu-west-1 (Ireland)', and 'External stage in an Amazon S3 bucket on AWS eu-central-1 (Frankfurt)'. Option 'F' is invalid as Snowflake does not support direct data loading from an SSD attached to an Amazon EC2 instance.

Question 8 of 65

A Data Engineer wants to create a new development database (DEV) as a clone of the permanent production database (PROD). There is a requirement to disable Fail-safe for all tables.

Which command will meet these requirements?

CREATE DATABASE DEV -

CLONE PROD -

FAIL_SAFE = FALSE;

CREATE DATABASE DEV -

CLONE PROD;

CREATE TRANSIENT DATABASE DEV -

CLONE PROD;

CREATE DATABASE DEV -

CLONE PROD -

DATA_RETENTION_TIME_IN DAYS = 0;

Correct Answer: C

To disable Fail-safe for all tables, you need to create a transient database. A transient database doesn't have the Fail-safe period that is associated with permanent databases. Therefore, creating a transient database as a clone of the production database will meet the requirements to disable Fail-safe for all tables.

Question 9 of 65

Which query will show a list of the 20 most recent executions of a specified task, MYTASK, that have been scheduled within the last hour that have ended or are still running?

Exam SnowPro Advanced Data Engineer Question 9

Correct Answer: B

To retrieve the 20 most recent executions of the specified task MYTASK that have been scheduled within the last hour and have ended or are currently running, the query must include a filter to ensure that only tasks that have already started are considered. This can be done by checking that the QUERY_ID is not null, as QUERY_ID is populated only when the task starts running. Without filtering on QUERY_ID, the results could include tasks that are scheduled but have not yet started. Additionally, the query should limit the results to the most recent 20 entries.

Question 10 of 65

Which methods can be used to create a DataFrame object in Snowpark? (Choose three.)

session.jdbc_connection()

session.read.json()

session.table()

DataFrame.write()

session.builder()

session.sql()

Correct Answer: B, C, F

There are multiple methods for creating a DataFrame object in Snowpark. The method session.read.json() allows the creation of a DataFrame from a JSON file. The method session.table() is used to create a DataFrame from an existing table in Snowflake. The method session.sql() allows the creation of a DataFrame by executing a SQL query. While other methods are related to sessions or writing data, they do not directly pertain to creating a DataFrame.