
Certified Data Engineer Associate Exam - Question 34


A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the previous run in the pipeline, and set up the pipeline to only ingest those new files with each run.

Which of the following tools can the data engineer use to solve this problem?

Correct Answer: E (Auto Loader)

To identify and ingest only the new files in each pipeline run, the data engineer can use Auto Loader. Auto Loader is a Databricks feature that incrementally processes new data files as they arrive in cloud storage. It records which files it has already ingested in the stream's checkpoint, so each run picks up only the files added since the previous run. It reads the source files without moving or modifying them, which suits a shared directory where the files must be kept as is.
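As a rough illustration of the explanation above, here is a minimal sketch of an Auto Loader run in PySpark. It assumes a Databricks runtime where `spark` is an active session. The JSON file format, the helper names `auto_loader_options` and `ingest_new_files`, and all paths and table names are hypothetical and not part of the question.

```python
def auto_loader_options(schema_dir, file_format="json"):
    """Build reader options for an Auto Loader (cloudFiles) stream.

    file_format is an assumption; the question does not name the format.
    """
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_dir,
    }


def ingest_new_files(spark, source_dir, checkpoint_dir, target_table):
    """Ingest files not yet recorded in the checkpoint, then stop.

    spark is expected to be a SparkSession on a Databricks runtime.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(checkpoint_dir))
        .load(source_dir)  # source files are only read, never moved
    )
    return (
        stream.writeStream
        # The checkpoint remembers which files were already ingested.
        .option("checkpointLocation", checkpoint_dir)
        # Process everything new since the last run, then shut down.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

Calling `ingest_new_files(spark, "/mnt/shared/landing", "/mnt/checkpoints/landing", "bronze.events")` on a schedule would, under these assumptions, pick up only files that arrived after the previous call, because the checkpoint directory persists between runs.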

Discussion

7 comments
XiltroX (Option: E)
Apr 2, 2023

E is the correct answer.

surrabhi_4 (Option: E)
Apr 3, 2023

option E

AndreFR (Option: E)
Aug 19, 2023

Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage without any additional setup. https://docs.databricks.com/en/ingestion/auto-loader/index.html

DavidRou (Option: E)
Oct 31, 2023

Auto Loader can help if you want to ingest data incrementally.

Huroye (Option: C)
Nov 15, 2023

The data engineer needs to identify which files are new since the previous run. This seems to be an analysis effort. If that is the case, and I might be wrong, then Databricks SQL is the correct answer.

SerGrey (Option: E)
Jan 8, 2024

E is correct

benni_ale (Option: E)
Apr 28, 2024

E is correct