Exam: Certified Data Engineer Associate
Question 34

A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the previous run in the pipeline, and set up the pipeline to only ingest those new files with each run.

Which of the following tools can the data engineer use to solve this problem?

    Correct Answer: E

    Auto Loader solves this problem. It is a Databricks feature that incrementally processes new data files as they arrive in cloud storage. Auto Loader records which files it has already ingested in the stream's checkpoint location, so each run picks up only the files added since the previous run. The source files are read in place and never moved or modified, which suits a shared directory where files accumulate.
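    As a rough illustration of the explanation above, the sketch below shows how such a pipeline might be wired up with Auto Loader's `cloudFiles` source. It assumes a Databricks runtime, where `spark` is already available and `cloudFiles` is supported. The function names, the JSON file format, and all paths and table names are hypothetical and not part of the original question.

```python
def auto_loader_options(file_format, schema_location):
    """Build the options for the Auto Loader (cloudFiles) source.

    file_format and schema_location are illustrative parameters.
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest_new_files(spark, source_dir, target_table, checkpoint_dir,
                     file_format="json"):
    """Ingest only the files that earlier runs have not processed yet.

    Assumes `spark` is a SparkSession on a Databricks runtime.
    The checkpoint directory stores the record of ingested files, so it
    must be the same on every run.
    """
    reader = spark.readStream.format("cloudFiles")
    for key, value in auto_loader_options(file_format, checkpoint_dir).items():
        reader = reader.option(key, value)

    return (
        reader.load(source_dir)
        .writeStream
        .option("checkpointLocation", checkpoint_dir)
        # Process everything currently available, then stop, so the
        # pipeline can be scheduled as a batch-style job.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

    A scheduled job could then call something like `ingest_new_files(spark, "/mnt/shared/landing", "bronze_events", "/mnt/checkpoints/bronze_events")` on each run; these paths are placeholders. Because the files themselves are left as is, other processes that use the directory are unaffected.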

Discussion
XiltroX (Option: E)

E is the correct answer.

surrabhi_4 (Option: E)

Option E.

AndreFR (Option: E)

Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage without any additional setup. https://docs.databricks.com/en/ingestion/auto-loader/index.html

benni_ale (Option: E)

E is correct.

SerGrey (Option: E)

E is correct.

Huroye (Option: C)

The data engineer needs to identify which files are new since the previous run. This seems to be an analysis effort. If that is the case, and I might be wrong, then DB SQL is the correct answer.

DavidRou (Option: E)

Auto Loader can help if you want to ingest data incrementally.