Exam: Certified Data Engineer Professional
Question 77

To support near real-time workloads, a data engineer is creating a helper function that leverages the schema detection and evolution functionality of Databricks Auto Loader. The desired function will automatically detect the schema of the source directory, incrementally process JSON files as they arrive there, and automatically evolve the schema of the table when new fields are detected.

The function is displayed below with a blank:

Which response correctly fills in the blank to meet the specified requirements?

    Correct Answer: E

    To enable near real-time workloads with Databricks Auto Loader, including automatic schema detection and evolution, the function must use a streaming write. That means using .writeStream for continuous processing, specifying a checkpoint location for fault tolerance, and setting mergeSchema to true to allow automatic schema evolution. Option E includes all of these configurations.
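The original function with the blank did not survive extraction, so as a rough sketch, the completed helper following the Option E pattern might look like the code below. The function name, parameters, and paths are illustrative assumptions; only the option keys (cloudFiles.format, cloudFiles.schemaLocation, checkpointLocation, mergeSchema) and the readStream/writeStream structure come from the discussion. This is Databricks-only code (the cloudFiles source is not available in open-source Spark), so it is a sketch rather than a runnable example.

```python
# Hypothetical sketch of the Auto Loader helper (Option E pattern).
# Requires a Databricks runtime; names are illustrative, not from the exam.
def auto_load_json(spark, source_dir: str, checkpoint_path: str, target_table: str):
    (spark.readStream
        .format("cloudFiles")                                  # Databricks Auto Loader
        .option("cloudFiles.format", "json")                   # incrementally ingest JSON
        .option("cloudFiles.schemaLocation", checkpoint_path)  # schema inference/tracking
        .load(source_dir)
        .writeStream                                           # streaming (near real-time) write
        .option("checkpointLocation", checkpoint_path)         # fault tolerance and progress
        .option("mergeSchema", "true")                         # evolve schema on new fields
        .table(target_table))
```

Note that reusing the checkpoint path for cloudFiles.schemaLocation is a common convention (echoed in the discussion below), but the two settings serve different purposes: one tracks the inferred schema, the other tracks stream progress.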

Discussion
AzureDE2522Option: E

Please refer: https://docs.databricks.com/en/ingestion/auto-loader/schema.html

FreyrOption: E

Reference: https://docs.databricks.com/en/ingestion/auto-loader/schema.html

writeStream: ensures real-time streaming write capability, which is essential for near real-time workloads. checkpointLocation: necessary for fault tolerance and tracking progress. mergeSchema: enables automatic schema evolution, allowing new columns to be detected and added to the target table.

Why Option 'C' is incorrect: it uses write instead of writeStream, which is for batch processing, making it inappropriate for real-time streaming.

Why Option 'B' is incorrect: although it includes checkpointLocation and mergeSchema, the addition of trigger(once=True) is unnecessary here and is better suited to batch-like processing.

vikram12aprOption: E

readStream and writeStream share schema information via the checkpoint location, so cloudFiles.schemaLocation should be set to the same path as checkpointLocation so the schema does not need to be specified manually. Also, mergeSchema set to true ensures that any newly detected column is added to the target table. https://docs.databricks.com/en/ingestion/auto-loader/schema.html

hal2401meOption: E

https://notebooks.databricks.com/demos/auto-loader/01-Auto-loader-schema-evolution-Ingestion.html

mouad_attaqiOption: E

The correct answer is E: it is a streaming write, and the default outputMode is append, so specifying it is optional in this case.

aragorn_bregoOption: E

This response correctly fills in the blank to meet the specified requirements of using Databricks Auto Loader for automatic schema detection and evolution in a near real-time streaming context.

DileepvikramOption: C

The question does not say to write as a stream; it says to process incrementally, so option C looks correct to me.

sturcu

There is a typo in the statement: is it schema or checkpoint? The provided answer is not correct as written. It has to be a writeStream, with append mode.