
Certified Data Engineer Associate Exam - Question 70


A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below; the streaming write contains a blank where the trigger configuration should go.

If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?

Correct Answer: B

In Structured Streaming, to process all available data in as many batches as required and then stop, the data engineer should call the trigger method with availableNow set to True. With trigger(availableNow=True), the query processes everything available in the source table at start time, in as many micro-batches as needed, and then terminates. This is useful for draining existing data without leaving a query running to wait for new arrivals. By contrast, an interval trigger such as trigger(processingTime='5 seconds') keeps the query running indefinitely, so it does not satisfy the requirement.
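A minimal sketch of such a query follows. The table names, the filter on a "value" column, and the checkpoint path are illustrative assumptions, not details from the original question:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("available-now-demo").getOrCreate()

    # Read the source table as a stream (table name assumed for illustration)
    source_df = spark.readStream.table("source_table")

    # Placeholder transformation; the real job's logic is not shown in the question
    transformed_df = source_df.where("value > 0")

    # availableNow=True: drain everything currently in the source, in as many
    # micro-batches as needed, then terminate the query automatically
    query = (transformed_df.writeStream
        .format("delta")  # Delta format, as is typical on Databricks
        .option("checkpointLocation", "/tmp/checkpoints/available_now_demo")  # assumed path
        .trigger(availableNow=True)
        .toTable("target_table"))  # target table name is also an assumption

    query.awaitTermination()

Note that trigger(availableNow=True) requires Spark 3.3 or later; on earlier versions, trigger(once=True) offers similar stop-when-done behavior but in a single batch.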

Discussion

7 comments
meow_akk (Option: B)
Oct 22, 2023

Sorry, the answer is B: https://stackoverflow.com/questions/71061809/trigger-availablenow-for-delta-source-streaming-queries-in-pyspark-databricks. For batch-style processing we use availableNow.

kbaba101 (Option: B)
Oct 24, 2023

B. From the docs: availableNow (bool, optional): if set to True, sets a trigger that processes all available data in multiple batches and then terminates the query. Only one trigger can be set.
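For comparison, the mutually exclusive trigger modes look like this (a sketch assuming df is a streaming DataFrame; the interval value is illustrative):

    # Exactly one trigger option may be set on a given writer:
    df.writeStream.trigger(processingTime="5 seconds")  # micro-batch every 5 seconds, runs indefinitely
    df.writeStream.trigger(once=True)                   # one single batch, then stop
    df.writeStream.trigger(availableNow=True)           # all available data in multiple batches, then stop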

55f31c8 (Option: B)
Nov 29, 2023

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.streaming.DataStreamWriter.trigger.html

fifirifi (Option: B)
Mar 10, 2024

Correct answer: B. Explanation: In Structured Streaming, if a data engineer wants to process all the available data in as many batches as required, without any explicit trigger interval, they can use trigger(availableNow=True). The availableNow option specifies that the query should process all the data that is currently available and not wait for more data to arrive.

meow_akk (Option: D)
Oct 22, 2023

Correct answer is D:

    %python
    (spark.readStream.format("delta").load("<delta_table_path>")
        .writeStream
        .format("delta")
        .trigger(processingTime='5 seconds')  # added line that sets the trigger processing time
        .outputMode("append")
        .option("checkpointLocation", "<checkpoint_path>")
        .options(**writeConfig)
        .start())

https://kb.databricks.com/streaming/optimize-streaming-transactions-with-trigger

AndreFR (Option: B)
Dec 20, 2023

It’s the only answer with correct syntax.

benni_ale (Option: B)
Apr 29, 2024

B is ok.