Certified Data Engineer Professional Exam - Question 93


You are performing a join operation to combine values from a static userLookup table with a streaming DataFrame streamingDF.

Which code block attempts to perform an invalid stream-static join?

Correct Answer: BE

Not every outer join between a streaming DataFrame and a static DataFrame is valid. An outer join must also return the unmatched rows of the side it preserves, padded with nulls. When that side is the static table, Spark cannot tell whether a row that is unmatched now will be matched by streaming data that arrives later, so it can never emit a complete, final set of unmatched rows. A full outer join always preserves the static side, which makes it invalid in a stream-static join; an outer join that preserves only the streaming side is still supported.
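To make the reasoning concrete, here is a minimal plain-Python sketch (not Spark code) that imitates micro-batch processing. The lookup contents, the batches, and both helper functions are hypothetical and exist only for illustration. It shows that null-padding streaming rows is safe per batch, while null-padding static rows would be premature.

```python
# Static lookup table: user_id -> name (hypothetical data).
user_lookup = {1: "ana", 2: "bo", 3: "cy"}

# Streaming input arriving as micro-batches of (user_id, event) rows.
micro_batches = [
    [(1, "click")],
    [(2, "view"), (4, "click")],
]

def left_outer_stream_static(batch, lookup):
    """Stream on the left: each streaming row is emitted once, matched or null-padded."""
    return [(uid, event, lookup.get(uid)) for uid, event in batch]

def unmatched_static_rows(batch, lookup):
    """Static keys with no match in this batch. A join that preserves the
    static side would have to null-pad these rows right away."""
    seen = {uid for uid, _ in batch}
    return sorted(uid for uid in lookup if uid not in seen)

left_outer_results = [left_outer_stream_static(b, user_lookup) for b in micro_batches]
premature_nulls = [unmatched_static_rows(b, user_lookup) for b in micro_batches]
```

In this sketch, user 2 appears in `premature_nulls[0]` even though a matching event arrives in the second batch, so a null-padded row emitted after the first batch would already be wrong. The per-batch results in `left_outer_results` never need to be revised.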

Discussion

7 comments
Enduresoul (Option: B)
Nov 26, 2023

Answer B is correct: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#support-matrix-for-joins-in-streaming-queries The support matrix for joins between static and streaming inputs shows that Stream-Static + outer (full outer) is not supported. Answer E is wrong because Static-Stream + right join is supported.
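As a quick reference, the inner and outer rows of that support matrix can be sketched as a small lookup. This is a hand-transcribed summary and an assumption about how one might encode it, not an official API: the names `SUPPORTED` and `is_supported` are hypothetical, and the entries should be checked against the linked guide for your Spark version. Semi joins and stream-stream joins are left out.

```python
# Hand-transcribed subset of the join support matrix (inner and outer joins only).
# Keys are (left input, right input); values are the supported join types.
SUPPORTED = {
    ("stream", "static"): {"inner", "left_outer"},
    ("static", "stream"): {"inner", "right_outer"},
}

def is_supported(left, right, how):
    """Return True if the join type is listed as supported for this input pairing."""
    return how in SUPPORTED.get((left, right), set())

# The outer join must preserve the streaming side, never the static side.
checks = {
    "stream-static left_outer": is_supported("stream", "static", "left_outer"),
    "stream-static full_outer": is_supported("stream", "static", "full_outer"),
    "static-stream right_outer": is_supported("static", "stream", "right_outer"),
}
```

The pattern is symmetric: whichever side the stream is on, only the outer join that preserves the streaming side is listed as supported, and a full outer join is unsupported in both arrangements.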

lexaneon (Option: B)
Jan 5, 2024

I believe B is correct, as provided below.

hal2401me (Option: E)
Mar 14, 2024

In my exam today, options B, C, and D were removed. I chose E because I recall that stream-static right joins are less supported.

kz_data (Option: D)
Jan 12, 2024

I think the correct answer is D.

kz_data
Jan 12, 2024

Sorry, I misread the question.

vctrhugo (Option: B)
Feb 6, 2024

Specifically, outer joins are not supported with a static DataFrame on the right and a streaming DataFrame on the left. This is because it’s not possible to guarantee all necessary rows will be available in the streaming DataFrame for every micro-batch.

Curious76 (Option: B)
Feb 27, 2024

B is correct.

imatheushenrique (Option: B)
Jun 1, 2024

B. We match all the records from a static DataFrame on the left with a stream DataFrame on the right. If records from the static DF (left) do not match the stream DF (right), the system cannot return null, because the data in the stream DF keeps changing and we cannot guarantee whether matching records will arrive later. That is why the full_outer join is not supported.