
Certified Data Engineer Professional Exam - Question 158


The data engineer is using Spark's MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

Correct Answer: BC

If the Spark MEMORY_ONLY storage level is being used, any cached data reported on disk indicates that there is not enough memory to hold all of the data, which defeats the purpose of using MEMORY_ONLY. Therefore, a Size on Disk greater than 0 in the Storage tab signals that the cached table is not performing optimally.
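The Size on Disk check described above can be sketched in a few lines of plain Python. This is a minimal, hypothetical helper, not a Spark API. It assumes the Storage tab's per-table figures have already been exported into dictionaries with `name`, `memoryUsed` and `diskUsed` (bytes) keys (for example from the entries returned by Spark's monitoring REST API under `/api/v1/applications/<app-id>/storage/rdd`), and that every entry passed in was persisted with MEMORY_ONLY. The table names and byte counts are invented.

```python
from typing import Any, Dict, List


def flag_suboptimal_memory_only(entries: List[Dict[str, Any]]) -> List[str]:
    """Return the names of MEMORY_ONLY cache entries whose Size on Disk is > 0.

    Assumption: each entry carries table-level "name", "memoryUsed" and
    "diskUsed" (bytes) fields, and the caller passes only entries that were
    persisted with MEMORY_ONLY.
    """
    # Under MEMORY_ONLY the expected Size on Disk is 0, so any positive value is the signal.
    return [e["name"] for e in entries if e.get("diskUsed", 0) > 0]


# Hypothetical Storage-tab snapshot (names and byte counts are made up).
snapshot = [
    {"name": "orders_cache", "memoryUsed": 512_000_000, "diskUsed": 0},
    {"name": "events_cache", "memoryUsed": 900_000_000, "diskUsed": 128_000_000},
]
print(flag_suboptimal_memory_only(snapshot))  # prints ['events_cache']
```

The helper deliberately ignores Size in Memory: for a MEMORY_ONLY cache, the only question it answers is whether anything at all is reported on disk.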

Discussion

MDWPartners (Option: C)
May 29, 2024

I would say C

imatheushenrique (Option: B)
Jun 1, 2024

B. This annotation says that some partitions of the cached data have been spilled to disk because there wasn't enough memory to keep them.

Freyr (Option: B)
Jun 1, 2024

Correct Answer: B. Option B is the most correct and relevant choice for an indicator that a cached table is not performing optimally in a MEMORY_ONLY scenario. If an RDD block includes a "?" annotation, it strongly suggests a caching problem, which directly affects the performance and expected behavior of MEMORY_ONLY caching. The annotation points to a failure to cache the data entirely in memory, which is what MEMORY_ONLY intends to do. Option C could also be a relevant indicator in general caching scenarios (e.g., MEMORY_AND_DISK), but data on disk directly contradicts the MEMORY_ONLY setting. Therefore, Option B is chosen based on the specific storage level described.

Freyr
Jun 10, 2024

*THE CORRECT ANSWER IS: C* PLEASE IGNORE MY PREVIOUS ANSWER. Long story short, B is correct in the context of a non-functional requirement, but the question is based on a functional requirement. Sorry for the confusion.

hpkr (Option: C)
Jun 12, 2024

C is correct here

03355a2 (Option: C)
Jun 27, 2024

It's simple: if MEMORY_ONLY is used, anything spilled to disk indicates a problem.

03355a2
Jun 27, 2024

The RDD answer is incorrect for this question: although the annotation indicates a failure to cache, it identifies individual blocks that failed to cache rather than giving a general signal that the cached table as a whole is performing suboptimally.
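The block-level versus table-level distinction in this comment can be illustrated with a small, hypothetical Python sketch. Per-block records (keys `blockName`, `memoryUsed`, `diskUsed`, loosely modeled on Spark's per-partition storage info; the block names and sizes are invented) are summed into table-level Size in Memory and Size on Disk figures. This shows why a single block on disk is enough to make the table-level Size on Disk positive.

```python
from typing import Any, Dict, List


def summarize_blocks(blocks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Roll hypothetical per-block storage records up into one table-level summary.

    Assumption: each block dict has "blockName", "memoryUsed" and "diskUsed"
    (bytes); these keys are illustrative, not a guaranteed Spark schema.
    """
    disk_total = sum(b["diskUsed"] for b in blocks)
    return {
        "sizeInMemory": sum(b["memoryUsed"] for b in blocks),
        "sizeOnDisk": disk_total,
        # Block-level view: which individual blocks ended up on disk.
        "blocksOnDisk": [b["blockName"] for b in blocks if b["diskUsed"] > 0],
        # Table-level view: one positive block is enough to flag the whole table.
        "suboptimal": disk_total > 0,
    }


# Hypothetical per-block records for one cached table.
blocks = [
    {"blockName": "rdd_7_0", "memoryUsed": 64_000_000, "diskUsed": 0},
    {"blockName": "rdd_7_1", "memoryUsed": 0, "diskUsed": 48_000_000},
]
summary = summarize_blocks(blocks)
print(summary["sizeOnDisk"], summary["blocksOnDisk"])  # prints 48000000 ['rdd_7_1']
```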