Exam: Certified Data Engineer Professional
Question 158

The data engineer is using Spark's MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

    Correct Answer: C

    With the MEMORY_ONLY storage level, all cached data is expected to stay in memory. Any data reported on disk indicates that there was not enough memory to hold the full dataset, which contradicts the purpose of MEMORY_ONLY. Therefore, a Size on Disk greater than 0 signals that the cached table is not performing optimally.
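The check described in this explanation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `disk_backed_entries` is hypothetical, the field names `memoryUsed` and `diskUsed` are assumed to mirror what Spark's monitoring REST API reports for cached RDDs, and the sample values are invented rather than taken from a real cluster.

```python
from typing import Dict, List


def disk_backed_entries(rdd_infos: List[Dict]) -> List[str]:
    """Return the names of cached entries that report any bytes on disk.

    Hypothetical helper: each dict is assumed to carry "name" and
    "diskUsed" keys, similar to the Storage tab's "Size on Disk" column.
    """
    return [info["name"] for info in rdd_infos if info.get("diskUsed", 0) > 0]


# Invented sample data standing in for two cached tables.
sample = [
    {"name": "orders", "memoryUsed": 52_428_800, "diskUsed": 0},
    {"name": "customers", "memoryUsed": 20_971_520, "diskUsed": 10_485_760},
]

# Collect every entry whose reported size on disk is greater than 0.
flagged = disk_backed_entries(sample)
print(flagged)
```

Under the reasoning above, any name returned by this helper for a table persisted with MEMORY_ONLY would be worth investigating.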

Discussion
MDWPartners Option: C

I would say C

03355a2 Option: C

It's simple: if MEMORY_ONLY is used, anything spilled to disk would indicate a problem.

03355a2

The RDD answer is incorrect for this question: although the annotation does indicate a failure to cache, it identifies individual blocks that failed to cache rather than giving a general signal of suboptimal performance for the entire cached table.

hpkr Option: C

C is correct here

Freyr Option: B

Correct Answer: B. Option B is the most correct and relevant choice for an indicator that a cached table is not performing optimally in a MEMORY_ONLY scenario. If an RDD block includes a "?" annotation, it strongly suggests issues with caching, which would directly impact the performance and expected behavior of MEMORY_ONLY caching. This points to a failure to cache the data entirely in memory, which is what MEMORY_ONLY is meant to do. Option C could also be a relevant indicator in general caching scenarios (e.g., MEMORY_AND_DISK), but it contradicts the MEMORY_ONLY setting directly. Therefore, Option B is chosen based on the specific storage level described.

Freyr

*THE CORRECT ANSWER IS: C* PLEASE IGNORE MY PREVIOUS ANSWER. Long story short, B is correct in the context of a non-functional requirement, but the question is based on a functional requirement. Sorry for the confusion.

imatheushenrique Option: B

B. This annotation says that some partitions of the cached data have been spilled to disk because there wasn't enough memory to keep them.