Exam: Certified Data Engineer Professional
Question 101

Which indicators would you look for in the Spark UI’s Storage tab to signal that a cached table is not performing optimally? Assume you are using Spark’s MEMORY_ONLY storage level.

    Correct Answer: C

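    With Spark's MEMORY_ONLY storage level, cached data is expected to live entirely in memory, so Size on Disk in the Storage tab should be 0. If Size on Disk is greater than 0, some of the cached data is being read from disk, which is slower than reading from memory, so the cache is not performing optimally. Note that MEMORY_ONLY itself does not spill to disk: partitions that do not fit in memory are left uncached and recomputed when needed. A nonzero Size on Disk therefore also suggests the table is not actually cached at the intended storage level.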
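    The same check can be scripted rather than read off the UI. The following is only a rough sketch, assuming each Storage tab entry is available as a dict with the field names used by Spark's monitoring REST API storage endpoint (`numPartitions`, `numCachedPartitions`, `memoryUsed`, `diskUsed`). The function name `cache_warnings` and the two sample entries are hypothetical and made up for illustration. The second check (fewer cached partitions than total partitions) is an extra signal added here, not part of the answer above.

```python
from typing import Dict, List


def cache_warnings(rdd_info: Dict) -> List[str]:
    """Return warnings for one Storage-tab entry expected to be MEMORY_ONLY.

    Hypothetical helper: `rdd_info` is assumed to use the field names of
    Spark's monitoring REST API storage entries.
    """
    warnings = []

    # Corresponds to "Size on Disk" in the Storage tab.
    # For a memory-only cache this is expected to be 0.
    if rdd_info.get("diskUsed", 0) > 0:
        warnings.append("Size on Disk > 0")

    # Corresponds to "Fraction Cached" below 100%: partitions that are not
    # cached have to be recomputed each time they are needed.
    if rdd_info.get("numCachedPartitions", 0) < rdd_info.get("numPartitions", 0):
        warnings.append("not all partitions cached")

    return warnings


# Hypothetical sample entries (values invented for illustration only).
healthy = {
    "name": "In-memory table sales",
    "numPartitions": 8,
    "numCachedPartitions": 8,
    "memoryUsed": 512_000_000,
    "diskUsed": 0,
}
suspect = {
    "name": "In-memory table events",
    "numPartitions": 8,
    "numCachedPartitions": 5,
    "memoryUsed": 300_000_000,
    "diskUsed": 120_000_000,
}

for entry in (healthy, suspect):
    print(entry["name"], cache_warnings(entry))
```

    In practice the dicts would come from the running application rather than being hard-coded; how they are fetched is left out here.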

Discussion
vctrhugo (Option: C)

C. Size on Disk is > 0. When using Spark's MEMORY_ONLY storage level, the ideal scenario is that the data is fully cached in memory and Size on Disk is 0 (indicating that no data has been spilled to disk). If Size on Disk is greater than 0, it suggests that some data has been spilled to disk, which can degrade performance because reading from disk is slower than reading from memory.

Isio05 (Option: C)

In this case, any data on disk means that the cache is not performing optimally.