Exam: Certified Data Engineer Associate
Question 60

A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure that the data returned by the query is clean. However, the data engineering team uses Python rather than SQL for its tests.

Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?

    Correct Answer: C

    To run a SQL query and operate with the results in PySpark, the data engineering team can use the spark.sql() function. It executes a SQL query and returns the results as a DataFrame, which can then be used for further processing in Python.

Discussion
kishanuOption: C

spark.sql() should be used to execute a SQL query from PySpark. spark.table() can only be used to load a table, not to run a query.

meow_akkOption: C

C is correct. E.g.:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    df = spark.sql("SELECT * FROM sales")
    print(df.count())

benni_aleOption: E

I am not sure whether it is C or E. I see the majority went for E, but you can still query your data with spark.table() by using purely PySpark syntax. I don't see any part of the question specifying you HAVE to use SQL syntax.