Certified Data Engineer Associate Exam Questions

Certified Data Engineer Associate Exam - Question 60


A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?

Correct Answer: C

To run a SQL query and operate with the results in PySpark, the data engineering team can use the spark.sql operation. This function lets PySpark execute a SQL query and returns the results as a DataFrame, which can then be used for further processing in Python.

Discussion

3 comments
kishanu — Option: C
Oct 20, 2023

spark.sql() should be used to execute a SQL query with PySpark. spark.table() can only be used to load a table, not to run a query.

meow_akk — Option: C
Oct 22, 2023

C is correct. E.g.:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql("SELECT * FROM sales")
print(df.count())

benni_ale — Option: E
Apr 29, 2024

I am not sure whether it is C or E. I see the majority went for E, but you can still query your data with spark.table by using purely PySpark syntax. I don't see any part of the question specifying you HAVE to use SQL syntax.