
Certified Data Engineer Associate Exam - Question 12


A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following commands could the data engineering team use to access sales in PySpark?

A. SELECT * FROM sales
B. There is no way to share data between PySpark and SQL.
C. spark.sql("sales")
D. spark.delta.table("sales")
E. spark.table("sales")

Correct Answer: E

To access a Delta table in PySpark, the data engineering team can use spark.table("sales"). This call returns a DataFrame for any table registered in the SparkSession's catalog, letting the team run its tests in Python. spark.table is the standard, commonly used way to access registered tables in PySpark.
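As a minimal sketch of how the team might use this (assuming an active SparkSession named spark, a registered sales table, and a hypothetical order_id column):

```python
from pyspark.sql import functions as F

# Read the registered Delta table into a DataFrame
sales_df = spark.table("sales")

# Simple data-quality checks (order_id is a hypothetical column name)
assert sales_df.count() > 0, "sales table is empty"
assert sales_df.filter(F.col("order_id").isNull()).count() == 0, \
    "sales table contains NULL order_id values"
```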

Discussion

17 comments
Atnafu (Option: E)
Jul 9, 2023

E. The spark.table() function in PySpark allows you to access tables registered in the catalog, including Delta tables. By specifying the table name ("sales"), the data engineering team can read the Delta table and perform operations on it using PySpark.

Option A, SELECT * FROM sales, is SQL syntax and cannot be used directly in PySpark.

Option B, "There is no way to share data between PySpark and SQL," is incorrect; PySpark can interact with data through both SQL and the DataFrame/Dataset APIs.

Option C, spark.sql("sales"), calls a valid method for executing SQL queries on registered tables, but the argument "sales" alone is not a valid SQL query.

Option D, spark.delta.table("sales"), looks like a Delta-specific accessor, but it is not a method that exists on the SparkSession.
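To make the contrast between options C and E concrete, here is a minimal sketch (assuming an active SparkSession named spark and a registered sales table):

```python
# Option C as written fails: "sales" alone is not a SQL statement,
# so spark.sql("sales") raises a parse error.

# spark.sql works when it is given an actual query...
df_from_sql = spark.sql("SELECT * FROM sales")

# ...whereas spark.table (option E) takes just the table name.
df_from_table = spark.table("sales")
```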

Garyn (Option: E)
Dec 29, 2023

E. spark.table("sales") The spark.table() function in PySpark allows access to a registered table within the SparkSession. In this case, "sales" is the name of the Delta table created by the data analyst, and the spark.table() function enables access to this table for performing data engineering tests using Python (PySpark).

Tickxit (Option: E)
May 9, 2023

E: spark.table or spark.read.table
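Both calls resolve the same catalog entry; a quick sketch, assuming the sales table is registered:

```python
# spark.table and spark.read.table are interchangeable for catalog tables
df1 = spark.table("sales")
df2 = spark.read.table("sales")
assert df1.schema == df2.schema  # same table, same schema
```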

ThomasReps (Option: E)
Jun 12, 2023

It's E. As stated by others, the default format is Delta. If you try to run D, you get an error because there is no "delta" attribute on the SparkSession: "AttributeError: 'SparkSession' object has no attribute 'delta'". If you want to explicitly say it should be Delta, you need to specify the format instead.
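For completeness, if the team did want to be explicit that the source is Delta, here is a sketch of the usual alternatives (the storage path is hypothetical, and DeltaTable comes from the delta-spark package):

```python
from delta.tables import DeltaTable

# By table name, via the Delta Lake Python API
sales_df = DeltaTable.forName(spark, "sales").toDF()

# By storage path, naming the format explicitly (path is hypothetical)
sales_by_path = spark.read.format("delta").load("/mnt/data/sales")
```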

benni_ale (Option: E)
Apr 4, 2024

E is correct.

Majjjj
May 4, 2023

The correct answer is D. The data engineering team can access the Delta table sales in PySpark by using the spark.delta.table command. This command is used to create a DataFrame based on a Delta table. Therefore, the correct command is spark.delta.table("sales").

softthinkers
May 4, 2023

Correct answer is D, spark.delta.table("sales"). The reason is that it's asking for a Delta table, not a normal table; if it were a normal table, then it would be spark.table("sales").

qium
Nov 8, 2023

The default table type is "delta".

prasioso (Option: E)
May 12, 2023

I believe the answer is E, as in Databricks the default tables are Delta tables, hence spark.table should be enough. I have not seen a spark.delta.table function before.

Dwarakkrishna (Option: E)
Jun 4, 2023

You access data in Delta tables by the table name or the table path, as shown in the following example:

people_df = spark.read.table(table_name)
display(people_df)

d_b47
Sep 25, 2023

Delta is the default.

KalavathiP (Option: E)
Sep 26, 2023

E is correct

awofalus (Option: E)
Nov 7, 2023

Correct answer is E.

csd (Option: C)
Dec 26, 2023

C is the correct answer.

SerGrey (Option: E)
Jan 3, 2024

Correct answer is E

Itmma (Option: E)
Mar 19, 2024

E is correct

benni_ale (Option: E)
Apr 27, 2024

E is correct

souldiv (Option: E)
Jul 21, 2024

spark.table(). E is the correct one.