
Certified Data Engineer Associate Exam - Question 12


A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following commands could the data engineering team use to access sales in PySpark?

A. SELECT * FROM sales
B. There is no way to share data between PySpark and SQL.
C. spark.sql("sales")
D. spark.delta.table("sales")
E. spark.table("sales")

Correct Answer: E

To access a Delta table in PySpark, the data engineering team can use spark.table("sales"). This call returns a DataFrame for any table registered in the SparkSession's catalog, letting the team run its tests in Python. spark.table is the standard, commonly used way to access registered tables in PySpark.
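As a minimal sketch of how the team might use this (assuming an active SparkSession named spark, a registered sales table, and a hypothetical order_id column):

```python
from pyspark.sql import functions as F

# Read the registered Delta table into a DataFrame
sales_df = spark.table("sales")

# Simple data-quality checks (order_id is a hypothetical column name)
assert sales_df.count() > 0, "sales table is empty"
assert sales_df.filter(F.col("order_id").isNull()).count() == 0, \
    "sales table contains NULL order_id values"
```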

Discussion

17 comments
Atnafu (Option: E)
Jul 9, 2023

E. The spark.table() function in PySpark allows you to access tables registered in the catalog, including Delta tables. By specifying the table name ("sales"), the data engineering team can read the Delta table and perform operations on it using PySpark.

Option A, SELECT * FROM sales, is SQL syntax and cannot be used directly in PySpark.

Option B, "There is no way to share data between PySpark and SQL," is incorrect; PySpark can interact with data through both SQL and the DataFrame/Dataset APIs.

Option C, spark.sql("sales"), calls a valid method for executing SQL queries on registered tables, but the argument "sales" alone is not a valid SQL query.

Option D, spark.delta.table("sales"), looks like a Delta-specific accessor, but it is not a method that exists on the SparkSession.
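To make the contrast between options C and E concrete, here is a minimal sketch (assuming an active SparkSession named spark and a registered sales table):

```python
# Option C as written fails: "sales" alone is not a SQL statement,
# so spark.sql("sales") raises a parse error.

# spark.sql works when it is given an actual query...
df_from_sql = spark.sql("SELECT * FROM sales")

# ...whereas spark.table (option E) takes just the table name.
df_from_table = spark.table("sales")
```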

Garyn (Option: E)
Dec 29, 2023

E. spark.table("sales") The spark.table() function in PySpark allows access to a registered table within the SparkSession. In this case, "sales" is the name of the Delta table created by the data analyst, and the spark.table() function enables access to this table for performing data engineering tests using Python (PySpark).

Tickxit (Option: E)
May 9, 2023

E: spark.table or spark.read.table
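Both calls resolve the same catalog entry; a quick sketch, assuming the sales table is registered:

```python
# spark.table and spark.read.table are interchangeable for catalog tables
df1 = spark.table("sales")
df2 = spark.read.table("sales")
assert df1.schema == df2.schema  # same table, same schema
```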

ThomasReps (Option: E)
Jun 12, 2023

It's E. As stated by others, the default format is Delta. If you try to run D, you get an error because there is no "delta" attribute on the SparkSession: "AttributeError: 'SparkSession' object has no attribute 'delta'". If you want to explicitly say it should be Delta, you need to specify the format instead.
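For completeness, if the team did want to be explicit that the source is Delta, here is a sketch of the usual alternatives (the storage path is hypothetical, and DeltaTable comes from the delta-spark package):

```python
from delta.tables import DeltaTable

# By table name, via the Delta Lake Python API
sales_df = DeltaTable.forName(spark, "sales").toDF()

# By storage path, naming the format explicitly (path is hypothetical)
sales_by_path = spark.read.format("delta").load("/mnt/data/sales")
```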

benni_ale (Option: E)
Apr 4, 2024

E is correct.

Majjjj
May 4, 2023

The correct answer is D. The data engineering team can access the Delta table sales in PySpark by using the spark.delta.table command. This command is used to create a DataFrame based on a Delta table. Therefore, the correct command is spark.delta.table("sales").

softthinkers
May 4, 2023

Correct answer is D, spark.delta.table("sales"). The reason is that it's asking for a Delta table, not a normal table; if it were a normal table, then it would be spark.table("sales").

qium
Nov 8, 2023

The default table type is "delta".

prasioso (Option: E)
May 12, 2023

I believe the answer is E, as in Databricks the default tables are Delta tables, hence spark.table should be enough. I have not seen a spark.delta.table function before.

Dwarakkrishna (Option: E)
Jun 4, 2023

You access data in Delta tables by the table name or the table path, as shown in the following example:

people_df = spark.read.table(table_name)
display(people_df)

d_b47
Sep 25, 2023

Delta is the default.

KalavathiP (Option: E)
Sep 26, 2023

E is correct

awofalus (Option: E)
Nov 7, 2023

Correct answer is E.

csd (Option: C)
Dec 26, 2023

C is the correct answer.

SerGrey (Option: E)
Jan 3, 2024

Correct answer is E

Itmma (Option: E)
Mar 19, 2024

E is correct

benni_ale (Option: E)
Apr 27, 2024

E is correct

souldiv (Option: E)
Jul 21, 2024

spark.table(). E is the correct one.