Exam: Certified Data Engineer Associate
Question 12

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following commands could the data engineering team use to access sales in PySpark?

A. SELECT * FROM sales
B. There is no way to share data between PySpark and SQL.
C. spark.sql("sales")
D. spark.delta.table("sales")
E. spark.table("sales")

    Correct Answer: E

    To access a Delta table in PySpark, the data engineering team can use spark.table("sales"). This function returns a table registered in the SparkSession catalog as a DataFrame, so the team can run its tests in Python. spark.table is the standard way to access tables in PySpark, and since tables on Databricks are Delta by default, no Delta-specific API is required.
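    A minimal sketch of how this looks in practice, assuming an active SparkSession named spark and a registered Delta table named sales (the order_id column is hypothetical):

        from pyspark.sql import SparkSession

        # On Databricks a SparkSession named `spark` already exists;
        # getOrCreate() makes the snippet runnable standalone as well.
        spark = SparkSession.builder.getOrCreate()

        # Read the registered Delta table as a DataFrame.
        sales_df = spark.table("sales")

        # Example cleanliness test: the key column contains no nulls.
        assert sales_df.filter(sales_df.order_id.isNull()).count() == 0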

Discussion
Atnafu (Option: E)

E. The spark.table() function in PySpark returns tables registered in the catalog, including Delta tables, as DataFrames. By specifying the table name ("sales"), the data engineering team can read the Delta table and work with it in PySpark.

Option A, SELECT * FROM sales, is SQL syntax and cannot be used directly as a Python statement. Option B, "There is no way to share data between PySpark and SQL," is incorrect: PySpark can reach the same catalog tables through both SQL and the DataFrame API. Option C, spark.sql("sales"), calls a valid function (spark.sql() executes SQL queries against registered tables), but "sales" on its own is not a valid SQL query. Option D, spark.delta.table("sales"), looks like a Delta-specific accessor, but no such method exists on a SparkSession.
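For illustration, the two working routes side by side (a minimal sketch, assuming an active SparkSession named spark and a registered table sales):

    df_api = spark.table("sales")              # DataFrame API route (option E)
    df_sql = spark.sql("SELECT * FROM sales")  # SQL route: a full query, not just a table name
    assert df_api.schema == df_sql.schema      # both return equivalent DataFrames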

Garyn (Option: E)

E. spark.table("sales"). The spark.table() function in PySpark gives access to a registered table within the SparkSession. Here, "sales" is the name of the Delta table created by the data analyst, and spark.table() lets the team run its data engineering tests in Python (PySpark).

benni_ale (Option: E)

E is correct.

ThomasReps (Option: E)

It's E. As stated by others, the default format is Delta. If you try to run D, you get an error because there is no delta attribute on the SparkSession: "AttributeError: 'SparkSession' object has no attribute 'delta'". If you want to state explicitly that the data is Delta, you read by path with spark.read.format("delta").load("<path>") instead.
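A minimal sketch of the behavior described above, assuming a SparkSession named spark and a registered Delta table sales:

    # Tables on Databricks are Delta by default, so this is sufficient:
    df = spark.table("sales")

    # Option D fails: SparkSession has no `delta` attribute.
    try:
        spark.delta.table("sales")
    except AttributeError as exc:
        print(exc)  # 'SparkSession' object has no attribute 'delta'

    # Explicit Delta reads go by path (the path here is hypothetical):
    # df_path = spark.read.format("delta").load("/mnt/tables/sales")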

Tickxit (Option: E)

E: spark.table() or spark.read.table() both work.
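Both calls read the same catalog table; a minimal sketch, assuming an active SparkSession named spark:

    df_a = spark.table("sales")       # SparkSession.table
    df_b = spark.read.table("sales")  # DataFrameReader.table, equivalent result
    assert df_a.columns == df_b.columns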

souldiv (Option: E)

spark.table(). E is the correct one.

Itmma (Option: E)

E is correct

SerGrey (Option: E)

Correct answer is E

csd (Option: C)

C is the correct answer.

awofalus (Option: E)

The correct answer is E.

KalavathiP (Option: E)

E is correct

d_b47

Delta is the default table format.

Dwarakkrishna (Option: E)

You access data in Delta tables by the table name or the table path, as shown in the following example:

    people_df = spark.read.table(table_name)
    display(people_df)

prasioso (Option: E)

I believe the answer is E: in Databricks the default table format is Delta, hence spark.table should be enough. I have not seen a spark.delta.table function before.

softthinkers

The correct answer is D, spark.delta.table("sales"). The question asks for a Delta table, not a normal table; if it were a normal table, then it would be spark.table("sales").

qium

The default table type is "delta".

Majjjj

The correct answer is D. The data engineering team can access the Delta table sales in PySpark by using the spark.delta.table command. This command is used to create a DataFrame based on a Delta table. Therefore, the correct command is spark.delta.table("sales").