Certified Associate Developer for Apache Spark Exam - Question 59

Question

The code block shown below contains an error. The code block is intended to read a parquet file at the file path filePath into a DataFrame. Identify the error.

Code block:

spark.read.load(filePath, source = "parquet")

Examice · Accepted Answer

The load() method of PySpark's DataFrameReader class has no 'source' parameter. The parameter that selects the data source is 'format', and its default value is 'parquet'. Therefore, the 'source' argument should be removed so that the default format is used.

4be8126 · Answer

The correct code block to read a parquet file would be:

spark.read.parquet(filePath)

Larrave · Answer

The answer should be E: remove source, since the default format is 'parquet' anyway. However, using load is not ideal; the format-specific method (parquet) is preferable.

https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameReader.load.html?highlight=dataframereader%20load#pyspark.sql.DataFrameReader.load

ZSun · Answer

1. pyspark.sql.SparkSession.read returns a DataFrameReader.
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.read.html#pyspark.sql.SparkSession.read
2. Checking this DataFrameReader, it has both "load" and "parquet" methods.
2.1. For load, the signature is load(path, format, schema).
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.load.html#pyspark.sql.DataFrameReader.load
Therefore, the answer is A or E.
Parquet files typically contain schema information.
I do not like this question, because when reading a parquet file you would use spark.read.parquet() directly.

cookiemonster42 · Answer

The parameters of the load() function are: path, format, schema, **options
A. Overall it makes sense, but do we really need to use schema?
B. There is a load operation, so that's FALSE
C. read is used without parentheses, FALSE
D. It should indeed, but there's no source parameter, FALSE
E. That's true, but we need to put quotes around the filePath, so it's FALSE

That makes it A, but the question is really strange and unclear.
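To see what a parameter list of the form path, format, schema, **options does with an unexpected keyword, here is a small sketch in plain Python. `load_stub` is a hypothetical stand-in that only mirrors that parameter list; it is not Spark's implementation and reads no data. Under that assumption, a `source` keyword is not bound to `format` but is collected into the options dictionary.

```python
import inspect


def load_stub(path=None, format=None, schema=None, **options):
    """Hypothetical stand-in that mirrors the parameter list quoted above."""
    return {"path": path, "format": format, "schema": schema, "options": options}


# `source` is not a named parameter, so it is collected by **options.
wrong = load_stub("data/events.parquet", source="parquet")

# `format` is a named parameter, so it is bound directly.
right = load_stub("data/events.parquet", format="parquet")

print(wrong)
print(right)
print(list(inspect.signature(load_stub).parameters))
```

In the stub, the call with `source` leaves `format` unset, which matches the point made in this thread that the result then depends on the default format rather than on the keyword that was passed.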

Ram459 · Answer

The intention is to read a parquet file at the file path filePath into a DataFrame.

Singh_Sumit · Answer

spark.read.load(PARQUET_PATH, format='parquet')

load() is valid if the format is provided.

newusername · Answer

I would go for E

juliom6 · Answer

E is correct. The "format" parameter should be used instead of "source" (default "parquet"):

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.load.html

format: str, optional
    optional string for format of the data source. Default to ‘parquet’.
