Exam: Certified Associate Developer for Apache Spark
Question 59

The code block shown below contains an error. The code block is intended to read a Parquet file at the file path filePath into a DataFrame. Identify the error.

Code block:

spark.read.load(filePath, source = "parquet")

    Correct Answer: E

    The load() method of PySpark's DataFrameReader has no 'source' parameter; the data source is selected with the 'format' parameter. When format is omitted, load() falls back to the session's default data source (the spark.sql.sources.default setting), which is 'parquet' unless it has been changed. Therefore, the 'source' argument should be removed, and the default format will be used.

Discussion
4be8126 (Option: B)

The correct code block to read a parquet file would be spark.read.parquet(filePath).

Larrave (Option: E)

The answer should be E: remove source, since the default format is 'parquet' anyway. However, using load() is not ideal; the format-specific method (spark.read.parquet) is preferable. https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameReader.load.html?highlight=dataframereader%20load#pyspark.sql.DataFrameReader.load

Ram459 (Option: E)

The intention is to read a Parquet file at the file path filePath into a DataFrame.

cookiemonster42 (Option: A)

The parameters of the load() function are path, format, schema, and **options.
A. Overall it makes sense, but do we really need to use schema?
B. There is a load operation, so that's FALSE.
C. read is used without parentheses, so FALSE.
D. It should indeed, but there's no source parameter, so FALSE.
E. That's true, but we need to put quotes around filePath, so it's FALSE.
That makes it A, but the question is really strange and unclear.

cookiemonster42

UPD: Parquet files already contain their schema, so specifying it is not needed. In that case, I don't know what the answer is.

ZSun

1. pyspark.sql.SparkSession.read returns a DataFrameReader: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.read.html#pyspark.sql.SparkSession.read
2. This DataFrameReader has both "load" and "parquet" methods.
2.1. load has the signature load(path, format, schema, **options): https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.load.html#pyspark.sql.DataFrameReader.load
Therefore, the answer is A or E. Parquet files typically contain schema information. I do not like this question, because to read a Parquet file you would directly use spark.read.parquet().

juliom6 (Option: E)

E is correct. The "format" parameter should be used instead of "source" (its default is "parquet"): https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.load.html. The documentation describes it as "format: str, optional. Optional string for format of the data source. Default to ‘parquet’."

newusername (Option: E)

I would go for E

Singh_Sumit (Option: B)

spark.read.load(PARQUET_PATH, format='parquet') works: load is valid if it is provided with a format.