Certified Associate Developer for Apache Spark Exam - Question 59


The code block shown below contains an error. The code block is intended to read a Parquet file at the file path filePath into a DataFrame. Identify the error.

Code block:

spark.read.load(filePath, source = "parquet")

Correct Answer: BE

The load() method of PySpark's DataFrameReader class has no 'source' parameter; the parameter that names the data source is 'format'. When format is omitted, load() falls back to the spark.sql.sources.default setting, which is 'parquet' by default. Therefore, removing the 'source' argument is enough, and the default format will be used.

Discussion

8 comments
4be8126 (Option: B)
May 3, 2023

The correct code block to read a parquet file would be spark.read.parquet(filePath).

Larrave (Option: E)
Jun 22, 2023

Answer should be E: remove source, since the default format is 'parquet' anyway. However, it is better to use the format-specific method (spark.read.parquet) than load. https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameReader.load.html?highlight=dataframereader%20load#pyspark.sql.DataFrameReader.load

ZSun
Jun 6, 2023

1. pyspark.sql.SparkSession.read returns a DataFrameReader: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.read.html#pyspark.sql.SparkSession.read
2. This DataFrameReader has both "load" and "parquet" methods.
2.1. load has the signature load(path, format, schema): https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.load.html#pyspark.sql.DataFrameReader.load
Therefore, the answer is A or E. Parquet files typically contain schema information. I do not like this question, because to read a Parquet file you would simply use spark.read.parquet().

cookiemonster42 (Option: A)
Jul 31, 2023

The parameters of the load() function are: path, format, schema, **options.
A. Overall it makes sense, but do we really need to use schema?
B. There is a load operation, so that's FALSE.
C. read is used without parentheses, so FALSE.
D. It should indeed, but there's no source parameter, so FALSE.
E. That's true, but we need to put quotes around filePath, so it's FALSE.
That makes it A, but the question is really strange and not clear.

cookiemonster42
Jul 31, 2023

UPD: Parquet already has the schema in it, so it's not needed. Now I don't know what the answer is.

Ram459 (Option: E)
Aug 15, 2023

The intention is to read a Parquet file at the file path filePath into a DataFrame.

Singh_Sumit (Option: B)
Sep 30, 2023

spark.read.load(PARQUET_PATH, format='parquet'): load is valid if it is provided with a format.

newusername (Option: E)
Nov 7, 2023

I would go for E

juliom6 (Option: E)
Nov 14, 2023

E is correct. The "format" parameter should be used instead of "source" (default "parquet"): https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.load.html From the docs: "format: str, optional. Optional string for format of the data source. Default to ‘parquet’."