Exam: Certified Associate Developer for Apache Spark
Question 60

In what order should the lines of code below be run to read a JSON file at the file path filePath into a DataFrame with the specified schema schema?

Lines of code:

1. .json(filePath, schema = schema)

2. .storesDF

3. .spark \

4. .read() \

5. .read \

6. .json(filePath, format = schema)

    Correct Answer: E

    To read a JSON file into a DataFrame with a specified schema, the code must access the Spark session (.spark), obtain a DataFrameReader (.read()), and then call the json method with the schema (.json(filePath, schema = schema)) — i.e., the order 3, 4, 1. Note, however, that in PySpark read is a property, not a method, so spark.read (line 5) rather than spark.read() (line 4) is what actually runs; see the discussion below.

Discussion
ZSun — Option: C

storesDF = spark.read.json(filePath, schema = schema) C

juliom6 — Option: C

C is correct: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.json.html — the json function does not have a "format" parameter.

azure_bimonster — Option: C

we use the following structure: spark.read.json(filePath, schema=schemaName)

Jtic — Option: B

2. .storesDF — unrelated to reading the JSON file and can be disregarded.
4. .read() — invokes the read() method to create a DataFrameReader object.
1. .json(filePath, schema=schema) — uses the DataFrameReader object to read the JSON file at the specified filePath into a DataFrame with the provided schema.

pnev

This is so wrong. To read data you need to use spark.read.json / .parquet / .table.