Exam: Certified Associate Developer for Apache Spark
Question 60

In what order should the lines of code below be run to read a JSON file at the file path filePath into a DataFrame with the specified schema schema?

Lines of code:

1. .json(filePath, schema = schema)

2. .storesDF

3. .spark \

4. .read() \

5. .read \

6. .json(filePath, format = schema)

    Correct Answer: E

    To read a JSON file into a DataFrame with a specified schema, the code must access the Spark session (.spark), obtain a DataFrameReader (.read()), and then call the json method with the schema (.json(filePath, schema = schema)) — i.e., the order 3, 4, 1. Note, however, that in PySpark read is a property, not a method, so spark.read (line 5) rather than spark.read() (line 4) is what actually runs; see the discussion below.

Discussion
ZSun — Option: C

storesDF = spark.read.json(filePath, schema = schema) C

juliom6 — Option: C

C is correct: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.json.html — the json function does not have a "format" parameter.

azure_bimonster — Option: C

we use the following structure: spark.read.json(filePath, schema=schemaName)

Jtic — Option: B

2. .storesDF — unrelated to reading the JSON file and can be disregarded.
4. .read() — invokes the read() method to create a DataFrameReader object.
1. .json(filePath, schema=schema) — uses the DataFrameReader object to read the JSON file at the specified filePath into a DataFrame with the provided schema.

pnev

This is so wrong. To read data you need to use spark.read.json / .parquet / .table.