Certified Associate Developer for Apache Spark Exam Questions

Certified Associate Developer for Apache Spark Exam - Question 58


In what order should the lines of code below be run to write DataFrame storesDF to file path filePath as parquet, partitioned by the values in column division?

Lines of code:

1. .write() \

2. .partitionBy("division") \

3. .parquet(filePath)

4. .storesDF \

5. .repartition("division")

6. .write \

7. .path(filePath, "parquet")

Correct Answer: C

To save a DataFrame as a parquet file partitioned by a specific column in PySpark, the calls must be chained in this order: 1) start with the DataFrame (storesDF), 2) access .write to obtain a DataFrameWriter, 3) call .partitionBy to specify the column by which to partition the data, and 4) call .parquet to write the data to the specified file path. Hence, the correct order is storesDF, .write, .partitionBy("division"), .parquet(filePath), i.e. lines 4, 6, 2, and 3. Note that .write is an attribute, not a method, so .write() (line 1) is incorrect, and DataFrameWriter has no .path(filePath, "parquet") method (line 7); .repartition("division") (line 5) only changes the in-memory partitioning of the DataFrame and does not write anything.
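A minimal runnable sketch of the correct chain is shown below. The storesDF contents and the filePath value are placeholders invented for illustration; only the .write / .partitionBy / .parquet chain reflects the answer.

```python
from pyspark.sql import SparkSession

# Assumed setup: a local SparkSession and a small example DataFrame
# with a "division" column. filePath is a hypothetical output location.
spark = SparkSession.builder.appName("partitioned-parquet-write").getOrCreate()

storesDF = spark.createDataFrame(
    [(1, "north"), (2, "south"), (3, "north")],
    ["store_id", "division"],
)

filePath = "/tmp/stores_parquet"

# .write is an attribute that returns a DataFrameWriter (no parentheses),
# .partitionBy("division") sets the partition column,
# and .parquet(filePath) triggers the actual write.
storesDF \
    .write \
    .partitionBy("division") \
    .parquet(filePath)
```

The resulting output directory contains one subdirectory per distinct value of division (e.g. division=north/, division=south/), each holding the parquet files for that partition.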

Discussion

2 comments
newusername - Option: C
Nov 7, 2023

Correct

juliom6 - Option: C
Nov 14, 2023

C is correct: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameWriter.parquet.html