Certified Associate Developer for Apache Spark Exam - Question 24


The code block shown below should return a new DataFrame from DataFrame storesDF where column modality is the constant string "PHYSICAL". Assume DataFrame storesDF is the only defined language variable. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:

storesDF._1_(_2_, _3_(_4_))

Correct Answer: C

To return a new DataFrame from storesDF in which the column modality holds the constant string "PHYSICAL", use the withColumn method, which adds a column (or replaces an existing column with the same name) and returns a new DataFrame. Its first argument is the column name, "modality". Its second argument must be a Column expression, so the constant is wrapped in the lit function, which creates a Column from a literal value. The literal must be the quoted string "PHYSICAL": because storesDF is the only defined variable, an unquoted PHYSICAL would be an undefined name. Therefore, the correct option completes the code block as storesDF.withColumn("modality", lit("PHYSICAL")).

Discussion

4be8126 (Option: C)
Apr 26, 2023

Option C is the correct answer. Here's why: the withColumn method returns a new DataFrame with a column added (or replaced, if a column with that name already exists), computed from existing columns or a constant value. The first blank (_1_) should be withColumn, since we want to add a column. The second blank (_2_) should be the name of the column to add, in this case "modality". The third blank (_3_) should be a function that produces the values for the new column; because we want the constant value "PHYSICAL" in every row, the lit function, which creates a Column from a literal value, fits here. Finally, the fourth blank (_4_) is the actual value for the new column. Since it is the string "PHYSICAL", it must be wrapped in quotation marks so that Python treats it as a string literal rather than a variable name. Therefore, option C correctly fills in the blanks, giving the following code block: storesDF.withColumn("modality", lit("PHYSICAL"))

4be8126 (Option: C)
Apr 26, 2023

lit and col are two functions in PySpark (both in pyspark.sql.functions) that create or reference columns in a DataFrame.

lit: This function creates a column with a literal value. It returns a Column expression for that literal; for example, lit(2) creates a Column whose value is 2 in every row. It is useful when you want to add a column with the same constant value for all rows.

col: This function references an existing column in a DataFrame. It returns a Column expression that represents that column; for example, col("age") returns a Column expression for the "age" column. It is useful when you want to select, filter, or transform an existing column.

In short, lit creates a column with a constant value, while col references an existing column in a DataFrame.

newusername (Option: C)
Sep 11, 2023

Correct