
Certified Associate Developer for Apache Spark Exam - Question 110


Which of the following pairs of arguments cannot be used in DataFrame.join() to perform an inner join on two DataFrames, named and aliased with "a" and "b" respectively, to specify two key columns column1 and column2?

Correct Answer: B

DataFrame.join() accepts the join keys in one of two forms: a sequence of column names passed as 'usingColumns', or a single Column expression passed as 'joinExprs'. The 'usingColumns' parameter is typed as a sequence of strings (Seq[String] in the Scala API), so it takes column names, not Column objects. Passing 'usingColumns = Seq(col("column1"), col("column2"))' therefore does not match any join() signature, whereas 'usingColumns = Seq("column1", "column2")' is valid. Option B is the pair that cannot be used, which makes it the answer.
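The valid forms described above can be sketched in Scala as follows. This is a minimal illustration under stated assumptions, not a reproduction of the exam's answer options: the sample rows, the names dfA, dfB, a_value, and b_value, and the local SparkSession are all hypothetical, and Spark is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object JoinOnTwoKeys {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-on-two-keys")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; only the (1, "x") key pair appears on both sides.
    val dfA = Seq((1, "x", "left-1"), (2, "y", "left-2"))
      .toDF("column1", "column2", "a_value")
    val dfB = Seq((1, "x", "right-1"), (3, "z", "right-3"))
      .toDF("column1", "column2", "b_value")

    val a = dfA.as("a")
    val b = dfB.as("b")

    // Form 1: usingColumns as a Seq[String]. The join type defaults to inner,
    // and each key column appears once in the result.
    val byNames = a.join(b, Seq("column1", "column2"))

    // Form 2: a Column expression built from alias-qualified names.
    val byAlias = a.join(
      b,
      col("a.column1") === col("b.column1") && col("a.column2") === col("b.column2"),
      "inner"
    )

    // Form 3: a Column expression built with Dataset.apply, i.e. df("name").
    val byApply = dfA.join(
      dfB,
      dfA("column1") === dfB("column1") && dfA("column2") === dfB("column2")
    )

    // Not valid: no join() overload accepts a Seq[Column] as usingColumns,
    // so the following line would fail to compile.
    // a.join(b, Seq(col("column1"), col("column2")))

    byNames.show()
    byAlias.show()
    byApply.show()

    spark.stop()
  }
}
```

Forms 2 and 3 keep both copies of the key columns in the result, while form 1 keeps one copy of each, which is often the reason to prefer the name-based form for plain equi-joins.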

Discussion

2 comments
Sowwy1 (Option: B)
Apr 2, 2024

I think it's B.

Alucard069 (Option: D)
May 1, 2024

Will option D work? I have never seen a DataFrame column accessed this way: df("column"). I would expect either df.column or col("a.column") (taking a as an alias of df).