Certified Associate Developer for Apache Spark Exam Questions

Certified Associate Developer for Apache Spark Exam - Question 55


The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of a cross join between DataFrame storesDF and DataFrame employeesDF. Identify the error.

Code block:

storesDF.join(employeesDF, "cross")

Correct Answer: C

The error in the code block is that a cross join is not implemented by the DataFrame.join() operation; the DataFrame.crossJoin() operation should be used instead. crossJoin() is the method the DataFrame class provides specifically for this purpose.
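A minimal sketch of the corrected code block, using toy stand-ins for storesDF and employeesDF (the column names and sample data are illustrative assumptions, not part of the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy stand-ins for the question's DataFrames (schemas assumed for illustration)
storesDF = spark.createDataFrame([(1, "North"), (2, "South")], ["storeId", "division"])
employeesDF = spark.createDataFrame([(10, "Alice"), (11, "Bob")], ["employeeId", "employeeName"])

# Cross join: every store row paired with every employee row (2 x 2 = 4 rows)
storesDF.crossJoin(employeesDF).show()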

Discussion

7 comments
ronfun (Option: D)
Apr 9, 2023

Key is missing. Answer is D.

4be8126
May 3, 2023

No, the issue is not that the key column is missing. In a cross join, there is no key column to join on. The correct answer is C: a cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.

ZSun
Jun 6, 2023

Completely wrong. From the docs:
join(other, on=None, how=None) – joins with another DataFrame, using the given join expression.
Parameters:
other – right side of the join.
on – a string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.
how – str, default inner. Must be one of: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti and left_anti.

ZSun
Jun 6, 2023

You can specify a cross join with dataframe.join(how='cross'). The reason this code block doesn't work is that the second positional parameter is on: you need to specify the key column and then use how='cross'; otherwise the function will treat 'cross' as on instead of how.
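A quick sketch of this behavior with throwaway DataFrames (the names and columns below are illustrative assumptions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

storesDF = spark.createDataFrame([(1,), (2,)], ["storeId"])
employeesDF = spark.createDataFrame([(10,), (11,)], ["employeeId"])

# As written in the question, "cross" is taken as the `on` column name, and
# Spark fails because neither DataFrame has a column called "cross":
# storesDF.join(employeesDF, "cross")  # raises an AnalysisException

# Passing it to `how` by keyword performs the cross join:
storesDF.join(employeesDF, on=None, how="cross").show()  # 2 x 2 = 4 rows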

newusername
Nov 7, 2023

ZSun is, as always, right. 4be8126 - it is not a problem to use GPT, but check its answers; otherwise, do not post them anywhere.

newusername (Option: D)
Nov 7, 2023

I know it looks confusing to have a key column for a cross join, but that is the join method syntax: https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.join.html See the example below:

from pyspark.sql import Row

# Sample data for DataFrame 'a'
dataA = [Row(column1=1, column2=2), Row(column1=2, column2=4), Row(column1=3, column2=6)]
dfA = spark.createDataFrame(dataA)

# Sample data for DataFrame 'b'
dataB = [Row(column1=1, column2=2), Row(column1=2, column2=5), Row(column1=3, column2=4)]
dfB = spark.createDataFrame(dataB)

joinedDF = dfA.join(dfB, on=None, how="cross")
joinedDF.show()

It is possible to do a cross join with DataFrame.crossJoin() as well, but answer C states that df.join() doesn't do cross, which is wrong.

peekaboo15 (Option: C)
Apr 13, 2023

A cross join doesn't need a key. The answer is C.

4be8126
May 3, 2023

No, the issue is not that the key column is missing. In a cross join, there is no key column to join on. The correct answer is C: a cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.

4be8126 (Option: C)
May 3, 2023

C. A cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.

juliom6 (Option: C)
Nov 14, 2023

C is correct.

# https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.crossJoin.html
a = spark.createDataFrame([(1, 2), (3, 4)], ['column1', 'column2'])
b = spark.createDataFrame([(5, 6), (7, 8)], ['column3', 'column4'])
df = a.crossJoin(b)
display(df)

azure_bimonster (Option: D)
Feb 8, 2024

D is the answer here, as the key is missing. As per the syntax, a key is needed.

Ahlo (Option: C)
Feb 26, 2024

Correct answer: C.

from pyspark.sql import Row

df = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
df.crossJoin(df2.select("height")).select("age", "name", "height").show()

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.crossJoin.html