Exam: Certified Associate Developer for Apache Spark
Question 55

The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of a cross join between DataFrame storesDF and DataFrame employeesDF. Identify the error.

Code block:

storesDF.join(employeesDF, "cross")

    Correct Answer: C

    The error in the code block is that a cross join is not implemented by the DataFrame.join() operation; the DataFrame.crossJoin() operation, which is designed specifically for this purpose, should be used instead.

Discussion
ronfun (Option: D)

Key is missing. Answer is D.

4be8126

No, the issue is not that the key column is missing. In a cross join, there is no key column to join on. The correct answer is C: a cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.

ZSun

completely wrong. Per the documentation, the signature is join(other, on=None, how=None): joins with another DataFrame, using the given join expression. Parameters:

    other – right side of the join.
    on – a string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.
    how – str, default inner. Must be one of: inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti and left_anti.

ZSun

You can specify cross in DataFrame.join(how='cross'). The reason this code block doesn't work is that the second positional parameter is on, not how. You need to pass how='cross' as a keyword argument (or specify the key column first and then how='cross'); otherwise the function treats 'cross' as the on argument instead of how.

newusername

ZSun is, as always, right. 4be8126 – it is not a problem to use GPT, but check its answers; otherwise do not post them anywhere.

newusername (Option: D)

I know it looks confusing to have a key column for a cross join, but that is the join method's syntax: https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.join.html See the example below:

    dataA = [Row(column1=1, column2=2), Row(column1=2, column2=4), Row(column1=3, column2=6)]
    dfA = spark.createDataFrame(dataA)
    # Sample data for DataFrame 'b'
    dataB = [Row(column1=1, column2=2), Row(column1=2, column2=5), Row(column1=3, column2=4)]
    dfB = spark.createDataFrame(dataB)
    joinedDF = dfA.join(dfB, on=None, how="cross")
    joinedDF.show()

It is also possible to do a cross join with DataFrame.crossJoin(), but answer C states that df.join() doesn't do cross joins, which is wrong.

azure_bimonster (Option: D)

D is the answer here, as the key is missing. As per the syntax, a key is needed.

juliom6 (Option: C)

C is correct.

    # https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.crossJoin.html
    a = spark.createDataFrame([(1, 2), (3, 4)], ['column1', 'column2'])
    b = spark.createDataFrame([(5, 6), (7, 8)], ['column3', 'column4'])
    df = a.crossJoin(b)
    display(df)

4be8126 (Option: C)

C. A cross join is not implemented by the DataFrame.join() operation – the DataFrame.crossJoin() operation should be used instead.

peekaboo15 (Option: C)

A cross join doesn't need a key. The answer is C.


Ahlo (Option: C)

Correct answer: C.

    from pyspark.sql import Row
    df = spark.createDataFrame(
        [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
    df2 = spark.createDataFrame(
        [Row(height=80, name="Tom"), Row(height=85, name="Bob")])
    df.crossJoin(df2.select("height")).select("age", "name", "height").show()

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.crossJoin.html