Certified Associate Developer for Apache Spark Exam Questions

Certified Associate Developer for Apache Spark Exam - Question 44


The code block shown below should create a single-column DataFrame from the Python list years, which is made up of integers. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:

_1_._2_(_3_, _4_)

Correct Answer: E

To create a single-column DataFrame from a Python list of integers in PySpark, call spark.createDataFrame. Because years is already a list, pass it directly (not wrapped in another list) together with IntegerType() as the schema, so that each element of the list becomes one integer row. The correct call is spark.createDataFrame(years, IntegerType()).

Discussion

11 comments
peekaboo15 (Option: E)
Apr 13, 2023

The answer should be E because years is already a Python list.

Indiee (Option: E)
Apr 25, 2023

Two responses:
1. D is an error. E will split the array into rows.
2. spark.createDataFrame([arrayVar_name], ArrayType(IntegerType())) will store the whole array as a single row.

Indiee
Apr 25, 2023

Agreed

zozoshanky (Option: E)
Jul 30, 2023

D throws a big error:

    /usr/local/spark/python/pyspark/sql/types.py in verify_acceptable_types(obj)
       1291         # subclass of them can not be fromInternal in JVM
       1292         if type(obj) not in _acceptable_types[_type]:
    -> 1293             raise TypeError(new_msg("%s can not accept object %r in type %s"
       1294                                     % (dataType, obj, type(obj))))
       1295
    TypeError: field value: IntegerType can not accept object [1, 2, 3, 4, 5] in type <class 'list'>

E is the correct answer:

    from pyspark.sql.types import IntegerType
    a = [1, 2, 3, 4, 5]
    spark.createDataFrame(a, IntegerType()).show()

singh100 (Option: E)
Aug 1, 2023

E. D gives an error.

cookiemonster42 (Option: E)
Aug 3, 2023

If years is a variable, it works. I just tested it:

    from pyspark.sql.types import IntegerType
    years = [1, 3, 4, 5, 9]
    df7 = spark.createDataFrame(years, IntegerType())
    df7.show()

This works as well:

    df7 = spark.createDataFrame([1, 3, 4, 5, 9], IntegerType())
    df7.show()

This won't work:

    df7 = spark.createDataFrame([years], IntegerType())
    df7.show()

So the answer is E.

thanab (Option: E)
Sep 16, 2023

1. spark
2. createDataFrame
3. years
4. IntegertType()

juadaves (Option: D)
Oct 19, 2023

D

    from pyspark.sql.types import IntegerType
    spark.createDataFrame([1991, 2023], IntegerType()).show()

    +-----+
    |value|
    +-----+
    | 1991|
    | 2023|
    +-----+

carlosmps
Jun 22, 2024

It's E. years is already a list.

juliom6 (Option: E)
Nov 2, 2023

E is correct:

    from pyspark.sql.types import IntegerType
    years = [2023, 2024]
    print(type(years))
    storesDF = spark.createDataFrame(years, IntegerType())
    storesDF.show()

    <class 'list'>
    +-----+
    |value|
    +-----+
    | 2023|
    | 2024|
    +-----+

mahmoud_salah30 (Option: E)
Dec 31, 2023

E is the right answer.

znets (Option: E)
Feb 20, 2024

E is the most suitable, but it also contains an error. In PySpark, the correct class name for the integer data type is IntegerType (not "IntegertType").