Certified Associate Developer for Apache Spark Exam - Question 44

Question

The code block shown below should create a single-column DataFrame from the Python list years, which is made up of integers. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:

_1_._2_(_3_, _4_)

Examice · Accepted Answer

The correct way to create a single-column DataFrame from a Python list of integers in PySpark is to call spark.createDataFrame. Because years is already a list, it should be passed directly, together with the data type IntegerType(), so that each element of the list is interpreted as an integer. The correct method call is spark.createDataFrame(years, IntegerType()).

peekaboo15 · Answer

The answer should be E because years is already a Python list.

Indiee · Answer

Two responses:
1. D raises an error. E will split the array into rows.
2. spark.createDataFrame([arrayVar_name], ArrayType(IntegerType())) will store the whole array as a single row.

Indiee · Answer

Agreed

zozoshanky · Answer

D throws a big error.
/usr/local/spark/python/pyspark/sql/types.py in verify_acceptable_types(obj)
   1291         # subclass of them can not be fromInternal in JVM
   1292         if type(obj) not in _acceptable_types[_type]:
-> 1293             raise TypeError(new_msg("%s can not accept object %r in type %s"
   1294                                     % (dataType, obj, type(obj))))
   1295

TypeError: field value: IntegerType can not accept object [1, 2, 3, 4, 5] in type <class 'list'>

E is the correct answer.

from pyspark.sql.types import IntegerType
a = [1,2,3,4,5]
spark.createDataFrame(a, IntegerType()).show()
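
The traceback above is consistent with each top-level element of the input list being treated as one row and checked against IntegerType. The sketch below is a simplified pure-Python stand-in for that check, not PySpark's actual implementation; the function name rows_for_int_column is hypothetical and only illustrates why years works while [years] fails.

```python
# Simplified model of the per-row check (an assumption for illustration,
# NOT PySpark's real code): every top-level element becomes one row.
def rows_for_int_column(data):
    """Return one single-value row per element, rejecting non-int elements."""
    rows = []
    for obj in data:
        # Exact type match, so a nested list is rejected.
        if type(obj) is not int:
            raise TypeError(
                f"IntegerType can not accept object {obj!r} in type {type(obj)}"
            )
        rows.append((obj,))
    return rows


years = [1, 2, 3, 4, 5]

# Passing years directly: five rows, one integer each (the shape of option E).
print(rows_for_int_column(years))

# Wrapping years in another list: a single "row" whose value is a list,
# which the integer check rejects (the shape of option D).
try:
    rows_for_int_column([years])
except TypeError as err:
    print(err)
```

In this model the only difference between the two calls is the extra pair of brackets, which matches the behaviour reported in this thread.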

singh100 · Answer

E. D gives an error.

cookiemonster42 · Answer

If years is a variable, it works; I just tested it:
years = [1, 3, 4, 5, 9]
df7 = spark.createDataFrame(years, IntegerType())
df7.show()

This works as well:
df7 = spark.createDataFrame([1, 3, 4, 5, 9], IntegerType())
df7.show()

This won't work:
df7 = spark.createDataFrame([years], IntegerType())
df7.show()

So the answer is E.

thanab · Answer

1. spark
2. createDataFrame
3. years
4. IntegertType()

juadaves · Answer

D

from pyspark.sql.types import IntegerType
spark.createDataFrame([1991,2023],IntegerType()).show()

+-----+
|value|
+-----+
| 1991|
| 2023|
+-----+

juliom6 · Answer

E is correct:

from pyspark.sql.types import IntegerType
years = [2023, 2024]
print(type(years))
storesDF = spark.createDataFrame(years, IntegerType())
storesDF.show()

<class 'list'>
+-----+
|value|
+-----+
| 2023|
| 2024|
+-----+

mahmoud_salah30 · Answer

E is the right answer.

znets · Answer

E is the most suitable, but it also contains an error.

In PySpark, the correct class name for the integer data type is IntegerType (not "IntegertType").
