Exam: Certified Machine Learning Professional
Question 41

A machine learning engineer has developed a random forest model using scikit-learn, logged the model using MLflow as random_forest_model, and stored its run ID in the run_id Python variable. They now want to deploy that model by performing batch inference on a Spark DataFrame spark_df.

Which of the following code blocks can they use to create a function called predict that they can use to complete the task?

    Correct Answer: E

    To perform batch inference on a Spark DataFrame with a scikit-learn model logged to MLflow, the model must be loaded as a Spark UDF (user-defined function) and applied to the DataFrame. This is done with mlflow.pyfunc.spark_udf, which takes the SparkSession as its first argument and the model URI as its second. The resulting UDF can then be applied to the DataFrame's feature columns to produce predictions, ensuring the model runs correctly within the Spark environment.
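
    A minimal sketch of what the correct option likely looks like, assuming spark is the active SparkSession, random_forest_model is the logged artifact name from the question, and that every column of spark_df is a model feature (an illustrative assumption; in practice only the feature columns would be passed):

        import mlflow

        # Build the model URI from the stored run ID and the logged artifact name.
        model_uri = f"runs:/{run_id}/random_forest_model"

        # Load the scikit-learn model as a Spark UDF; the SparkSession must be
        # the first argument, the model URI the second.
        predict = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri)

        # Apply the UDF to the feature columns for batch inference.
        predictions_df = spark_df.withColumn("prediction", predict(*spark_df.columns))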

Discussion
spaceexplorer (Option: E)

E is correct

BokNinja (Option: E)

E.

    import mlflow

    logged_model = 'runs:/e905f5759d434a131bbe1e54a2b/best-model'

    # Load model as a Spark UDF.
    loaded_model = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model)

    # Predict on a Spark DataFrame.
    df.withColumn('predictions', loaded_model(*columns)).collect()

victorcolome

Must be A, not E, as the question states that the variable is called "spark_df".

victorcolome

My bad, it is E, because the spark_udf function expects the SparkSession as the first parameter, not the DataFrame!

Mircuz (Option: E)

You need the Spark environment.

64934ca (Option: E)

The spark session is passed as the first argument to mlflow.pyfunc.spark_udf to provide the necessary context for creating and executing the UDF within the Spark environment. The model_uri is passed as the second argument to specify which MLflow model to load and use for predictions. This order is required by the function's design to ensure proper integration with Spark.
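
A minimal illustration of the argument order this comment describes (the model URI built from the question's run_id variable is an assumption):

    # Correct: the SparkSession comes first, then the model URI.
    predict = mlflow.pyfunc.spark_udf(spark, model_uri=f"runs:/{run_id}/random_forest_model")

    # Incorrect: passing the DataFrame first (as option A apparently does) fails,
    # because spark_udf expects a SparkSession as its first argument.
    # predict = mlflow.pyfunc.spark_udf(spark_df, model_uri=...)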

JaydeepT (Option: A)

spark_df is the DataFrame to be evaluated at runtime