Professional Machine Learning Engineer Exam - Question 35

Question

You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is to issue a query against BigQuery. You plan to use the results of that query as the input to the next step in your pipeline. You want to achieve this in the easiest way possible. What should you do?

Examice · Accepted Answer

To automate a task in a Kubeflow pipeline, the easiest way is to use existing components whenever possible. By using the BigQuery Query Component from the Kubeflow Pipelines repository, you can seamlessly integrate BigQuery queries into your pipeline without the need to write additional custom code. This approach saves development time and ensures that you leverage well-tested and reusable components while enabling the results to be automatically passed to the next step in the pipeline.

maartenalexander · Answer

D. Kubeflow pipelines have different types of components, ranging from low- to high-level. They have a ComponentStore that allows you to access prebuilt functionality from GitHub.

NamitSehgal · Answer

Not sure what is the reason behind putting A as it is manual and manual steps can not be part of automation. I would say Answer is D as it just require a clone of the component from github. Using a Python and import bigquery component may sounds good too, but ask was what is easiest. It depends how word "easy" is taken by individuals but definitely not A.

chohan · Answer

Should be B

fragkris · Answer

Im going "against the flow" and chosing B. It just sounds a lot easier option than D.

PhilipKoku · Answer

B) Python API

kaike_reis · Answer

D. The easiest way possible in developer's world: copy code from stackoverflow or github hahaha. Jokes a part, I think D is the correct. (A) is manual, so you have to do always. (B) could be, but is not the easiest one because you need to write a script for this. (C) uses Kubeflow intern solution, but you need to work to create a custom component. (D) is the (C) solution, but easier using a component created previously to do the job.

aepos · Answer

The result of D is just the path to the Cloud Storage where the result is stored not the data itself. So the input to the next step is this path, where you still have to load the data? So i would guess B. Can anyone explain if i am wrong?

xiaoF · Answer

D is good.

David_ml · Answer

Answer is D.

friedi · Answer

Very confused as to why D is the correct answer. To me it seems a) much simpler to just write a couple of lines of python (https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python) and b) the documentation for the BigQuery reusable component (https://v0-5.kubeflow.org/docs/pipelines/reusable-components/) states that the data is written to Google Cloud Storage, which means we have to write the fetching logic in the next pipeline step, going against the "as simple as possible" requirement. Would be interested to hear why I am wrong.

Amabo · Answer

from kfp.components import load_component_from_url

bigquery_query_op = load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/master/components/gcp/bigquery/query/component.yaml')

def my_pipeline():
    query_result = bigquery_query_op(
        project_id='my-project',
        query='SELECT * FROM my_dataset.my_table'
    )
    # Use the query_result as input to the next step in the pipeline

celia20200410 · Answer

ans: c
https://medium.com/google-cloud/using-bigquery-and-bigquery-ml-from-kubeflow-pipelines-991a2fa4bea8
https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#kubeflow-piplines-components
Kubeflow Pipelines, a containerized task can invoke other services such as BigQuery jobs, AI Platform (distributed) training jobs, and Dataflow jobs.

donchoripan · Answer

A. it says the easiest way possible so it sounds like just running the query on the console should be enogh. It doesn't says that the data will need to be uploaded again anytime soon, so we can asume that its just a one time query to be run.

Mohamed_Mossad · Answer

https://linuxtut.com/en/f4771efee37658c083cc/

M25 · Answer

Went with D

Professional Machine Learning Engineer Exam - Question 35

Discussion