You want to train an AutoML model to predict house prices by using a small public dataset stored in BigQuery. You need to prepare the data and want to use the simplest, most efficient approach. What should you do?
To predict house prices using an AutoML model and a small public dataset stored in BigQuery, the simplest and most efficient approach is to preprocess the data within BigQuery itself by writing a query and creating a new table. Then, create a Vertex AI managed dataset with this new table as the data source. This method leverages BigQuery’s data processing capabilities, keeping the data within the same environment, reducing data movement, and simplifying the workflow.
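As a rough illustration of that workflow, here is a minimal Python sketch. The project ID, table names, column names, and region are hypothetical placeholders, and it assumes the google-cloud-bigquery and google-cloud-aiplatform libraries are installed and authenticated.

```python
from google.cloud import bigquery
from google.cloud import aiplatform

PROJECT_ID = "my-project"                              # hypothetical project ID
DEST_TABLE = "my-project.housing.prepared_houses"      # hypothetical destination table

# Step 1: preprocess the public dataset with a SQL query and
# write the result to a new BigQuery table (no data leaves BigQuery).
bq_client = bigquery.Client(project=PROJECT_ID)
job_config = bigquery.QueryJobConfig(
    destination=DEST_TABLE,
    write_disposition="WRITE_TRUNCATE",
)
query = """
    SELECT
      price,
      bedrooms,
      bathrooms,
      sqft_living,
      zipcode
    FROM `bigquery-public-data.some_dataset.houses`  -- hypothetical source table
    WHERE price IS NOT NULL
"""
bq_client.query(query, job_config=job_config).result()  # wait for the query job to finish

# Step 2: create a Vertex AI managed (tabular) dataset that points at the new table.
aiplatform.init(project=PROJECT_ID, location="us-central1")
dataset = aiplatform.TabularDataset.create(
    display_name="house-prices",
    bq_source=f"bq://{DEST_TABLE}",
)

# Step 3 (optional): train an AutoML regression model on the managed dataset.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="house-price-automl",
    optimization_prediction_type="regression",
)
model = job.run(dataset=dataset, target_column="price")
```

Because the managed dataset references the BigQuery table directly, there is no export step or intermediate storage format; the preprocessing logic stays in SQL and the training data stays in BigQuery.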
A seems the easiest to me: preprocess the data in BigQuery (where the input table is stored) and point a Vertex AI managed dataset directly at the resulting table.
I go for A:
A: By writing a query that preprocesses the data in BigQuery and creating a new table, you can create a Vertex AI managed dataset directly with the new table as the data source. This approach is efficient because it leverages BigQuery's powerful data processing capabilities and avoids the need to export data to another format or service. It also simplifies the process by keeping everything within the Google Cloud ecosystem, making it easier to manage and monitor your data and the model training process.
A) Keep the data in BigQuery and create a new table to avoid the latency of moving data out of BigQuery.
Dataflow seems like the easiest and most scalable way to deal with this issue. Option B.
small dataset -> no dataflow
Forgot to vote
You can reference a BigQuery table directly as a Vertex AI managed dataset and use it to train an AutoML model.
I go for A.
A seems to be the correct one.