Professional Machine Learning Engineer Exam - Question 63


You work for a large retailer and have been asked to segment your customers by their purchasing habits. The purchase history of all customers has been uploaded to BigQuery. You suspect that there may be several distinct customer segments; however, you are unsure how many, and you don’t yet understand the commonalities in their behavior. You want to find the most efficient solution. What should you do?

Correct Answer: A

Creating a k-means clustering model using BigQuery ML is the most efficient solution for segmenting customers by their purchasing habits. The purchase history is already in BigQuery, so no data has to be moved, and BigQuery ML can determine the number of clusters automatically (if the num_clusters option is omitted, it chooses a default based on the number of rows in the training data), which suits a case where the number of segments is unknown. K-means clustering is well suited to finding natural groupings in data, which is exactly what is needed to uncover commonalities in customer purchasing behavior.
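To make the idea concrete, here is a minimal pure-Python sketch of Lloyd's k-means algorithm on a handful of made-up customers, each described by two hypothetical features (orders per year, average basket value). It only illustrates what the clustering does; it is not BigQuery ML's internal implementation. The farthest-point initialisation is a simplification chosen so the result is reproducible, and on real data the features would normally be standardised first because they sit on very different scales.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Cluster points with Lloyd's algorithm; returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic farthest-point initialisation (a simplification, not k-means++):
    # start at the first point, then repeatedly add the point that is farthest
    # from its nearest existing centroid.
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    labels: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each customer joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, label in zip(points, new_labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # an empty cluster keeps its old centroid
        converged = new_labels == labels  # no customer changed segment
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels


# Hypothetical customers: (orders per year, average basket value)
customers = [(2, 20), (3, 22), (2, 18), (40, 25), (42, 27), (38, 24), (5, 300), (6, 310), (4, 290)]
centroids, labels = kmeans(customers, k=3)
print(labels)
```

The three groups that come out (occasional small baskets, frequent buyers, big-basket buyers) are the kind of segments the question asks for; the centroids are what describe each segment's common behavior.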

Discussion

13 comments
Vedjha (Option: A)
Dec 7, 2022

I'll go with 'A', as it is easy to build a model in BQML where the data is already present, and with the k-means algorithm the number of clusters would be optimized automatically.

wish0035 (Option: A)
Dec 16, 2022

Answer: A, pretty sure. C and D are out; they are very time-consuming. B: yes, you can identify similarities within each column, but when I read "you don’t yet understand the commonalities in their behavior", I take it that this job would be difficult, because there could be many columns to analyze, and I don't think that would be efficient. A: BigQuery ML supports k-means clustering, the model is easy and efficient to create, and it would automatically detect the number of clusters. Also, from the BigQuery ML docs: "K-means clustering for data segmentation; for example, identifying customer segments." (Source: https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in)

M25 (Option: A)
May 9, 2023

Went with A

LearnSodas (Option: A)
Dec 15, 2022

K-means is a good unsupervised learning algorithm for segmenting a population based on similarity. We can use k-means directly in BQ, so I think it's "the most efficient way". Labeling is not a good option since we don't really know what makes one customer similar to another, and why use Dataprep if we can use BQ directly?

hiromi (Option: A)
Dec 16, 2022

A https://cloud.google.com/bigquery-ml/docs/kmeans-tutorial https://towardsdatascience.com/how-to-use-k-means-clustering-in-bigquery-ml-to-understand-and-describe-your-data-better-c972c6f5733b

CloudKida (Option: A)
May 9, 2023

When to use k-means: "Your data may contain natural groupings or clusters of data. You may want to identify these groupings descriptively in order to make data-driven decisions. For example, a retailer may want to identify natural groupings of customers who have similar purchasing habits or locations. This process is known as customer segmentation." https://cloud.google.com/bigquery/docs/kmeans-tutorial
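Once a k-means model exists, the "descriptive" part of the quote comes from inspecting the centroids. The sketch below only builds query strings in Python, so it runs without BigQuery access; it assumes a trained model named `retail.customer_segments` and a feature table `retail.customer_features` with a `customer_id` column, all of which are hypothetical names. `ML.CENTROIDS` returns each cluster's value per feature (for numerical features, in `numerical_value`), which is what a segment has in common, and `ML.PREDICT` assigns each customer a `CENTROID_ID`. The strings would be submitted with whatever BigQuery client is already in use.

```python
def centroids_query(model: str) -> str:
    """Query listing each cluster's centroid, one row per (centroid, feature)."""
    return (
        "SELECT centroid_id, feature, numerical_value\n"
        f"FROM ML.CENTROIDS(MODEL `{model}`)\n"
        "ORDER BY centroid_id, feature"
    )


def assign_segments_query(model: str, feature_table: str) -> str:
    """Query mapping every customer to the segment (centroid) closest to it."""
    return (
        "SELECT customer_id, CENTROID_ID\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT * FROM `{feature_table}`))"
    )


# Hypothetical model and table names, for illustration only.
describe_sql = centroids_query("retail.customer_segments")
assign_sql = assign_segments_query("retail.customer_segments", "retail.customer_features")
print(describe_sql)
print(assign_sql)
```

Reading the centroid table side by side (for example, one segment with a high order count and low basket value) is usually how the "commonalities in their behavior" get named.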

MultiCloudIronMan (Option: A)
Apr 1, 2024

K-means algorithm is used for grouping/clustering data in unsupervised learning experiments.

ares81 (Option: A)
Jan 5, 2023

Correcting myself: it's A. According to the documentation, if you omit the num_clusters option, BigQuery ML will choose a reasonable default based on the total number of rows in the training data.
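The behaviour described in this comment can be seen by building the CREATE MODEL statement both ways. This Python helper only generates SQL text (it does not call BigQuery); the model and table names are hypothetical, and `customer_id` is excluded so that it is not treated as a feature. When `num_clusters` is None the option is left out entirely, which is the case where, per the documentation quoted above, BigQuery ML picks a default from the number of training rows. The minimum-of-2 check is this sketch's own guard, not a documented BigQuery ML rule.

```python
from typing import Optional


def create_kmeans_sql(model: str, feature_table: str, num_clusters: Optional[int] = None) -> str:
    """Build a BigQuery ML CREATE MODEL statement for a k-means model."""
    options = ["model_type = 'KMEANS'"]
    if num_clusters is not None:
        if num_clusters < 2:
            # Our own sanity check: a single "segment" would not segment anything.
            raise ValueError("num_clusters must be at least 2")
        options.append(f"num_clusters = {num_clusters}")
    # Omitting num_clusters leaves the choice of the cluster count to BigQuery ML.
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS ({', '.join(options)}) AS\n"
        "SELECT * EXCEPT(customer_id)\n"
        f"FROM `{feature_table}`"
    )


# Hypothetical names, for illustration only.
auto_k_sql = create_kmeans_sql("retail.customer_segments", "retail.customer_features")
fixed_k_sql = create_kmeans_sql("retail.customer_segments", "retail.customer_features", num_clusters=4)
print(auto_k_sql)
print(fixed_k_sql)
```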

tavva_prudhvi (Option: A)
Mar 15, 2023

A. This is the most efficient solution for segmenting customers based on their purchasing habits, as it utilizes BigQuery's built-in machine learning capabilities to identify distinct clusters of customers based on their purchasing behavior. By allowing BigQuery to automatically optimize the number of clusters, you can ensure that the model identifies the most appropriate number of segments based on the data, without having to manually select the number of clusters.
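Whether the cluster count is left to BigQuery ML or chosen by hand, candidate models are usually compared with a quality metric; for k-means models, BigQuery ML's ML.EVALUATE reports a Davies-Bouldin index, where lower is better. The sketch below computes that index in plain Python for a given assignment of customers to clusters, using the textbook definition (each cluster's scatter is the mean distance of its members to the centroid; centroids are compared by Euclidean distance). It is meant to show what the number measures; BigQuery ML's exact computation may differ in details, and the customer data and labelings are made up.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def davies_bouldin(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Davies-Bouldin index of a clustering (lower means tighter, better-separated clusters)."""
    clusters: Dict[int, List[Point]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(tuple(p))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Centroid = per-feature mean of the cluster's members.
    centroids = {lab: tuple(sum(col) / len(m) for col in zip(*m)) for lab, m in clusters.items()}
    # Scatter = average distance from members to their own centroid.
    scatter = {lab: sum(dist(p, centroids[lab]) for p in m) / len(m) for lab, m in clusters.items()}
    total = 0.0
    for i in clusters:
        # Worst-case similarity ratio between cluster i and any other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical customers: (orders per year, average basket value)
customers = [(2, 20), (3, 22), (2, 18), (40, 25), (42, 27), (38, 24), (5, 300), (6, 310), (4, 290)]
three_segments = [0, 0, 0, 1, 1, 1, 2, 2, 2]
two_segments = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # merges the first two groups
print(round(davies_bouldin(customers, three_segments), 3))
print(round(davies_bouldin(customers, two_segments), 3))
```

Comparing the index across models trained with different cluster counts is one way to sanity-check whichever count was chosen.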

japoji (Option: A)
Dec 10, 2022

The question is about commonalities among clients by characteristics, not about characteristics by client. I mean, with B you would be looking for segments of the characteristics that define a client, but you need segments of clients defined by their characteristics.

neochaotic (Option: B)
Dec 10, 2022

It's B! Dataprep provides data profiling functionality.

ares81 (Option: B)
Dec 13, 2022

It seems B, to me.

PhilipKoku (Option: A)
Jun 6, 2024

A) K-means is ideal for unsupervised clustering