Professional Machine Learning Engineer Exam - Question 63

Question

You work for a large retailer and have been asked to segment your customers by their purchasing habits. The purchase history of all customers has been uploaded to BigQuery. You suspect that there may be several distinct customer segments; however, you are unsure of how many, and you don’t yet understand the commonalities in their behavior. You want to find the most efficient solution. What should you do?

Examice · Accepted Answer

Creating a k-means clustering model using BigQuery ML is the most efficient solution for segmenting customers by their purchasing habits, because the purchase history is already in BigQuery and the model can be trained there directly. If you do not specify the number of clusters, BigQuery ML picks one for you, which is useful when the exact number of segments is unknown. K-means clustering is well suited to identifying natural groupings within your data, which helps uncover commonalities in customer purchasing behavior.

Vedjha · Answer

Will go for 'A', as it is easy to build a model in BQML where the data is already present, and the number of clusters would be chosen automatically in the case of the k-means algorithm.

wish0035 · Answer

ans: A, pretty sure.

C, D => discarded, very time-consuming.
B => yes, you can identify similarities within each column, but when I read "you don’t yet understand the commonalities in their behavior" I understand that this job would be difficult, because there could be many columns to analyze, and I don't think that this would be efficient.

A => BigQuery ML supports k-means clustering, it's easy and efficient to create, and it would automatically choose the number of clusters.

Also from the BigQuery ML docs: "K-means clustering for data segmentation; for example, identifying customer segments."
(Source: https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in)

M25 · Answer

Went with A

LearnSodas · Answer

K-means is a good unsupervised learning algorithm for segmenting a population based on similarity.

We can use k-means directly in BQ, so I think it's "the most efficient way".

Labeling is not a good option since we don't really know what makes one customer similar to another, and why use Dataprep if we can use BQ directly?

hiromi · Answer

A
https://cloud.google.com/bigquery-ml/docs/kmeans-tutorial
https://towardsdatascience.com/how-to-use-k-means-clustering-in-bigquery-ml-to-understand-and-describe-your-data-better-c972c6f5733b

CloudKida · Answer

When to use k-means: Your data may contain natural groupings or clusters of data. You may want to identify these groupings descriptively in order to make data-driven decisions. For example, a retailer may want to identify natural groupings of customers who have similar purchasing habits or locations. This process is known as customer segmentation.
https://cloud.google.com/bigquery/docs/kmeans-tutorial
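
To make the "natural groupings" idea concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is only an illustration of the technique, not how BigQuery ML implements it; the customer features (orders per month, average basket value) and the data points are made up for the example, and the first k points are used as starting centroids purely to keep the run deterministic.

```python
def kmeans(points, k, iterations=20):
    """Toy k-means: returns (cluster index per point, final centroids)."""
    # Assumption: seed with the first k points so the result is reproducible.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return assignments, centroids


# Hypothetical customers: (orders per month, average basket value).
customers = [(1, 20), (2, 25), (1, 22), (12, 150), (11, 160), (13, 155)]
assignments, centroids = kmeans(customers, k=2)
print(assignments)
print(centroids)
```

In this toy data the occasional small-basket shoppers and the frequent large-basket shoppers end up in separate clusters without any labels being supplied, which is the sense in which k-means finds segments "descriptively".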

MultiCloudIronMan · Answer

The k-means algorithm is used for grouping/clustering data in unsupervised learning.

ares81 · Answer

I correct myself. It's A:
According to the documentation, if you omit the num_clusters option, BigQuery ML will choose a reasonable default based on the total number of rows in the training data.
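
As a rough sketch of what that looks like in practice, the Python helper below only builds the `CREATE MODEL` statement as a string, leaving out `num_clusters` unless the caller passes one. The helper name, the dataset, table, and column names are all hypothetical placeholders; running the statement would still require submitting it to BigQuery (for example from the console or a client library), which is not shown here.

```python
def build_kmeans_model_sql(model_name, source_table, feature_columns, num_clusters=None):
    """Return a BigQuery ML CREATE MODEL statement for a k-means model.

    All identifiers passed in are hypothetical placeholders.
    """
    options = ["model_type = 'kmeans'"]
    if num_clusters is not None:
        # Pin the cluster count only when asked to; otherwise the option is
        # omitted and BigQuery ML falls back to its own default.
        options.append(f"num_clusters = {int(num_clusters)}")
    columns = ", ".join(feature_columns)
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS ({', '.join(options)}) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`"
    )


# Hypothetical dataset, table, and feature columns.
sql = build_kmeans_model_sql(
    "retail.customer_segments",
    "retail.purchase_history",
    ["total_spend", "order_count", "avg_basket_size"],
)
print(sql)
```

Passing `num_clusters=4` to the same helper would add `num_clusters = 4` to the `OPTIONS` clause, which is the route to take once you have a better idea of how many segments you expect.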

tavva_prudhvi · Answer

A
This is the most efficient solution for segmenting customers based on their purchasing habits, as it utilizes BigQuery's built-in machine learning capabilities to identify distinct clusters of customers based on their purchasing behavior. By allowing BigQuery to automatically optimize the number of clusters, you can ensure that the model identifies the most appropriate number of segments based on the data, without having to manually select the number of clusters.

japoji · Answer

The question is about commonalities of clients by characteristics, not about characteristics by client. I mean, with B you are looking for segments of the characteristics that define a client, but you need segments of clients defined by characteristics.

neochaotic · Answer

It's B! Dataprep provides data profiling functionality.

ares81 · Answer

It seems like B to me.

PhilipKoku · Answer

A) K-means is ideal for unsupervised clustering
