Professional Data Engineer Exam - Question 89

Question

You're training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you've discovered that the dataset contains latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency.

What should you do?

Examice · Accepted Answer

B
Reference: https://cloud.google.com/bigquery/docs/gis-data

AHUI · Answer

Ans C, use L1 regularization because we know the feature is a strong feature. L2 will evenly distribute weights.

dish11dish · Answer

Option C is correct

Use L1 regularization when you need to assign greater importance to more influential features. It shrinks the weights of less important features to 0.
L2 regularization performs better when all input features influence the output and the weights are of roughly equal size.

AWSandeep · Answer

C. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L1 regularization during optimization.

zellck · Answer

C is the answer.

https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
A feature cross is a synthetic feature formed by multiplying (crossing) two or more features. Crossing combinations of features can provide predictive abilities beyond what those features can provide individually.

https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
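
To make "feature cross, bucketized at the minute level" concrete, here is a minimal sketch in plain Python. It assumes coordinates are given in decimal degrees, and the helper names (minute_bucket, crossed_cell) are made up for illustration; they are not from any library. Each coordinate is cut into one-arc-minute buckets, and the cross is the pair of bucket indices, i.e. one grid cell per pair.

```python
import math


def minute_bucket(degrees: float) -> int:
    """Index of the one-arc-minute bucket containing a decimal-degree coordinate."""
    # One degree is 60 arc minutes; floor keeps negative coordinates consistent.
    return math.floor(degrees * 60)


def crossed_cell(lat: float, lon: float) -> tuple:
    """Feature cross: the pair of bucket indices identifies one grid cell."""
    return (minute_bucket(lat), minute_bucket(lon))


# Two nearby points land in the same cell; a point farther away does not.
a = crossed_cell(37.7749, -122.4194)
b = crossed_cell(37.7751, -122.4190)
c = crossed_cell(37.8044, -122.2712)
print(a == b, a == c)
```

In a real model each cell would typically be one-hot encoded, which produces a very large, sparse input. That is the usual argument for L1 here: it can drive the weights of cells that carry no signal to exactly zero.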

ckanaar · Answer

What does bucketizing at the minute level mean in the context of this question?

crismo04 · Answer

https://medium.com/riga-data-science-club/geographic-coordinate-encoding-with-tensorflow-feature-columns-e750ae338b7c#:~:text=to%20the%20rescue!-,Feature%20Crosses,-Combining%20features%20into

Oleksandr0501 · Answer

gpt: Option C and D suggest bucketizing the feature cross of latitude and longitude at the minute level and using L1 or L2 regularization during optimization. While regularization can help prevent overfitting, bucketizing at such a granular level may not be necessary and could lead to overfitting. It's also not clear how bucketizing at the minute level would capture the spatial relationship between the latitude and longitude features.

nwk · Answer

C or D?
https://medium.com/riga-data-science-club/geographic-coordinate-encoding-with-tensorflow-feature-columns-e750ae338b7c

[Removed] · Answer

Regularization + location into one

PolyMoe · Answer

D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization. This will create a new feature that captures the physical dependency of the location of the property on the price, and bucketing it at the minute level will reduce the number of unique values and prevent overfitting. L2 regularization will also help to prevent overfitting by penalizing large weights in the model.

ga8our · Answer

Why not L2? L2 (Ridge) uses a squared value coefficient as a penalty term to the loss function, while L1 (Lasso) uses an absolute value coefficient. Isn't a squared penalty stronger than an absolute one? 
https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
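
One way to look at this question is a toy sketch that applies only the penalty term to a single weight and ignores the data loss entirely (a deliberate simplification). The function names, learning rate, and penalty strength are arbitrary choices for illustration. The L1 update uses the standard soft-threshold (proximal) step; the L2 update is plain gradient descent on lam * w**2.

```python
def l1_step(w: float, lr: float, lam: float) -> float:
    """Soft-threshold update for the penalty lam * |w|."""
    shrink = lr * lam
    if w > shrink:
        return w - shrink
    if w < -shrink:
        return w + shrink
    return 0.0  # small weights are clipped to exactly zero


def l2_step(w: float, lr: float, lam: float) -> float:
    """Gradient update for the penalty lam * w**2 (gradient is 2 * lam * w)."""
    return w * (1 - 2 * lr * lam)


w_l1 = w_l2 = 0.5
for _ in range(100):
    w_l1 = l1_step(w_l1, lr=0.1, lam=0.1)
    w_l2 = l2_step(w_l2, lr=0.1, lam=0.1)

print(w_l1, w_l2)
```

The L1 weight reaches exactly zero because its pull is constant regardless of the weight's size. The L2 pull is proportional to the weight, so it fades as the weight shrinks and the weight stays small but non-zero. For weights below 1 in magnitude the squared penalty is actually the smaller of the two.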

Jojo9400 · Answer

D

You have to use L2: since you have created a new variable from two existing ones, the risk of multicollinearity is high. L1 is good for selecting features to avoid the curse of dimensionality, not for handling multicollinearity.

Mathew106 · Answer

The right answer is B. What the hell does bucketize the feature cross of latitude and longitude even mean? They are not a time feature. C and D don't even make sense. The L1 regularization is something that doesn't answer anything in the question. The only valid feature engineered here is option B. A is not an engineered feature.

Create a feature cross of latitude and longitude, bucketize it at the minute level and use L1 regularization during optimization.

FP77 · Answer

I strongly believe it's B.

uday_examtopic · Answer

Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization.

Like option C, we bucketize at the minute level, but this time we apply L2 regularization. L2 regularization, or Ridge Regression, discourages large values of weights in the model without forcing them to become sparse. It can help prevent overfitting, especially when we have a large number of features (as a result of bucketizing and crossing).

Given the options, D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization seems to be the most appropriate. Bucketizing at the minute level captures localized patterns, and L2 regularization can help control the complexity of the model without enforcing sparsity.

Snnnnneee · Answer

Bucketing into minutes is inaccurate: properties up to 1.8 km apart are grouped together. Way too much for real estate.
Therefore B
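
The 1.8 km figure can be checked with a rough calculation. This sketch assumes a spherical Earth with a mean radius of 6371 km; the function name is made up for illustration. It also shows that the east-west width of a one-minute bucket shrinks with latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius, spherical approximation


def arc_minute_km(latitude_deg: float) -> tuple:
    """Approximate (north-south, east-west) size in km of one arc minute."""
    north_south = EARTH_RADIUS_KM * math.radians(1 / 60)
    # Meridians converge toward the poles, so a minute of longitude narrows.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_km, ew_km = arc_minute_km(40.0)
print(round(ns_km, 2), round(ew_km, 2))
```

So a minute-level cell is roughly 1.85 km tall everywhere and somewhat narrower than that away from the equator.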
