Exam Professional Data Engineer
Question 89

You're training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you've discovered that the dataset contains latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency.

What should you do?

    Correct Answer: D

    B

    Reference:

    https://cloud.google.com/bigquery/docs/gis-data

Discussion
dish11dish (Option: C)

Option C is correct. Use L1 regularization when you need to assign greater importance to the more influential features. It shrinks the weights of less important features to 0. L2 regularization performs better when all input features influence the output and all the weights are of similar size.
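
The sparsity claim above can be illustrated with a minimal one-weight sketch. It assumes a toy quadratic loss `0.5*(w - t)**2` per weight, where `t` is the weight the model would learn without regularization; the helper names, the sample values in `unregularized`, and `lam` are hypothetical, chosen only for illustration.

```python
def l1_shrink(t, lam):
    """Minimizer of 0.5*(w - t)**2 + lam*abs(w), i.e. soft thresholding."""
    if t > lam:
        return t - lam
    if t < -lam:
        return t + lam
    return 0.0  # small weights are set exactly to zero


def l2_shrink(t, lam):
    """Minimizer of 0.5*(w - t)**2 + 0.5*lam*w**2."""
    return t / (1.0 + lam)  # every weight is scaled down, none becomes zero


# Hypothetical unregularized weights: one strong feature, two weak ones.
unregularized = [3.0, 0.2, -0.1]
lam = 0.5

l1_weights = [l1_shrink(t, lam) for t in unregularized]
l2_weights = [l2_shrink(t, lam) for t in unregularized]

print("L1:", l1_weights)
print("L2:", l2_weights)
```

Under these assumptions L1 keeps the strong weight (reduced by `lam`) and zeroes the two weak ones, while L2 shrinks all three proportionally and leaves them non-zero.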

AHUI (Option: C)

Ans: C. Use L1 regularization because we know the feature is a strong feature. L2 will distribute the weights evenly.

AWSandeep (Option: C)

C. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L1 regularization during optimization.

ckanaar

What does bucketizing at the minute level mean in the context of this question?

Surely1987

Coordinates are written in degrees, minutes, and seconds (one minute being equal to about 1.8 km). So you group your coordinates into buckets with one-minute precision.
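
A minimal sketch of that grouping, assuming coordinates arrive as decimal degrees and that a "minute bucket" is simply the whole arc minute containing the value. The function names and the three sample points are hypothetical.

```python
import math


def minute_bucket(degrees):
    """Index of the one-arc-minute bucket containing a decimal-degree value."""
    return math.floor(degrees * 60)  # 60 arc minutes per degree


def crossed_cell(lat, lon):
    """Pair the latitude and longitude buckets into one grid-cell identifier."""
    return (minute_bucket(lat), minute_bucket(lon))


# Hypothetical sample points: the first two lie a few dozen metres apart,
# the third is several kilometres away.
a = crossed_cell(37.7749, -122.4194)
b = crossed_cell(37.7751, -122.4190)
c = crossed_cell(37.8044, -122.2712)

print(a, b, c)
```

The two nearby points fall into the same cell and would share one learned weight, while the distant point gets a different cell.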

zellck (Option: C)

C is the answer. https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture A feature cross is a synthetic feature formed by multiplying (crossing) two or more features. Crossing combinations of features can provide predictive abilities beyond what those features can provide individually. https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
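
For bucketized inputs, "multiplying" two features can be read as taking the product of their one-hot vectors. The sketch below assumes that reading; the bucket counts (3 latitude buckets, 4 longitude buckets) and the helper names are hypothetical.

```python
def one_hot(index, size):
    """One-hot vector of the given size with a 1 at position index."""
    return [1 if i == index else 0 for i in range(size)]


def cross(u, v):
    """Flattened outer product: entry (i, j) equals u[i] * v[j]."""
    return [a * b for a in u for b in v]


lat_vec = one_hot(1, 3)  # property falls in latitude bucket 1 of 3
lon_vec = one_hot(2, 4)  # property falls in longitude bucket 2 of 4

crossed = cross(lat_vec, lon_vec)
print(crossed)
```

The crossed vector has 3 * 4 = 12 entries with exactly one 1, so each (latitude bucket, longitude bucket) pair gets its own weight instead of the two axes being weighted separately.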

Oleksandr0501

gpt: Options C and D suggest bucketizing the feature cross of latitude and longitude at the minute level and using L1 or L2 regularization during optimization. While regularization can help prevent overfitting, bucketizing at such a granular level may not be necessary and could lead to overfitting. It's also not clear how bucketizing at the minute level would capture the spatial relationship between the latitude and longitude features.

crismo04

https://medium.com/riga-data-science-club/geographic-coordinate-encoding-with-tensorflow-feature-columns-e750ae338b7c#:~:text=to%20the%20rescue!-,Feature%20Crosses,-Combining%20features%20into

crismo04

Feature cross seems to be the right feature option

crismo04

So it's B option

Snnnnneee (Option: B)

Bucketing into minutes is inaccurate: properties up to 1.8 km apart are grouped together. That is far too coarse for real estate. Therefore B.
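
The size of a one-minute cell can be estimated with a short calculation. This is a rough sketch assuming a spherical Earth, where one arc minute of latitude is about 1.852 km (one nautical mile) and the east-west extent shrinks with the cosine of the latitude; the constant and function names are hypothetical.

```python
import math

KM_PER_ARC_MINUTE = 1.852  # approximate length of one arc minute of latitude


def cell_size_km(latitude_deg):
    """Approximate (north-south, east-west) size of a one-minute cell in km."""
    north_south = KM_PER_ARC_MINUTE
    east_west = KM_PER_ARC_MINUTE * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_equator, ew_equator = cell_size_km(0.0)
ns_60, ew_60 = cell_size_km(60.0)

print(f"equator: {ns_equator:.3f} km x {ew_equator:.3f} km")
print(f"60 deg:  {ns_60:.3f} km x {ew_60:.3f} km")
```

Under this approximation the 1.8 km figure applies to the north-south side everywhere, while the east-west side is about half that at 60 degrees latitude.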

uday_examtopic (Option: D)

Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization. Like option C, we bucketize at the minute level, but this time we apply L2 regularization. L2 regularization, or Ridge Regression, discourages large values of weights in the model without forcing them to become sparse. It can help prevent overfitting, especially when we have a large number of features (as a result of bucketizing and crossing). Given the options, D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization seems to be the most appropriate. Bucketizing at the minute level captures localized patterns, and L2 regularization can help control the complexity of the model without enforcing sparsity.

FP77 (Option: B)

I strongly believe it's B.

Mathew106 (Option: B)

The right answer is B. What the hell does bucketize the feature cross of latitude and longitude even mean? They are not a time feature. C and D don't even make sense. The L1 regularization is something that doesn't answer anything in the question. The only valid feature engineered here is option B. A is not an engineered feature. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L1 regularization during optimization.

Jojo9400 (Option: D)

D. You have to use L2: since you have created a new variable from two existing ones, the risk of multicollinearity is high. L1 is good for selecting features to avoid the curse of dimensionality, not for handling multicollinearity.

ga8our (Option: C)

Why not L2? L2 (Ridge) uses a squared value coefficient as a penalty term to the loss function, while L1 (Lasso) uses an absolute value coefficient. Isn't a squared penalty stronger than an absolute one? https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c

ckanaar

L1 regularization forces unimportant coefficients to zero. Since the location is extremely important, L1 will force less important coefficients to zero, thereby further increasing the importance of the location coefficient.
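
On the question of whether a squared penalty is "stronger", it may help to compare the pull each penalty exerts on a weight. The sketch below assumes the plain penalties `lam*abs(w)` and `lam*w**2` with a hypothetical `lam = 1.0` and hypothetical sample weights; it only evaluates the penalty gradients and does not train anything.

```python
def l1_penalty_grad(w, lam):
    """Gradient of lam*abs(w): constant magnitude regardless of how small w is."""
    if w > 0:
        return lam
    if w < 0:
        return -lam
    return 0.0


def l2_penalty_grad(w, lam):
    """Gradient of lam*w**2: proportional to w, so it fades as w nears zero."""
    return 2.0 * lam * w


lam = 1.0
weights = [2.0, 0.5, 0.01]  # hypothetical large, medium, and tiny weights

l1_grads = [l1_penalty_grad(w, lam) for w in weights]
l2_grads = [l2_penalty_grad(w, lam) for w in weights]

for w, g1, g2 in zip(weights, l1_grads, l2_grads):
    print(f"w={w}: L1 pull={g1}, L2 pull={g2}")
```

With these numbers the squared penalty pulls harder on the large weight, but on the tiny weight its pull nearly vanishes while the L1 pull stays at `lam`. That constant pull is what lets L1 push small weights all the way to zero.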

PolyMoe (Option: D)

D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization. This will create a new feature that captures the physical dependency of the location of the property on the price, and bucketing it at the minute level will reduce the number of unique values and prevent overfitting. L2 regularization will also help to prevent overfitting by penalizing large weights in the model.

cetanx

ChatGPT also says D. Explanation: This approach effectively creates a grid of the geographical area in your data, allowing the model to learn weights for each grid cell (bucket). This helps capture the spatial relationship between latitude and longitude, which can be crucial for real estate prices. Additionally, using L2 regularization helps prevent overfitting by discouraging complex models, which can be particularly important when working with high-dimensional crossed features.

[Removed] (Option: C)

Regularization + location into one

nwk

C or D? https://medium.com/riga-data-science-club/geographic-coordinate-encoding-with-tensorflow-feature-columns-e750ae338b7c