Feature Engineering

Hashing

Feature hashing converts a categorical feature into a low-dimensional feature space by applying a hash function to each value.

For example, a vocabulary of words can be represented in a 5-dimensional space \(\{0,1\}^5\) (e.g. Apple → [1,0,1,1,0]).

Hashing introduces noise because collisions are permitted: two different values can map to the same vector (e.g. Car → [0,0,0,1,1] and Flight → [0,0,0,1,1]).
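
As a rough illustration of the idea, the sketch below maps a word to a 5-bit vector. The function name `hash_to_bits` and the choice of MD5 as the hash are assumptions made for this example, not a prescribed method. Any stable hash would do, and the actual bit patterns will differ from the ones quoted above.

```python
import hashlib

def hash_to_bits(value, n_bits=5):
    """Map a string to a binary vector of length n_bits (hypothetical helper)."""
    # Use a stable hash so the same word always gets the same code.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    # Keep only n_bits of the hash: this is where collisions come from.
    code = int.from_bytes(digest, "big") % (2 ** n_bits)
    # Unpack the integer code into a list of bits, most significant first.
    return [(code >> i) & 1 for i in reversed(range(n_bits))]

for word in ["Apple", "Car", "Flight"]:
    print(word, hash_to_bits(word))
```

With 5 bits there are only 32 possible codes, so any vocabulary larger than 32 words is guaranteed to contain at least one collision.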

Embedding

With an embedding, each category is mapped to a dense, real-valued vector, typically learned during training (e.g. Car → [0.4, 0.9, 0.1, 0, 0.1]).
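
An embedding can be thought of as a lookup table from category to vector. The sketch below is only an illustration: the helper `make_embedding_table` is hypothetical, and the vectors are randomly initialised, whereas in a real model they would usually be adjusted during training.

```python
import random

def make_embedding_table(vocab, dim=5, seed=0):
    """Build a lookup table: one dense vector per vocabulary entry."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    # Random values stand in for weights that a model would normally learn.
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}

table = make_embedding_table(["Apple", "Car", "Flight"])
car_vector = table["Car"]  # a list of 5 floats
print(car_vector)
```

Unlike the hashed codes, two different words get distinct vectors here, at the cost of storing one vector per word.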

Bucketing

Bucketing (also called binning) splits the range of a numeric feature into intervals, or buckets, and replaces each value with the bucket it falls into.
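
A minimal sketch of bucketing, assuming a hypothetical age feature with hand-picked boundaries at 18, 35 and 65. The helper names `bucketize` and `one_hot` are made up for this example.

```python
import bisect

def bucketize(value, boundaries):
    """Return the index of the bucket that value falls into.

    boundaries must be sorted; n boundaries define n + 1 buckets.
    """
    return bisect.bisect_right(boundaries, value)

def one_hot(index, size):
    """Encode a bucket index as a one-hot vector of the given size."""
    return [1 if i == index else 0 for i in range(size)]

age_boundaries = [18, 35, 65]  # hypothetical boundaries: <18, 18-34, 35-64, 65+
bucket = bucketize(42, age_boundaries)
print(bucket, one_hot(bucket, len(age_boundaries) + 1))
```

A value equal to a boundary is placed in the bucket that starts at that boundary, so an age of 18 lands in the second bucket.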

Crossing

Feature crossing combines two or more features into a single feature, so that a model can capture their interaction.
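
One simple way to cross categorical features is to join their values into a single key, and optionally hash that key into a fixed number of buckets. The sketch below assumes two hypothetical features (country and device); the helper names, the `_x_` separator and the use of MD5 are illustrative choices only.

```python
import hashlib

def cross(*values):
    """Join several feature values into one crossed feature value."""
    return "_x_".join(str(v) for v in values)

def hashed_cross(values, n_buckets):
    """Hash the crossed value into one of n_buckets integer ids."""
    key = cross(*values)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_buckets

crossed = cross("US", "mobile")  # "US_x_mobile"
crossed_id = hashed_cross(("US", "mobile"), n_buckets=100)
print(crossed, crossed_id)
```

Hashing the crossed key keeps the feature space bounded, since the number of distinct combinations grows with the product of the individual vocabularies.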
