So I’ve been playing around with SVMs and I wonder if this is a good thing to do:
I have a set of continuous features (0 to 1) and a set of categorical features that I converted to dummy variables. In this particular case, I encode the date of the measurement as dummy variables:
There are 3 periods that I have data from and I reserved 3 feature numbers for them:
So depending on which period the data comes from, different features will get 1 assigned; the others will get 0.
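To make the encoding concrete, here is a minimal sketch of how one example could be written as a line in SVMLight's sparse input format (`<target> <feature>:<value> ...`). The feature numbers and the period names are hypothetical, chosen only for illustration: features 1 and 2 stand for continuous features, and 3 to 5 are the ones reserved for the three periods.

```python
# Hypothetical feature numbers for the three periods (illustration only).
PERIOD_FEATURES = {"period_a": 3, "period_b": 4, "period_c": 5}


def to_svmlight_line(label, continuous, period):
    """Format one example as '<label> <index>:<value> ...' with ascending indices."""
    parts = [str(label)]
    # Continuous features occupy indices 1..len(continuous).
    for index, value in enumerate(continuous, start=1):
        parts.append(f"{index}:{value}")
    # Exactly one period dummy is set to 1. The sparse format treats
    # omitted features as 0, so the other two dummies are not written.
    parts.append(f"{PERIOD_FEATURES[period]}:1")
    return " ".join(parts)


line = to_svmlight_line(1, [0.25, 0.8], "period_b")
print(line)
```

Only the dummy for the matching period appears in the line; the other two are implicitly 0.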
Will the SVM work properly with this, or is this a bad thing to do?
I use SVMLight and a linear kernel.
SVMs handle both binary and continuous variables, as long as you do some preprocessing: all features should be scaled or normalised to a comparable range. After that step, from the algorithm’s perspective it doesn’t matter whether a feature is continuous or binary: for binary features it sees samples that are either “far” apart or very similar; for continuous features there are also in-between values. The choice of kernel doesn’t depend on the type of variables.
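As a rough sketch of the scaling step, the snippet below applies min-max scaling to one feature column, mapping it to the range 0 to 1. The function name `min_max_scale` and the sample values are made up for illustration. In your case the continuous features are already in 0 to 1, so the 0/1 dummies are on the same scale and pass through unchanged.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical raw measurements on an arbitrary scale.
raw = [10.0, 20.0, 40.0]
scaled = min_max_scale(raw)

# A 0/1 dummy column is left as it is by the same transformation.
dummy = [0, 1, 0]
scaled_dummy = min_max_scale(dummy)
```

When there is a separate test set, the minimum and maximum should be computed on the training data and reused for the test data, rather than recomputed.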