Mixing continuous and binary data with a linear SVM?

So I’ve been playing around with SVMs, and I’m wondering whether this is a good thing to do:

I have a set of continuous features (0 to 1) and a set of categorical features that I converted to dummy variables. In this particular case, I encode the period in which the measurement was taken as dummy variables:

I have data from 3 periods, and I reserved 3 feature numbers for them:


So, depending on which period a sample comes from, the corresponding feature is set to 1 and the other two are set to 0.

Will the SVM work properly with this, or is this a bad thing to do?

I use SVMLight and a linear kernel.
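A minimal Python sketch of the encoding described in the question: it one-hot encodes the period into three dummy features and writes one sample in SVMLight’s sparse `<target> <index>:<value>` input format. The feature layout (continuous features first, then the three period dummies) and the function name `to_svmlight_line` are assumptions for illustration, since the asker’s actual feature numbers are not shown.

```python
def to_svmlight_line(label, continuous, period, n_periods=3):
    """Build one SVMLight-format line: '<label> <index>:<value> ...'.

    Hypothetical layout (an assumption, not the asker's real numbering):
    continuous features use indices 1..len(continuous), and the period
    dummies use the next n_periods indices.
    """
    if label not in (-1, 1):
        # This sketch only handles +1/-1 classification labels.
        raise ValueError("label must be +1 or -1")
    if not 0 <= period < n_periods:
        raise ValueError("period must be in range(n_periods)")
    # One-hot encode the period: exactly one dummy is 1, the rest are 0.
    dummies = [1 if p == period else 0 for p in range(n_periods)]
    values = list(continuous) + dummies
    # Indices are 1-based and increasing; zero-valued features are omitted
    # because the format is sparse.
    pairs = [f"{i}:{v:g}" for i, v in enumerate(values, start=1) if v != 0]
    return f"{label:+d} " + " ".join(pairs)


print(to_svmlight_line(1, [0.25, 0.0, 0.8], period=1))  # +1 1:0.25 3:0.8 5:1
```

The zero-valued continuous feature and the two inactive dummies do not appear in the output line; SVMLight treats omitted features as 0.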


SVMs will handle both binary and continuous variables as long as you do some preprocessing: all features should be scaled or normalised. After that step, from the algorithm’s perspective it doesn’t matter whether features are continuous or binary: for binary features, it sees samples that are either “far” apart or identical on that feature; for continuous features, there are also the in-between values. The choice of kernel doesn’t matter with respect to the type of variables.
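To illustrate the answer’s points, here is a small standard-library Python sketch; all function names and data are hypothetical. It min-max scales a raw continuous column to [0, 1], shows that a 0/1 dummy column passes through that scaling unchanged, and computes each feature’s contribution to the squared Euclidean distance: a binary feature contributes exactly 0 or 1, while a scaled continuous feature can contribute anything in between. Min-max scaling is only one option; standardisation (zero mean, unit variance) is another. Since the asker’s continuous features already lie in 0 to 1 and the dummies are 0/1, this step mostly matters when raw features have other ranges.

```python
def fit_min_max(column):
    """Return the (min, max) of a training column, used as scaling bounds."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Map values linearly so that lo -> 0.0 and hi -> 1.0."""
    if hi == lo:
        # Constant feature: avoid division by zero and map everything to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def per_feature_sq_diff(a, b):
    """Each feature's contribution to the squared Euclidean distance."""
    return [(x - y) ** 2 for x, y in zip(a, b)]


def linear_kernel(a, b):
    """Dot product: a linear SVM treats 0/1 values like any other number."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical raw continuous feature (not in [0, 1]) and a 0/1 dummy feature.
raw = [10.0, 20.0, 30.0]
dummy = [0, 1, 1]

scaled = apply_min_max(raw, *fit_min_max(raw))            # [0.0, 0.5, 1.0]
dummy_scaled = apply_min_max(dummy, *fit_min_max(dummy))  # [0.0, 1.0, 1.0]

a = [scaled[1], dummy_scaled[1]]  # [0.5, 1.0]
b = [scaled[2], dummy_scaled[0]]  # [1.0, 0.0]
print(per_feature_sq_diff(a, b))  # [0.25, 1.0]
print(linear_kernel(a, b))        # 0.5
```

In practice the bounds from `fit_min_max` would be computed on the training data only and then reused for test data, so both sets are mapped consistently.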

Source: Link, Question Author: user3010273, Answer Author: iliasfl
