Does a Support Vector Machine handle imbalanced datasets?

Does an SVM handle imbalanced datasets? Are there any parameters (like C, or a misclassification cost) for handling an imbalanced dataset?


For imbalanced data sets we typically change the misclassification penalty per class. This is called class-weighted SVM, which minimizes the following:

$$
\begin{aligned}
\min_{\boldsymbol{\alpha},b,\boldsymbol{\xi}} &\quad \sum_{i=1}^N\sum_{j=1}^N \alpha_i \alpha_j y_i y_j \kappa(\mathbf{x}_i,\mathbf{x}_j) + C_{pos}\sum_{i\in \mathcal{P}} \xi_i + C_{neg}\sum_{i\in \mathcal{N}}\xi_i, \\
\text{s.t.} &\quad y_i\Big(\sum_{j=1}^N \alpha_j y_j \kappa(\mathbf{x}_i, \mathbf{x}_j) + b\Big) \geq 1-\xi_i, &\quad i=1,\ldots,N, \\
&\quad \xi_i \geq 0, &\quad i=1,\ldots,N.
\end{aligned}
$$

where $\mathcal{P}$ and $\mathcal{N}$ denote the index sets of the positive and negative training instances, respectively. A standard SVM has a single $C$ value, whereas here there are two. The misclassification penalty for the minority class is chosen to be larger than that of the majority class.
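As a rough illustration of how the two penalties can be set in practice, here is a minimal sketch using scikit-learn's `SVC`, whose `class_weight` parameter multiplies `C` per class. The toy dataset, the label convention (1 = minority/positive), and the weight factor 5 are assumptions chosen only for this example, not values from the answer above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy imbalanced problem: roughly 90% negatives (label 0), 10% positives (label 1).
X, y = make_classification(
    n_samples=500, n_features=5, weights=[0.9, 0.1], random_state=0
)

C = 1.0

# Standard SVM: one penalty C for every training instance.
plain = SVC(kernel="rbf", C=C).fit(X, y)

# Class-weighted SVM: scikit-learn multiplies C by class_weight[label],
# so here C_neg = 1 * C and C_pos = 5 * C (the factor 5 is an arbitrary choice).
weighted = SVC(kernel="rbf", C=C, class_weight={0: 1.0, 1: 5.0}).fit(X, y)

# "balanced" picks weights inversely proportional to the class frequencies.
balanced = SVC(kernel="rbf", C=C, class_weight="balanced").fit(X, y)

print("per-class C multipliers:", weighted.class_weight_)
print("positives predicted (plain / weighted):",
      int(plain.predict(X).sum()), int(weighted.predict(X).sum()))
```

In a real application the weight ratio would normally be tuned, for example by cross-validation on a metric suited to imbalanced data, rather than fixed by hand.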

This approach was introduced quite early; it is mentioned, for instance, in a 1997 technical report:

Edgar Osuna, Robert Freund, and Federico Girosi. Support Vector Machines: Training and Applications. Technical Report AIM-1602, 1997.

Essentially, this is equivalent to oversampling the minority class: for instance, if $C_{pos} = 2 C_{neg}$, this is entirely equivalent to training a standard SVM with $C = C_{neg}$ after including every positive instance twice in the training set.
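The equivalence can be checked on the slack term alone, since the regularization term depends only on the decision function and not on how many times an instance is listed. The sketch below uses made-up decision values and a hypothetical helper `weighted_slack_penalty`; it only compares the two penalty sums for a fixed decision function and does not train an SVM.

```python
def weighted_slack_penalty(scores, labels, c_pos, c_neg):
    """Return C_pos * (sum of positive slacks) + C_neg * (sum of negative slacks).

    scores[i] is the decision value f(x_i); labels[i] is +1 or -1.
    For a fixed f, the smallest feasible slack is max(0, 1 - y_i * f(x_i)).
    """
    total = 0.0
    for f, y in zip(scores, labels):
        slack = max(0.0, 1.0 - y * f)
        total += (c_pos if y == 1 else c_neg) * slack
    return total

# Hypothetical decision values for 2 positives and 4 negatives.
scores = [0.5, -0.25, -2.0, -0.5, 0.75, -1.5]
labels = [1, 1, -1, -1, -1, -1]
c_neg = 1.0

# Class-weighted penalty with C_pos = 2 * C_neg.
weighted = weighted_slack_penalty(scores, labels, 2 * c_neg, c_neg)

# Standard penalty (single C = C_neg) after listing every positive twice.
dup_scores = scores + [f for f, y in zip(scores, labels) if y == 1]
dup_labels = labels + [y for y in labels if y == 1]
duplicated = weighted_slack_penalty(dup_scores, dup_labels, c_neg, c_neg)

print(weighted, duplicated)  # both are 5.75
```

Because the two objectives agree for every decision function, they share the same minimizer, which is why the weighting acts like oversampling without enlarging the training set.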

Source: Link, Question Author: RockTheStar, Answer Author: gung – Reinstate Monica
