Can anyone explain to me the advantages and disadvantages of the SVM for classification that distinguish it from other classifiers?
There are four main advantages. Firstly, it has a regularisation parameter, which makes the user think about avoiding over-fitting. Secondly, it uses the kernel trick, so you can build in expert knowledge about the problem via engineering the kernel. Thirdly, an SVM is defined by a convex optimisation problem (no local minima) for which there are efficient methods (e.g. SMO). Lastly, it is an approximation to a bound on the test error rate, and there is a substantial body of theory behind it which suggests it should be a good idea.
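To make the first two points concrete, here is a minimal sketch using scikit-learn (my choice of library, it is not implied by anything above). The toy dataset, the values of C and gamma, and the helper name `rbf_plus_linear` are all assumptions picked purely for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy two-class problem (an assumption, any labelled data would do).
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# 1) The regularisation parameter C: small C means heavy regularisation,
#    large C lets the model fit the training set more closely.
scores_by_C = {}
for C in (0.01, 1.0, 1000.0):
    clf = SVC(kernel="rbf", C=C, gamma=1.0).fit(X_train, y_train)
    scores_by_C[C] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"C={C:g}: train={scores_by_C[C][0]:.3f}, test={scores_by_C[C][1]:.3f}")


# 2) Engineering the kernel: SVC accepts a callable that returns the Gram
#    matrix. The sum of two valid kernels is itself a valid kernel.
def rbf_plus_linear(A, B):
    """Hypothetical hand-built kernel: RBF similarity plus a linear term."""
    return rbf_kernel(A, B, gamma=1.0) + linear_kernel(A, B)


custom = SVC(kernel=rbf_plus_linear, C=1.0).fit(X_train, y_train)
custom_test_score = custom.score(X_test, y_test)
print(f"custom kernel: test={custom_test_score:.3f}")
```

Comparing the train and test columns as C grows is a quick way to see what the regularisation parameter is buying you on your own data.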
The disadvantages are that the theory only really covers the determination of the model parameters for given values of the regularisation and kernel parameters and a given choice of kernel. In a way, the SVM moves the problem of over-fitting from optimising the parameters to model selection. Sadly, kernel models can be quite sensitive to over-fitting the model selection criterion, see
G. C. Cawley and N. L. C. Talbot, Over-fitting in model selection and subsequent selection bias in performance evaluation, Journal of Machine Learning Research, vol. 11, pp. 2079-2107, July 2010.
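One common way to guard against this when estimating performance is nested cross-validation, where the hyper-parameter search is repeated inside every outer fold. The following is only a sketch under my own assumptions (scikit-learn, a synthetic dataset, an arbitrary grid for C and gamma), not the exact protocol of the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small synthetic dataset (an assumption, chosen only to keep the run short).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Illustrative grid; make_pipeline names the SVC step "svc".
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
search = GridSearchCV(model, param_grid, cv=inner)

# Score used to pick the hyper-parameters: it has been optimised over the
# grid, so it is a potentially optimistic estimate of performance.
search.fit(X, y)
selection_score = search.best_score_

# Nested estimate: the whole search is re-run on each outer training fold,
# so the outer test folds play no part in model selection.
nested_scores = cross_val_score(search, X, y, cv=outer)

print(f"best inner CV score: {selection_score:.3f}")
print(f"nested CV score:     {nested_scores.mean():.3f}")
```

The gap between the two numbers varies with the dataset and the size of the grid, so treat any single run as indicative only.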
Note, however, that this problem is not unique to kernel methods; most machine learning methods have similar problems.
The hinge loss used in the SVM results in sparsity. However, the optimal choice of kernel and regularisation parameters often means that you end up with all of the data being support vectors. If you really want a sparse kernel machine, use something that was designed to be sparse from the outset (rather than sparsity being a useful by-product), such as the Informative Vector Machine.
The loss function used for support vector regression doesn't have an obvious statistical interpretation, whereas expert knowledge of the problem can often be encoded in the loss function, e.g. Poisson, Beta or Gaussian. Likewise, in many classification problems you actually want the probability of class membership, so it would be better to use a method like Kernel Logistic Regression rather than post-processing the output of the SVM to get probabilities.
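If you want to check the sparsity point on your own problem, it is easy to count the support vectors after fitting. This is a rough sketch assuming scikit-learn and a toy dataset; the (C, gamma) pairs are arbitrary values of mine and not recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Noisy toy data (an assumption); noisy or overlapping classes tend to
# produce many margin violations and hence many support vectors.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

sv_fraction = {}
for C, gamma in [(0.01, 1.0), (1.0, 1.0), (100.0, 1.0), (1.0, 100.0)]:
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    # clf.support_ holds the indices of the training points kept as
    # support vectors, so its length over n is the retained fraction.
    sv_fraction[(C, gamma)] = len(clf.support_) / len(X)
    print(f"C={C:g}, gamma={gamma:g}: {sv_fraction[(C, gamma)]:.0%} support vectors")
```

If the fraction at your selected hyper-parameters is close to 100%, the sparsity of the SVM is not buying you anything in practice.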
That is about all I can think of off-hand.