My dataset is made of a label, $y_{t}$, which is the dependent variable, and about 20 columns of independent numeric variables, $X_{t}$, $t=1,2,…,T$.
These samples are time series and my goal is to classify $y_{t}$ according to $X_{t}$.
The dependent variable can get just two labels: “$0$” or “$1$”.
The probability of belonging to class “$0$” or “$1$” is not required, although it could add further value to the analysis.
I would like to know which of the following methods is best suited to my case and why (and, where applicable, how I should configure the method and its parameters):
 Support Vector Machines: which kernel should I use (linear, polynomial, radial basis, sigmoid)?
 Neural Networks: how many layers and nodes should I use?
 Random Forests
 Nonparametric model applied to a binary outcome (this provides the probability of belonging to each class)
What would you suggest?
Answer
IMHO, these decisions can only be made sensibly with intimate knowledge of the problem and the data at hand (search terms: no free lunch theorem for pattern recognition/classification). So all we can offer here are very general rules of thumb.

The more statistically independent cases you have for training, the more complex a model you can afford. Very restrictive models (e.g. linear ones) are often chosen because the available data cannot support a more complex model, rather than out of any real conviction that the class boundaries are linear.
See the bias-variance tradeoff and model complexity, e.g. in The Elements of Statistical Learning.
Knowledge about the nature of your problem and data may also suggest sensible ways of generating features.

If you don’t have very many samples but absolutely need nonlinear boundaries, and therefore get unstable models, then ensemble models (such as the random forest) can help. You can aggregate not only decision trees but any other kind of model as well.
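
To make the aggregation idea concrete, here is a minimal sketch of bagging (bootstrap aggregation) in plain Python. Everything in it is an illustrative assumption rather than a recommendation for your data: the base model is a simple one-split decision stump, the function names (`fit_stump`, `fit_bagged`, `vote_fraction`, `predict_bagged`) are made up for this example, and the toy data are synthetic. Any other base model could be plugged in where the stump is used.

```python
import random


def fit_stump(X, y):
    """Fit a one-split decision stump by exhaustive search.

    Returns (feature index, threshold, polarity) with the fewest training errors.
    """
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for polarity in (0, 1):
                # Predict `polarity` when the feature exceeds the threshold,
                # otherwise predict the other class.
                errors = sum(
                    (polarity if row[j] > threshold else 1 - polarity) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, polarity)
    _, j, threshold, polarity = best
    return j, threshold, polarity


def predict_stump(stump, row):
    j, threshold, polarity = stump
    return polarity if row[j] > threshold else 1 - polarity


def fit_bagged(X, y, n_models=25, seed=0):
    """Train each base model on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def vote_fraction(models, row):
    """Fraction of ensemble members that vote for class 1."""
    return sum(predict_stump(m, row) for m in models) / len(models)


def predict_bagged(models, row):
    """Majority vote over the ensemble."""
    return 1 if vote_fraction(models, row) >= 0.5 else 0


# Synthetic toy data (an assumption for illustration): two well-separated groups.
X = [[0.1 * i, 0.1 * ((7 * i) % 10)] for i in range(10)] + \
    [[3 + 0.1 * i, 3 + 0.1 * ((3 * i) % 10)] for i in range(10)]
y = [0] * 10 + [1] * 10

models = fit_bagged(X, y)
print(predict_bagged(models, [0.0, 0.0]), predict_bagged(models, [3.5, 3.5]))
```

Each base model sees a slightly different resample, so averaging their votes reduces the instability of any single model. Note that plain bootstrap resampling treats the cases as exchangeable, which may not hold for time series.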

There are rumours* that the choice of model type often matters less for the final quality of the model than the user’s experience with the chosen type of model. I am trying to collect some evidence about this rumour in this question.
The conclusion would be to look for someone to consult who has experience with the classifiers you are considering or, even better, with classifying your type of data (which would require a more detailed description than just saying that it is a time series).
Note: the first three methods on your list (SVMs, neural networks and random forests) can also be set up to output posterior probabilities.
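
As one illustration of how this can work, a classifier’s raw decision score (e.g. the signed distance from an SVM’s boundary) can be mapped to a probability by fitting a sigmoid to the scores, an approach known as Platt scaling. The sketch below is a plain-Python approximation under stated assumptions: the scores and labels are invented, the function names (`platt_fit`, `platt_predict`) and the learning rate and iteration count are arbitrary choices for this example, and the sign convention may differ from that of a given library. In practice the scores should come from held-out or cross-validated predictions, not from the training fit.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def platt_fit(scores, labels, lr=0.1, n_iter=2000):
    """Fit p(y=1 | s) = sigmoid(a * s + b) by gradient descent on the mean log-loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, t in zip(scores, labels):
            # Derivative of the log-loss with respect to the logit a * s + b.
            err = sigmoid(a * s + b) - t
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_predict(a, b, score):
    """Estimated probability of class 1 for a raw decision score."""
    return sigmoid(a * score + b)


# Invented decision scores and true labels (the classes overlap slightly,
# so the fitted slope stays finite).
scores = [-2.1, -1.3, -0.4, 0.2, -0.1, 0.6, 1.5, 2.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

a, b = platt_fit(scores, labels)
print(platt_predict(a, b, -2.0), platt_predict(a, b, 2.0))
```

For ensembles such as random forests, the fraction of trees voting for a class is another commonly used probability estimate.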
*I don’t know of any scientific study that reports this, but I have heard numerous people report this observation, and there are a number of descriptions of the differences between types of models that conclude in the end that the theoretical differences hardly ever matter in practice.
Attribution
Source: Link, Question Author: Lisa Ann, Answer Author: Community