We are studying machine learning via Machine Learning: A Probabilistic Perspective (Kevin Murphy). While the text explains the theoretical foundation of each algorithm, it rarely says which algorithm works better in which situation, and when it does, it doesn’t say how to tell which situation I’m in.

For example, when choosing a kernel, I’ve been told to do exploratory data analysis to gauge how complex my data is. With simple two-dimensional data, I can plot it and see whether a linear or radial kernel is appropriate. But what should I do in higher dimensions?

More generally, what do people mean when they say “get to know your data” before choosing an algorithm? Right now I can only distinguish classification from regression algorithms, and linear from non-linear algorithms (and I have no way to check which of the latter my data calls for).

EDIT: Even though my original question is about universal rules of thumb, I’ve been asked to provide more information on my particular problem.

Data: A panel with each row being a country-month (~30,000 rows total, covering ~165 countries over ~15 years).

Response: 5 binary variables of interest (i.e. whether a protest, coup, crisis, etc. happens in that month).

Features: ~400 variables (a mix of continuous, categorical, and binary) describing characteristics of the 2 previous country-months (longer lags can be created). We only use lagged variables since the goal is prediction.

Examples include exchange rate and GDP growth (continuous), level of press freedom (categorical), and democracy and whether a neighbor is in conflict (binary). Note that many of these 400 features are lagged variables.
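For concreteness, here is a minimal sketch of how per-country lagged features like these could be built with pandas. The column names (`country`, `month`, `gdp_growth`), the toy values, and the helper `add_lags` are all hypothetical and only illustrate the layout described above.

```python
import pandas as pd

# Toy panel: one row per country-month (values are made up for illustration).
panel = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "month": ["2010-01", "2010-02", "2010-03"] * 2,
    "gdp_growth": [0.5, 0.7, 0.6, 1.2, 1.1, 0.9],
})


def add_lags(df, cols, lags, group="country", time="month"):
    """Return a copy of df with lagged copies of cols, shifted within each group."""
    out = df.sort_values([group, time]).copy()
    for col in cols:
        for k in lags:
            # shift(k) inside each country, so lags never leak across countries
            out[f"{col}_lag{k}"] = out.groupby(group)[col].shift(k)
    return out


lagged = add_lags(panel, ["gdp_growth"], lags=[1, 2])
print(lagged)
```

The first `k` months of each country have no lag-`k` value and come out as `NaN`; whether to drop or impute those rows is a separate modelling choice.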

**Answer**

This is a broad question without a simple answer. At CMU I taught a 3-month course on this topic. It covered issues such as:

- Using projections to understand correlation between variables and overall distributional structure.
- How to build up a regression model by successively modelling residuals.
- Determining when to add nonlinear interaction terms to a linear model.
- How to decide between k-NN vs. a decision tree vs. a logistic classifier. I went through a number of UCI datasets and showed how you could tell which classifier would win before running them.
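As a rough illustration of the residual-modelling and interaction-term points, the sketch below fits a linear model to synthetic data, then checks which pairwise product of features correlates most with the residuals. This is not the course's actual procedure; the data, the helper names `fit_linear` and `interaction_scores`, and the correlation-based score are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
# Synthetic response with a hidden interaction between columns 0 and 1.
y = (1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 2]
     + 1.5 * X[:, 0] * X[:, 1]
     + rng.normal(scale=0.5, size=n))


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, residuals)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef


def interaction_scores(X, resid):
    """Absolute correlation between the residuals and each product x_i * x_j."""
    d = X.shape[1]
    scores = {}
    for i in range(d):
        for j in range(i, d):
            term = X[:, i] * X[:, j]
            scores[(i, j)] = abs(np.corrcoef(term, resid)[0, 1])
    return scores


# Step 1: purely linear fit.
_, resid = fit_linear(X, y)

# Step 2: look for structure left in the residuals.
scores = interaction_scores(X, resid)
best = max(scores, key=scores.get)

# Step 3: add the strongest candidate term and refit.
X_aug = np.column_stack([X, X[:, best[0]] * X[:, best[1]]])
_, resid_aug = fit_linear(X_aug, y)

print("strongest candidate term:", best)
print("residual variance before / after:", resid.var(), resid_aug.var())
```

If no product term correlates noticeably with the residuals, that is some evidence the linear model is adequate for those features. With hundreds of features the number of pairs grows quickly, so in practice one would likely restrict the search to a subset of variables.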

Sadly, there is no video or textbook for the course, but I gave a talk that summarizes the main points from the class. I’m not aware of any textbook that covers the same ground.

**Attribution** *Source: Link, Question Author: Heisenberg, Answer Author: Tom Minka*