I think it may be a problem if we directly use dummy variables for a categorical predictor that has hundreds of levels.

I have found one solution in the book ‘Elements of Statistical Learning’ (p.329). It is described in the section on classification trees. Specifically, the solution orders the levels of the categorical predictor by the number of occurrences of each level in one class, and then treats the predictor as an ordered predictor.
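To make the ordering idea concrete, here is a minimal sketch in plain Python. It assumes a binary outcome coded 0/1 and orders the levels by the share of class 1 within each level, which is my reading of the description above rather than a quote from the book. The function name `order_levels_by_class_rate` and the toy data are made up for illustration.

```python
from collections import defaultdict


def order_levels_by_class_rate(levels, labels):
    """Map each level to an integer rank, ordered by its share of class 1.

    Assumes `labels` holds 0/1 values (an assumption of this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [n_total, n_class1]
    for level, label in zip(levels, labels):
        counts[level][0] += 1
        counts[level][1] += label
    # Sort by class-1 proportion; ties are broken by level name so the
    # result is deterministic.
    ordered = sorted(counts, key=lambda lv: (counts[lv][1] / counts[lv][0], lv))
    return {lv: rank for rank, lv in enumerate(ordered)}


# Toy data, invented for this example.
levels = ["a", "a", "b", "b", "b", "c", "c", "d"]
labels = [1, 1, 0, 1, 0, 0, 0, 1]

rank = order_levels_by_class_rate(levels, labels)
encoded = [rank[lv] for lv in levels]  # one numeric column instead of many dummies
```

The `encoded` column can then be passed to a tree as if it were an ordered predictor. Whether that numeric coding is meaningful outside of tree splitting is exactly what the question is asking.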

I wonder, for models other than classification trees, such as linear regression, what would be proper ways of handling categorical predictors with too many levels.

I found an old post asking similar questions, but no answers have been posted:

**Answer**

I can’t see that ordering the levels by frequency creates an ordinal variable.

Shrinkage is necessary to deal with this problem, either by using penalized maximum likelihood estimation (e.g., the R `rms` package’s `ols` and `lrm` functions, which support a quadratic (ridge) L2 penalty) or by using random effects. You can get predictions for individual levels easily using penalized maximum likelihood estimation, or by using BLUPs in the mixed effects modeling context.
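As a rough illustration of what shrinkage does to the level estimates, here is a minimal sketch in plain Python for the simplest case: a continuous response, a single categorical predictor, an unpenalized intercept, and a ridge penalty `lam` on the level effects. It is not how `rms` implements its penalty; the function name, the data, and the value of `lam` are all assumptions made for the example. In this one-way setting the penalized least-squares solution has a closed form, and it coincides with the BLUP of a random intercept when `lam` is taken to be the ratio of residual variance to between-level variance (treated as known).

```python
from collections import defaultdict


def shrunken_level_effects(levels, y, lam):
    """Ridge estimates for the one-way model y = mu + b[level] + error.

    Minimizes sum((y - mu - b[level])**2) + lam * sum(b[level]**2),
    leaving the intercept mu unpenalized. Requires lam > 0.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for lv, yi in zip(levels, y):
        totals[lv] += yi
        sizes[lv] += 1
    means = {lv: totals[lv] / sizes[lv] for lv in sizes}
    # Shrinkage factor n_j / (n_j + lam): close to 1 for well-populated
    # levels, close to 0 for sparse ones.
    shrink = {lv: sizes[lv] / (sizes[lv] + lam) for lv in sizes}
    # The intercept is a shrinkage-weighted average of the level means.
    mu = sum(shrink[lv] * means[lv] for lv in sizes) / sum(shrink.values())
    effects = {lv: shrink[lv] * (means[lv] - mu) for lv in sizes}
    return mu, effects


# Toy data, invented for this example: level "b" has a single observation.
levels = ["a", "a", "a", "a", "b"]
y = [0.0, 2.0, 0.0, 2.0, 5.0]

mu, effects = shrunken_level_effects(levels, y, lam=1.0)
pred = {lv: mu + b for lv, b in effects.items()}  # prediction per level
```

The prediction for the sparse level `"b"` is pulled away from its raw mean of 5 toward the overall intercept, while the well-populated level `"a"` moves less in relative terms. A level never seen in the data would simply be predicted as `mu`.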

**Attribution** *Source: Link, Question Author: Jerry, Answer Author: Frank Harrell*