Quoting from a Wikipedia article on parameter estimation for a naive Bayes classifier: “a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution.”
I understand that a Gaussian distribution is convenient for analytical reasons. However, is there any other real-world reason to make this assumption? What if the population consists of two subpopulations (smart/dumb people, large/small apples)?
Answer
At least for me, the assumption of normality arises from two (very powerful) reasons:

The Central Limit Theorem.

The Gaussian distribution is a maximum entropy (with respect to the continuous version of Shannon’s entropy) distribution.
I think you are aware of the first point: if your sample is the sum of many processes, then as long as some mild conditions are satisfied, its distribution is pretty much Gaussian (and there are generalizations of the CLT where you don't have to assume that the r.v.'s in the sum are identically distributed; see, e.g., the Lyapunov CLT).
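This is easy to see numerically. A minimal sketch (the choice of 30 uniform terms is just an arbitrary illustration, not anything canonical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each observation is the sum of 30 independent Uniform(0, 1) draws.
# By the CLT this sum is approximately N(15, 30/12), even though each
# individual term is far from Gaussian.
n_terms, n_samples = 30, 100_000
sums = rng.uniform(0, 1, size=(n_samples, n_terms)).sum(axis=1)

print(sums.mean())  # ≈ 15.0
print(sums.var())   # ≈ 30/12 = 2.5

# Fraction of samples within one standard deviation of the mean:
# a Gaussian would give ≈ 0.683.
sigma = np.sqrt(n_terms / 12)
frac = np.mean(np.abs(sums - n_terms / 2) < sigma)
print(frac)         # ≈ 0.683
```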
The second point is one that for some people (especially physicists) makes more sense: given the first and second moments of a distribution, the distribution that assumes the least information (i.e., the most conservative one) with respect to the continuous Shannon entropy measure (which is somewhat arbitrary in the continuous case but, at least for me, totally objective in the discrete case; but that's another story) is the Gaussian distribution. This is a form of the so-called "maximum entropy principle", which is not so widespread because the actual use of this form of the entropy is somewhat arbitrary (see this Wikipedia article for more information about this measure).
Of course, this last statement is also true in the multivariate case: the maximum entropy distribution (again, with respect to the continuous version of Shannon's entropy) given first-order ($\vec{\mu}$) and second-order (i.e., the covariance matrix $\mathbf{\Sigma}$) information can be shown to be a multivariate Gaussian.
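A quick numerical check of the univariate claim, using the standard closed-form differential entropies; the choice of the uniform and Laplace distributions as competitors is mine, but any unit-variance density would do:

```python
import math

# Differential entropies of three unit-variance distributions (closed forms):
#   Gaussian with std sigma:               (1/2) ln(2*pi*e*sigma^2)
#   Uniform of width w (variance w^2/12):  ln(w)
#   Laplace with scale b (variance 2b^2):  1 + ln(2b)
h_gaussian = 0.5 * math.log(2 * math.pi * math.e)   # sigma = 1
h_uniform = math.log(math.sqrt(12.0))               # w = sqrt(12)
h_laplace = 1.0 + math.log(2.0 / math.sqrt(2.0))    # b = 1/sqrt(2)

print(h_gaussian, h_uniform, h_laplace)  # ≈ 1.419, 1.242, 1.347
# The Gaussian has the largest entropy, as the maximum entropy
# principle predicts for fixed mean and variance.
```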
PS: I must add to the maximum entropy principle that, according to this paper, if you happen to know the range of variation of your variable, you have to adjust the distribution obtained from the maximum entropy principle accordingly.
Attribution
Source : Link , Question Author : lmsasu , Answer Author : Néstor