I am not a mathematician. I have searched the internet about KL divergence. What I learned is that the KL divergence measures the information lost when one distribution is used to approximate another. I have seen it computed between two continuous distributions or between two discrete distributions. Can we compute it between a continuous and a discrete distribution, or vice versa?
No: KL divergence is only defined between distributions over a common space. It compares the probability (density) of the same point x under two different distributions, p(x) and q(x). If p is a distribution on R³ and q a distribution on Z, then q(x) doesn't make sense for points x ∈ R³, and p(z) doesn't make sense for points z ∈ Z. In fact, we can't even do it for two continuous distributions over spaces of different dimension (or two discrete ones, or any case where the underlying probability spaces don't match).
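To make the "common space" requirement concrete, here is a minimal sketch of KL(p ‖ q) for two discrete distributions over the same three-point support (the distributions p and q below are made-up numbers for illustration):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum over x of p(x) * log(p(x) / q(x)).

    Only makes sense when p and q assign probabilities to the
    *same* set of outcomes, in the same order.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]  # hypothetical distribution over {a, b, c}
q = [0.4, 0.4, 0.2]  # a second distribution over the same {a, b, c}

print(kl_divergence(p, q))  # small positive number
print(kl_divergence(p, p))  # 0.0: no information lost approximating p by itself
```

Note that the sum pairs up p(x) and q(x) for each outcome x; if the two distributions lived on different spaces, there would be no way to form those pairs.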
If you have a particular case in mind, it may be possible to come up with some similar-spirited measure of dissimilarity between the distributions. For example, it might make sense to encode a continuous distribution under a code for a discrete one (with some information obviously lost), e.g. by rounding each continuous value to the nearest point of the discrete space.
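As a sketch of that rounding idea: discretize a continuous distribution onto the integers by assigning each integer k the probability mass of the interval [k − 0.5, k + 0.5), then compare the resulting discrete distribution to a genuinely discrete one on the same support. The specific choices below (a Normal(3, 1.5), a Poisson(3), and the truncated support 0..20) are just illustrative assumptions, not part of any standard recipe:

```python
import math

def norm_cdf(x, mu, sigma):
    # CDF of a normal distribution via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 3.0, 1.5
support = range(0, 21)  # truncate both distributions to {0, ..., 20}

# Bin the continuous Normal(mu, sigma) onto integers by rounding:
# P(k) = CDF(k + 0.5) - CDF(k - 0.5)
p = [norm_cdf(k + 0.5, mu, sigma) - norm_cdf(k - 0.5, mu, sigma) for k in support]
p = [x / sum(p) for x in p]  # renormalize the tiny tail mass lost to truncation

# A genuinely discrete distribution on the same support: Poisson(3), truncated
lam = 3.0
q = [math.exp(-lam) * lam**k / math.factorial(k) for k in support]
q = [x / sum(q) for x in q]

# Now both live on the same finite space, so KL(p || q) is well defined
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
print(kl)
```

The binning step is exactly where the information is lost: two continuous values that round to the same integer become indistinguishable, so the number you get depends on the discretization you chose, not only on the original continuous distribution.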