One variable in my data has 80% missing values. The values are missing because the quantity does not exist (the variable is how much bank loan the company owes). I came across an article saying that the dummy variable adjustment method is the solution for this problem. Does that mean I need to transform this continuous variable into a categorical one?

Is this the only solution? I do not want to drop this variable because I think it is theoretically important to my research question.

**Answer**

Are the data “missing” in the sense of being *unknown* or does it just mean there is no loan (so the loan amount is zero)? It sounds like the latter, in which case you need an *additional* binary dummy to indicate whether there is a loan. No transformation of the loan amount is needed (apart, perhaps, from a continuous re-expression, such as a root or started log, which might be indicated by virtue of other considerations).
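As a minimal sketch of that recoding, assuming the loan amounts are held in a NumPy array where `NaN` marks companies without a loan (the array `loan_raw` and its values are made up for illustration):

```python
import numpy as np

# Hypothetical loan amounts; NaN marks companies with no bank loan.
loan_raw = np.array([np.nan, 250.0, np.nan, 1200.0, np.nan, 40.0])

# Indicator I: 1 where a loan exists, 0 otherwise.
has_loan = (~np.isnan(loan_raw)).astype(int)

# Loan amount X: the recorded amount, with 0 substituted where there is no loan.
loan_amount = np.where(np.isnan(loan_raw), 0.0, loan_raw)

print(has_loan)
print(loan_amount)
```

Both columns then enter the model together; the loan amount itself stays continuous.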

This works well in a regression. A simple example is a conceptual model of the form

dependent variable (Y) = loan amount (X) + constant.

With the addition of a loan indicator (I), the regression model is

$$Y = \beta_I I + \beta_X X + \beta_0 + \epsilon$$

with $\epsilon$ representing random errors with zero expectations. The coefficients are interpreted as:

$\beta_0$ is the expectation of $Y$ for no-loan situations, because those are characterized by $X = 0$ and $I = 0$.

$\beta_X$ is the marginal change in $Y$ with respect to the amount of the loan ($X$).

$\beta_I + \beta_0$ is the intercept for the cases with loans.
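To see these interpretations at work, here is a small simulation sketch. Everything in it is an assumption for illustration: the sample size, the 20% loan share, the "true" coefficient values, and the use of ordinary least squares via `numpy.linalg.lstsq` rather than any particular statistics package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated data (all values are made up): about 20% of companies have a loan.
has_loan = (rng.random(n) < 0.2).astype(float)            # indicator I
loan_amount = has_loan * rng.uniform(1.0, 10.0, size=n)   # X, zero when no loan

# Assumed "true" coefficients used only to generate the data.
beta_0, beta_I, beta_X = 5.0, 2.0, 0.5
noise = rng.normal(0.0, 1.0, size=n)
y = beta_0 + beta_I * has_loan + beta_X * loan_amount + noise

# Design matrix with columns [1, I, X]; fit by ordinary least squares.
design = np.column_stack([np.ones(n), has_loan, loan_amount])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b0_hat, bI_hat, bX_hat = coef

print(f"expected Y with no loan (beta_0):        {b0_hat:.2f}")
print(f"slope per unit of loan (beta_X):         {bX_hat:.2f}")
print(f"intercept with a loan (beta_I + beta_0): {bI_hat + b0_hat:.2f}")
```

The estimates should land near the values used to generate the data. The no-loan cases determine the estimate of the no-loan expectation, while the slope is informed only by the cases that have loans, since the loan amount does not vary among the rest.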

**Attribution**

*Source: Link, Question Author: lcl23, Answer Author: whuber*