# How to plot binary (presence/absence – 1/0) data against continuous variables

I am trying to plot presence/absence (1/0) of a sample species against various environmental variables.

I have put presence/absence on the y-axis and the environmental variable (in this case barometric pressure) on the x-axis, but the resulting plot looks terrible.

Is there a better way to do this? I thought of plotting presence/absence against the frequency of the environmental variable. Would this be possible?

If I understood the question correctly, you might want to use a "conditional density plot".

Such a plot provides a smoothed overview of how a categorical variable changes across the values of a continuous numerical variable.

Example

For a real-world example, here is how the proportions of the 3 species in the iris dataset change with sepal width:

```r
cdplot(Species ~ Sepal.Width, data = iris)
```

Interpretation

These plots show the smoothed proportion of each category at each value of the continuous variable. To interpret them, read along the x-axis and see how the proportions of the categories (represented by different colors) change with the value of the numerical variable.

For example, consider the iris plot above: it is easy to see that once sepal width reaches 3.5 or above, you are most likely dealing with a setosa flower. At sepal width 2.0, versicolor dominates. At 3.0 there are about 20% setosa, 35% versicolor and 45% virginica (judging by eye from the scale on the y-axis on the right).
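To make it clearer what the plot is estimating, here is a minimal sketch that computes the smoothed proportion of setosa by hand, as a kernel-weighted share of setosa observations at each sepal width. This is an illustration only, not the exact code inside `cdplot()`: `cdplot()` relies on `density()` internally, so its curve will differ slightly. The names `is_setosa`, `grid`, `kern` and `p_setosa` are made up for this example.

```r
# Sketch only: a direct kernel estimate of P(Species == "setosa" | Sepal.Width).
x         <- iris$Sepal.Width
is_setosa <- as.numeric(iris$Species == "setosa")

bw   <- bw.nrd0(x)                             # default bandwidth rule of density()
grid <- seq(min(x), max(x), length.out = 200)  # sepal widths to evaluate at

# kern[i, j] = Gaussian kernel weight of observation j at grid point i
kern <- outer(grid, x, function(g, xi) dnorm(g, mean = xi, sd = bw))

# Kernel-weighted share of setosa at each grid point (between 0 and 1)
p_setosa <- as.vector(kern %*% is_setosa) / rowSums(kern)

plot(grid, p_setosa, type = "l", ylim = c(0, 1),
     xlab = "Sepal.Width", ylab = "Estimated P(setosa | Sepal.Width)")
```

The resulting curve should roughly trace the boundary of the setosa band in the conditional density plot, which is a handy way to check a reading taken by eye.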

For another discussion of how to interpret such plots, see the answers to this question: Interpretation of conditional density plots

Of course, in your case you would have only 2 categories on the y-axis, so the final picture would look closer to this example:

```r
set.seed(14)

# Simulated presence/absence (1/0) for 20 samples
presence <- factor(rbinom(20, 1, 0.5))
presence
#>  [1] 0 1 1 1 1 1 1 0 0 0 1 0 0 1 1 1 0 1 1 1
#> Levels: 0 1

# Simulated barometric pressure for the same 20 samples
pressure <- runif(20, 1000, 1035)
pressure
#>  [1] 1012.282 1014.687 1021.619 1024.159 1026.247 1021.663 1013.469
#>  [8] 1018.317 1024.054 1002.747 1028.396 1004.806 1033.906 1022.898
#> [15] 1033.127 1004.378 1019.386 1016.432 1030.160 1021.567

cdplot(presence ~ pressure)
```

The interpretation stays the same, except that you are now dealing with a binary categorical variable. In this particular case the plot would suggest that the proportion of presences (1, light grey area) increases with pressure (x-axis).
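If you want numbers rather than a visual impression, `cdplot()` invisibly returns the estimated conditional density functions, cumulative over the factor levels. With levels 0 and 1, the first function gives the estimated proportion of absences, so the proportion of presences is one minus that. A small sketch reusing the simulated data (the pressures in `query` are arbitrary values chosen for illustration):

```r
set.seed(14)
presence <- factor(rbinom(20, 1, 0.5))
pressure <- runif(20, 1000, 1035)

# cdplot() draws the plot and invisibly returns a list of functions,
# one per factor level except the last (cumulative over the levels)
cd <- cdplot(presence ~ pressure)

query     <- c(1005, 1020, 1030)  # arbitrary pressures to inspect
p_absent  <- cd[[1]](query)       # estimated P(presence == 0 | pressure)
p_present <- 1 - p_absent         # estimated P(presence == 1 | pressure)

data.frame(pressure = query, p_absent, p_present)
```

With only 20 simulated points these estimates are very noisy, so treat them as a way to read the plot, not as evidence of a real trend.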