# Visualizing longitudinal data with binary outcome

For longitudinal data with a numeric outcome, I can use spaghetti plots to visualize the data. For example, something like this (taken from the UCLA Stats site):

```r
tolerance <- read.table("http://www.ats.ucla.edu/stat/r/faq/tolpp.csv", sep=",", header=TRUE)
interaction.plot(tolerance$time, tolerance$id, tolerance$tolerance,
                 xlab="time", ylab="Tolerance", legend=FALSE)
```

But what if my outcome is binary (0 or 1)? For example, in the “ohio” data in R, the binary “resp” variable indicates the presence of a respiratory disease:

```r
library(geepack)
ohio2 <- ohio[2049:2148,]
head(ohio2, 12)
#      resp  id age smoke
# 2049    1 512  -2     1
# 2050    0 512  -1     1
# 2051    0 512   0     1
# 2052    0 512   1     1
# 2053    1 513  -2     1
# 2054    0 513  -1     1
# 2055    0 513   0     1
# 2056    1 513   1     1
# 2057    1 514  -2     1
# 2058    0 514  -1     1
# 2059    0 514   0     1
# 2060    1 514   1     1

interaction.plot(ohio2$age+9, ohio2$id, ohio2$resp,
                 xlab="age", ylab="Wheeze status", legend=FALSE)
```

The spaghetti plot gives a nice figure, but it is not very informative. What would be a suitable way to visualize this kind of data? Maybe something that includes a probability value on the y-axis?

There are quite a few ways to work around it.

Jittering the variables mildly to smear the lines apart

First, since both age and the outcome are nicely discrete, we can afford to mildly jitter them in order to show some trends. The trick is to use transparency in the line color so that it’s easier to discern how much the lines overlap.

```r
library(geepack)
set.seed(6277)

ohio2 <- ohio[2049:2148,]

# Small Gaussian jitter (SD 0.02) pulls overlapping lines slightly apart
jitteredResp <- ohio2$resp + rnorm(100, 0, 0.02)
jitteredAge  <- ohio2$age + 9 + rnorm(100, 0, 0.02)
age          <- ohio2$age + 9
id           <- ohio2$id
wheeze       <- ohio2$resp

#### Variation 1 ####
plot(jitteredAge, jitteredResp, type="n", axes=FALSE,
     xlab="Age to the nearest year, jittered",
     ylab="Wheeze status, jittered")
# id has one entry per row, so each subject's line is drawn four times;
# the repeats build up visible color at this very low alpha
for (i in id){
  lines(jitteredAge[id==i], jitteredResp[id==i], col="#FF000008", lwd=2)
}
axis(side=1, at=seq(7,10))
axis(side=2, at=c(0,1), labels=c("No", "Yes"))
```

Getting fancy

It’s also possible to use this kind of curve to show the flow of the subjects. It is a modification of the chart above, but it uses the width of each line, rather than overlap, to represent frequency.

Show the fate of each case

This may sound counter-intuitive, but if you lay the cases out in a systematic manner, it works just as well to tell the aggregated story. Here the outcome of each case is shown along a grey reference line. I didn’t add a legend, but one can be added quite easily with the `legend` function. Blue is “resp = 0” and red is “resp = 1”. Time (age) is spread out on the x-axis. Your data are conveniently presorted by outcome pattern, so I didn’t have to do anything. If they were not presorted, you’d have to use a function like `dcast` in package `reshape2` to massage the data a bit.
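If the cases were not already grouped by pattern, the reordering step could look roughly like the sketch below. It uses base R `reshape` as a stand-in for `dcast`, and it runs on only the 12 rows printed in the question (ids 512 to 514) so that it needs no extra packages. The names `d`, `wide` and `row` are made up for illustration.

```r
# The 12 rows of ohio2 shown in the question, with age shifted to years
d <- data.frame(
  resp = c(1, 0, 0, 0,  1, 0, 0, 1,  1, 0, 0, 1),
  id   = rep(512:514, each = 4),
  age  = rep(-2:1, times = 3) + 9
)

# Wide layout: one row per subject, one resp column per age
wide <- reshape(d, idvar = "id", timevar = "age", direction = "wide")

# Sort subjects by their outcome pattern, earliest age first
wide <- wide[do.call(order, unname(as.list(wide[, -1]))), ]

# Plotting position of each subject after sorting
wide$row <- seq_len(nrow(wide))
d$row    <- wide$row[match(d$id, wide$id)]
```

With that in place, `d$row` would take the place of `id` on the y-axis, so that subjects with the same pattern sit next to each other.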

```r
#### Variation 2 ####
# Red marks resp = 1, blue marks resp = 0
my.col             <- vector()
my.col[wheeze ==1] <- "#D7191C"
my.col[wheeze ==0] <- "#2C7BB6"

plot(age, id, type="n", frame.plot=FALSE, xlab="Age, year", ylab="", axes=FALSE, xlim=c(7,10))
abline(h=id, col="#CCCCCC")   # grey reference line for each case
axis(side=1, at=seq(7,10))
mtext(side=2, line=1, "Individual cases")
points(age, id, col=my.col, pch=16)
```

Tabulation

Visualization is not the only way out. Since there are at most 16 different patterns (two possible outcomes at each of four ages), you can also tabulate them. Use `+` and `-` to create patterns like `+ + + +` and `+ - - -`, and then for each of these patterns, attach the count and percentage. This can show the information just as effectively.
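A minimal sketch of such a table, in base R, might look like this. To stay self-contained it uses only the 12 rows printed in the question (ids 512 to 514) rather than the full `ohio2` subset; the names `ohio_head`, `patterns` and `pattern_table` are made up for illustration.

```r
# The 12 rows of ohio2 shown in the question
ohio_head <- data.frame(
  resp = c(1, 0, 0, 0,  1, 0, 0, 1,  1, 0, 0, 1),
  id   = rep(512:514, each = 4),
  age  = rep(-2:1, times = 3)
)

# Order by subject and age so each pattern reads left to right in time
ohio_head <- ohio_head[order(ohio_head$id, ohio_head$age), ]

# One pattern string per subject: "+" for resp == 1, "-" for resp == 0
patterns <- vapply(
  split(ohio_head$resp, ohio_head$id),
  function(r) paste(ifelse(r == 1, "+", "-"), collapse = " "),
  character(1)
)

# Count and percentage for each pattern, most common first
counts <- sort(table(patterns), decreasing = TRUE)
pattern_table <- data.frame(
  pattern = names(counts),
  n       = as.vector(counts),
  percent = round(100 * as.vector(counts) / sum(counts), 1)
)
print(pattern_table)
```

Replacing `ohio_head` with `ohio2` should give the same kind of table for all 25 children in the subset.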