Assess temporary effect of treatment

Imagine that I have a treatment that reduces the likelihood of response to a stimulus. This could be anything you like, but the simplest example is of a treatment (e.g., hand washing, mask wearing, etc.) that prevents disease when exposed.

For simplicity, I built an (overly complicated) model of the risk of responding to the stimulus over time under the two treatments (all of this is done in R using the tidyverse):

day_prob <-
  list(
    A = c(0.06, 0.06, pmax(dchisq(1:13, 3)*1, 0.06))
    , B = c(0.06, 0.06, pmax(dchisq(seq(1, 25, 2), 5)*6, 0.06))
  )

This can be plotted to show the risk:

day_prob %>%
  lapply(function(x){
    tibble(
      Day = 1:length(x)
      , Prob = x
    )
  }) %>%
  bind_rows(.id = "Group") %>%
  ggplot(aes(x = Day
             , y = Prob
             , col = Group)) +
  geom_line() +
  geom_point() +
  scale_color_brewer(palette = "Dark2")

Plot:

enter image description here

Note that there is a similar baseline risk for both group (0.06), some latency to develop after exposure (risk rises on day 3), and that risk falls over time back to baseline.

Now, assuming that I randomize individuals into the two treatments, what is my best approach to identify this effect? I can just analyze each day separately, though that raises some repeated-measures questions since (unlike the sample data below) it is likely that a positive individual is more (or even less) likely to test positive on subsequent days.

I’ve searched through a number of alternative questions, but nothing seems to quite capture this issue. I could try repeated-measures analyses, but the latency and return-to-baseline should both swamp my effect if they are sufficiently long. Further, it would be ideal to actually know when the effect is observed.

A sample data set:

make_obs <- function(group, day){
  sapply(1:length(group), function(idx){
    rbinom(1, 1, day_prob[[group[idx]]][day[idx]] )
  })
}

set.seed(12345)
example_data <-
  tibble(
    Group = rep(c("A", "B"), each = 50)
    , Ind = 1:100
    , Day = 1
  ) %>%
  complete(nesting(Group, Ind), Day = 1:15) %>%
  mutate(
    Obs = make_obs(Group, Day)
  )

Plotting to show the observed data:

example_data %>%
  group_by(Group, Day) %>%
  summarise(Prop_pos = mean(Obs)) %>%
  ungroup() %>%
  ggplot(aes(x = Day
             , y = Prop_pos
             , col = Group)) +
  geom_line() +
  scale_color_brewer(palette = "Dark2")

enter image description here

Answer

In biostatistics, time-dependent responses are often inspected in terms of the area under the curve. This is not to be confused to be the AUC of the ROC which is a classification metric. For instance, in an oral glucose challenge, blood is drawn hourly to see the glucose concentration over a period of time. Because diabetes and its complications are caused by cumulative concentration of blood sugar, the AUC of 2hOGTT blood sugar describes how effectively the body produces insulin.

enter image description here

https://www.researchgate.net/figure/Comparison-of-the-area-under-the-curve-AUC-values-of-blood-glucose-levels-obtained-from_fig1_273472980

Here the AUC reduces an overly complex question into a simple one: does the intervention reduce the time-averaged risk in one group versus another? In doing this, we are able to exonerate ourselves from making strong and possibly incorrect assumptions about the dependence structure. With imbalanced data, the curves can be estimated using splines over a reasonable range (restricted mean). For instance, if the curve in Group A does not “peak” as largely as that in Group B, but remains elevated over a long period of time, none of the modeling approaches proposed above will correctly estimate the standard deviation of the conditional response, and yet AUC simply addresses why Group B might be preferred (a less sustained response).

Attribution
Source : Link , Question Author : Mark Peterson , Answer Author : AdamO

Leave a Comment