I’m doing leave-one-out cross-validation. I have a binary response and am using the boot package for R and its cv.glm function. My problem is that I don’t fully understand the “cost” argument of this function. From what I understand, it is the function that decides whether an estimated value should be classified as a 1 or a 0, i.e., the threshold value for the classification. Is this correct?
In the R help page, the following cost function is used for a binomial model:
```r
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)
```
How should I interpret this function, so that I can modify it correctly for my analysis? Any help is appreciated; I don’t want to use a function I don’t understand.
Answer
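`r` is a vector containing the observed outcomes and `pi` is a vector containing the fitted values. `cv.glm` calls the cost function with the observed responses as the first argument and the predicted responses as the second; for a binomial GLM these predictions are on the response scale, i.e. probabilities.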
```r
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)
```
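To see what `cost` returns, here is a quick check on made-up vectors. The names `obs`, `fit` and `pred` and their values are hypothetical, chosen only for illustration.

```r
# Cost function from the cv.glm help page
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

# Made-up observed outcomes and fitted probabilities (illustration only)
obs <- c(1,   1,   0,   0,   1)
fit <- c(0.9, 0.3, 0.2, 0.7, 0.6)

# TRUE where the fitted probability is on the wrong side of 0.5
abs(obs - fit) > 0.5   # FALSE TRUE FALSE TRUE FALSE

# mean() of a logical vector is the proportion of TRUEs
cost(obs, fit)         # 0.4

# Same result from an explicit "predict 1 if fit > 0.5" rule
pred <- as.numeric(fit > 0.5)
mean(pred != obs)      # 0.4
```

The two calculations agree except when a fitted value is exactly 0.5, which `cost` never counts as an error.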
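The function computes the misclassification rate at a 0.5 cut-off, $\text{cost} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left(|r_i - \pi_i| > 0.5\right)$, i.e. the proportion of observations whose fitted probability falls on the wrong side of 0.5 (an observed 1 with $\pi_i < 0.5$, or an observed 0 with $\pi_i > 0.5$). So the cost function is not itself the threshold; the threshold is the 0.5 inside it. The default `pi = 0` only matters if the function is called with a single argument, which `cv.glm` does not do. You can define your own cost function. For binary classification you can do something like this: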
```r
mycost <- function(r, pi) {
  weight1 <- 1  # cost for getting a 1 wrong
  weight0 <- 1  # cost for getting a 0 wrong
  c1 <- (r == 1) & (pi < 0.5)   # logical vector: TRUE if actual 1 but predicted 0
  c0 <- (r == 0) & (pi >= 0.5)  # logical vector: TRUE if actual 0 but predicted 1
  mean(weight1 * c1 + weight0 * c0)
}
```
Then pass `mycost` as the `cost` argument of `cv.glm`.
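To tune the weights or move the cut-off away from 0.5, the `mycost` idea can be generalised into a function that builds cost functions. This is a sketch under stated assumptions: it reuses the `nodal` data and model from the `cv.glm` help example, and `make_cost`, the weight of 3 and the 0.7 threshold are hypothetical choices for illustration, not part of boot.

```r
library(boot)  # provides cv.glm() and the nodal data set

# Hypothetical helper (not part of boot): returns a cost function with
# chosen misclassification weights and classification threshold
make_cost <- function(weight1 = 1, weight0 = 1, threshold = 0.5) {
  function(r, pi) {
    c1 <- (r == 1) & (pi <  threshold)  # actual 1 predicted as 0
    c0 <- (r == 0) & (pi >= threshold)  # actual 0 predicted as 1
    mean(weight1 * c1 + weight0 * c0)
  }
}

# Model from the cv.glm help example
data(nodal, package = "boot")
nodal.glm <- glm(r ~ stage + xray + acid, family = binomial, data = nodal)
n <- nrow(nodal)

# Leave-one-out CV uses K = n; delta[1] is the raw CV estimate, delta[2] the adjusted one
err.equal    <- cv.glm(nodal, nodal.glm, cost = make_cost(), K = n)$delta
# Assumed weighting: a missed 1 counts three times as much as a false 1
err.weighted <- cv.glm(nodal, nodal.glm, cost = make_cost(weight1 = 3), K = n)$delta
# Equal weights, but predict 1 only when the fitted probability is >= 0.7
err.strict   <- cv.glm(nodal, nodal.glm, cost = make_cost(threshold = 0.7), K = n)$delta

err.equal[1]
err.weighted[1]
err.strict[1]
```

Because `weight1 = 3` can only increase each observation's cost, `err.weighted[1]` is never smaller than `err.equal[1]`. With weights other than 1 the result is an average cost rather than an error rate.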
Attribution
Source: Link, Question Author: mael, Answer Author: Feng Mai