I have a question about the “weights” and “prior” in R’s rpart function.

This question has been asked here before, but the answer didn’t quite make sense to me. I currently have very unbalanced data: the target class makes up only 0.0066% of a dataset with over 2 million rows. I want to know whether either the “weights” or the “prior” can help me with this imbalanced dataset, and how they would be used.

I tried oversampling the target and downsampling the noise and then producing an ensemble of my predictions, but I did not achieve the desired result.

**Answer**

I see two questions here.

### 1) What is the difference between `weights` and `parms` in `rpart`?

If you look at the code, the `weights` argument is passed on to `model.frame`, so it is applied to each observation of your dataset, just like in `lm`.

```
if (is.data.frame(model)) {
    m <- model  ## <---- m is defined here
    model <- FALSE
}
else {
    indx <- match(c("formula", "data", "weights", "subset"),
                  names(Call), nomatch = 0L)
    if (indx[1] == 0L)
        stop("a 'formula' argument is required")
    temp <- Call[c(1L, indx)]
    temp$na.action <- na.action
    temp[[1L]] <- quote(stats::model.frame)  ## <---- passed to model.frame
    m <- eval.parent(temp)
}
Terms <- attr(m, "terms")
if (any(attr(Terms, "order") > 1L))
    stop("Trees cannot handle interaction terms")
Y <- model.response(m)
wt <- model.weights(m)  ## <---- used as observation weights
```
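To see what this excerpt does with `weights`, here is a minimal sketch that calls `model.frame` and `model.weights` directly. The data frame `d` and its columns `y`, `x` and `w` are made up for illustration; they are not from the question.

```r
# Hypothetical toy data: 'd', 'y', 'x' and 'w' are illustrative names only.
d <- data.frame(
  y = c(1, 2, 3, 4),
  x = c(0.5, 1.5, 2.5, 3.5),
  w = c(1, 1, 2, 4)          # one weight per row
)

# Build the model frame in the same way the rpart excerpt does.
m <- stats::model.frame(y ~ x, data = d, weights = w)

# The weights are stored in the frame in a "(weights)" column ...
names(m)

# ... and model.weights() extracts them again, one value per observation.
wt <- stats::model.weights(m)
wt
```

The point is only that `weights` has one entry per row, which is what makes it an observation weight rather than a class-level setting.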

On the other hand, `parms` holds class-level settings such as the prior probabilities, which is how unbalanced class sizes are dealt with. I believe this is what you are looking for.
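As a rough side-by-side sketch of the two arguments, the following fits one tree with per-row `weights` and one with a class prior passed through `parms`. The simulated data, the class labels `"noise"` and `"target"`, the weight of 19 and the prior `c(0.5, 0.5)` are all arbitrary assumptions chosen for illustration, not recommended values.

```r
library(rpart)

# Hypothetical imbalanced data: 1900 "noise" rows and 100 "target" rows.
set.seed(1)
n_noise  <- 1900
n_target <- 100
d <- data.frame(
  x1 = c(rnorm(n_noise), rnorm(n_target, mean = 2)),
  x2 = rnorm(n_noise + n_target),
  y  = factor(rep(c("noise", "target"), times = c(n_noise, n_target)))
)

# (a) Observation weights: one number per ROW of the data.
#     Here every "target" row counts 19 times as much as a "noise" row.
d$w <- ifelse(d$y == "target", 19, 1)
fit_w <- rpart(y ~ x1 + x2, data = d, weights = w, method = "class")

# (b) Class priors: one number per CLASS, in the order of levels(d$y).
levels(d$y)
fit_p <- rpart(y ~ x1 + x2, data = d, method = "class",
               parms = list(prior = c(0.5, 0.5)))

# Compare the two trees.
print(fit_w)
print(fit_p)
```

Whether the two fits end up similar depends on the data; the sketch is only meant to show where each argument goes and what shape it has.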

### 2) How to use the `parms` argument?

If you look at the description of `parms`:

For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), …

Hence, you want to store your prior probability vector in a list under the name `prior`. **The order of the probabilities should be exactly the same as the output of `levels(data$y)`**, where `y` is your response variable. For example, you might want to try something like the following:

```
fit <- rpart(y ~ x1 + x2 + x3, data = data,
             parms = list(prior = c(0.000066, 1 - 0.000066)))
```
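One way to avoid getting the order wrong is to write the prior as a named vector and reorder it by `levels(data$y)` before passing it in. The sketch below assumes a made-up data frame with classes `"no"` and `"yes"` and illustrative prior values. Note also that, according to the `rpart` documentation, the default priors are proportional to the observed class counts, so to pull the tree toward a rare class you would normally give that class more prior mass than its observed frequency.

```r
library(rpart)

# Hypothetical data: 950 "no" rows and 50 "yes" rows; names are placeholders.
set.seed(42)
data <- data.frame(
  x1 = c(rnorm(950), rnorm(50, mean = 2)),
  y  = factor(rep(c("no", "yes"), times = c(950, 50)))
)

# This is the order the prior vector has to follow.
levels(data$y)

# Write the prior by class name (illustrative values only) ...
prior_by_class <- c(yes = 0.3, no = 0.7)

# ... then reorder it to match the factor levels and drop the names.
prior <- unname(prior_by_class[levels(data$y)])

# Sanity checks: one entry per class, no missing class, sums to 1.
stopifnot(length(prior) == nlevels(data$y),
          !anyNA(prior),
          isTRUE(all.equal(sum(prior), 1)))

fit <- rpart(y ~ x1, data = data, method = "class",
             parms = list(prior = prior))
print(fit)
```

Indexing by `levels(data$y)` means the code keeps working if the factor levels are later relabelled or reordered, as long as the names in `prior_by_class` match.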

