An instructor at my university posed a question like this (not for homework since the class is over and I wasn’t in it). I can’t figure out how to approach it.

The question concerns 2 bags each containing an assortment of different kinds of fruits:

The first bag contains the following randomly selected fruit:

| diameter (cm) | mass (g) | rotten? |
|---------------|----------|---------|
| 17.28 | 139.08 | 0 |
| 6.57 | 91.48 | 1 |
| 7.12 | 74.23 | 1 |
| 16.52 | 129.8 | 0 |
| 14.58 | 169.22 | 0 |
| 6.99 | 123.43 | 0 |
| 6.63 | 104.93 | 1 |
| 6.75 | 103.27 | 1 |
| 15.38 | 169.01 | 1 |
| 7.45 | 83.29 | 1 |
| 13.06 | 157.57 | 0 |
| 6.61 | 117.72 | 0 |
| 7.19 | 128.63 | 0 |

The second bag contains 6 randomly selected fruit from the same store as the first bag. The sum of their diameters is 64.2 cm and 4 are rotten.

Give an estimate for the mass of the second bag.

I can see that there appear to be two different kinds of fruit with normally distributed diameters and masses, but I am lost on how to proceed.

**Answer**

Let’s begin by plotting the data and taking a look at it. This is a very limited amount of data, so the analysis is going to be somewhat *ad hoc*, with plenty of assumptions.

```
# Data from the first bag, one entry per fruit
rotten <- as.factor(c(0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0))
mass <- c(139.08, 91.48, 74.23, 129.8, 169.22, 123.43, 104.93,
          103.27, 169.01, 83.29, 157.57, 117.72, 128.63)
diam <- c(17.28, 6.57, 7.12, 16.52, 14.58, 6.99, 6.63,
          6.75, 15.38, 7.45, 13.06, 6.61, 7.19)

# The factor codes 1 and 2 pick the first two palette colours:
# black for sound fruit, red for rotten fruit
plot(mass, diam, col = rotten, lwd = 2)
title("Fruits")
```

In the resulting plot, red dots represent rotten fruits.

You are correct in assuming that there seem to be two kinds of fruits. The assumptions I make are the following:

- The diameter splits the fruits into two groups
- Fruits with a diameter greater than 10 cm are in the large group; all others are in the small group.
- There is only one rotten fruit in the large group. Let’s assume that if a fruit is in the large group, then being rotten does not affect its mass. This assumption is essential, since we have only one rotten data point in that group.
- If the fruit is a small fruit, then being rotten affects the mass.
- Let’s assume that the variables diam and mass are normally distributed.
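These assumptions can be checked against the table with a few lines of R. This is only a sketch: the 10 cm cut-off is the assumption stated above, and the names `size`, `counts`, `prop_rotten` and `mean_mass` are mine, chosen for illustration.

```r
# Data copied from the first bag's table
diam   <- c(17.28, 6.57, 7.12, 16.52, 14.58, 6.99, 6.63,
            6.75, 15.38, 7.45, 13.06, 6.61, 7.19)
mass   <- c(139.08, 91.48, 74.23, 129.8, 169.22, 123.43, 104.93,
            103.27, 169.01, 83.29, 157.57, 117.72, 128.63)
rotten <- c(0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0)

# Assumed cut-off of 10 cm between the two kinds of fruit
size <- ifelse(diam > 10, "large", "small")

# Counts of sound (0) and rotten (1) fruit in each size group
counts <- table(size, rotten)
print(counts)

# Proportion of rotten fruit within each size group
prop_rotten <- tapply(rotten, size, mean)
print(prop_rotten)

# Mean mass by size group and rotten status
mean_mass <- tapply(mass, list(size, rotten), mean)
print(mean_mass)
```

The rotten proportions come out as 1/5 for the large group and 5/8 for the small group, which are the values used in the simulation further down.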

Because the sum of the diameters is given as 64.2 cm, I take two of the fruits to be large and four to be small. That leaves three cases for the mass: 2, 3 or 4 of the small fruits are rotten (*a large fruit being rotten does not affect the mass by assumption*). Computing the total mass in each case gives bounds on the estimate.
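The split between large and small fruits is worth a sanity check against the group means. Here is a minimal sketch, assuming the 10 cm cut-off from the list of assumptions and ignoring the spread within each group; `n_large`, `expected_sum` and `split_check` are illustrative names.

```r
# Diameters from the first bag
diam <- c(17.28, 6.57, 7.12, 16.52, 14.58, 6.99, 6.63,
          6.75, 15.38, 7.45, 13.06, 6.61, 7.19)

# Mean diameter in each group, using the assumed 10 cm cut-off
large    <- diam > 10
mu_large <- mean(diam[large])
mu_small <- mean(diam[!large])

# Expected total diameter of 6 fruits for every possible number of large fruits
n_large      <- 0:6
expected_sum <- n_large * mu_large + (6 - n_large) * mu_small

# Compare each split with the observed total of 64.2 cm
split_check <- data.frame(n_large,
                          n_small = 6 - n_large,
                          expected_sum,
                          gap = expected_sum - 64.2)
print(split_check)
```

With this sample the group means are roughly 15.4 cm and 6.9 cm, giving expected totals of about 58.4 cm for two large and four small fruits and about 66.8 cm for three of each. The two-and-four split used in the rest of the answer is therefore best read as a working assumption rather than something the diameters settle on their own.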

We can estimate by simulation the probability of each possible number of rotten small fruits, given that four fruits are rotten in total. We then use these probabilities to weight the mass estimates for the three cases:

```
samps <- 100000
stored_vals <- matrix(0, samps, 2)
for (i in 1:samps) {
  numF <- 0 # Number of small rotten
  numR <- 0 # Total number of rotten
  # Pick 4 small fruits
  for (j in 1:4) {
    if (runif(1) < (5/8)) { # Empirical proportion of small rotten
      numF <- numF + 1
      numR <- numR + 1
    }
  }
  # Pick 2 large fruits
  for (j in 1:2) {
    if (runif(1) < 1/5) { # Empirical proportion of large rotten
      numR <- numR + 1
    }
  }
  stored_vals[i, ] <- c(numF, numR)
}
# Pick out samples that had 4 rotten in total
fourRotten <- stored_vals[stored_vals[, 2] == 4, 1]
hist(fourRotten)
table(fourRotten)
# Proportions of 2, 3 and 4 small rotten fruits (in that order)
props <- table(fourRotten) / length(fourRotten)
# Mean masses per group
massBig <- mean(mass[diam > 10])
massSmRot <- mean(mass[diam < 10 & rotten == 1])
massSmOk <- mean(mass[diam < 10 & rotten == 0])
# Total mass of the bag for 2, 3 and 4 small rotten fruits
weights <- 2 * massBig + c(2 * massSmOk + 2 * massSmRot,
                           1 * massSmOk + 3 * massSmRot,
                           4 * massSmRot)
Est_Mass <- sum(props * weights)
```

This gives a final estimate of **691.5183 g**. I think you have to make most of the assumptions I have made to reach a conclusion, but it might be possible to do this in a smarter way. Also, I use simulation to get the probabilities for the number of rotten small fruits; that is just laziness, and it can be done “analytically”.
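The analytic version mentioned above can be sketched with `dbinom`. It uses the same assumptions as the simulation (four small and two large fruits, rotten independently with probabilities 5/8 and 1/5) and conditions on four rotten fruits in total. The variable names are illustrative, not part of the original code.

```r
# Data copied from the first bag's table
diam   <- c(17.28, 6.57, 7.12, 16.52, 14.58, 6.99, 6.63,
            6.75, 15.38, 7.45, 13.06, 6.61, 7.19)
mass   <- c(139.08, 91.48, 74.23, 129.8, 169.22, 123.43, 104.93,
            103.27, 169.01, 83.29, 157.57, 117.72, 128.63)
rotten <- c(0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0)

# s = number of rotten small fruits; the other 4 - s rotten fruits are large
s <- 2:4

# Joint probability of s rotten among 4 small and 4 - s rotten among 2 large
joint <- dbinom(s, size = 4, prob = 5/8) * dbinom(4 - s, size = 2, prob = 1/5)

# Condition on the total number of rotten fruits being 4
props <- joint / sum(joint)

# Mean masses per group (large fruit mass assumed unaffected by rot)
mass_big    <- mean(mass[diam > 10])
mass_sm_rot <- mean(mass[diam < 10 & rotten == 1])
mass_sm_ok  <- mean(mass[diam < 10 & rotten == 0])

# Total mass of the bag in each of the three cases
weights <- 2 * mass_big + (4 - s) * mass_sm_ok + s * mass_sm_rot

# Probability-weighted estimate
est_mass <- sum(props * weights)
print(round(props, 4))
print(est_mass)
```

Under these assumptions the exact calculation comes out at about 691.7 g, in line with the simulated figure; the small difference is Monte Carlo noise.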

**Attribution**

*Source: Link, Question Author: rutilusk, Answer Author: Gumeo*