The article The Odds, Continually Updated mentions the story of a Long Island fisherman who literally owes his life to Bayesian Statistics. Here’s the short version:
There are two fishermen on a boat in the middle of the night. While one is
asleep, the other falls into the ocean. The boat continues to troll
along on autopilot all through the night until the first guy finally
wakes up and notifies the Coast Guard. The Coast Guard uses a piece
of software called SAROPS (Search and Rescue Optimal Planning
System) to find him just in time, as he was hypothermic and just about
out of energy to stay afloat.
Here’s the long version: A Speck In The Sea
I wanted to know more about how Bayes’ Theorem is actually applied here. I found out quite a bit about the SAROPS software just by googling.
The SAROPS simulator
The simulator component takes into account timely data such as ocean current, wind, etc. and simulates thousands of possible drift paths. From those drift paths, a probability distribution map is created.
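To make the idea concrete, here is a minimal Monte Carlo sketch of "simulate many drift paths, then bin them into a probability map". It is not the SAROPS model: the constant `current` vector, the Gaussian `noise`, the grid `cell_size`, and both function names are assumptions invented for illustration.

```python
import random
from collections import Counter

def simulate_drift_paths(n_paths, n_steps, start=(0.0, 0.0),
                         current=(0.3, -0.1), noise=0.5, seed=0):
    """Return the final (x, y) position of each simulated drift path.

    All parameters are made-up illustration values, not SAROPS inputs.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        x, y = start
        for _ in range(n_steps):
            # Steady push from the hypothetical current plus random jitter
            x += current[0] + rng.gauss(0.0, noise)
            y += current[1] + rng.gauss(0.0, noise)
        finals.append((x, y))
    return finals

def probability_map(positions, cell_size=1.0):
    """Bin positions into square grid cells and return {cell: probability}."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in positions
    )
    total = len(positions)
    return {cell: n / total for cell, n in counts.items()}

finals = simulate_drift_paths(n_paths=5000, n_steps=20)
prob_map = probability_map(finals)
most_likely_cell = max(prob_map, key=prob_map.get)
```

Each cell's probability is simply the fraction of simulated paths that end in it, so the map sums to one by construction.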
Note that the following graphics do not depict the case of the missing fisherman mentioned above; they are a toy example taken from this presentation.
Probability Map 1 (Red indicates the highest probability; blue the lowest)
Note the circle that is the starting location.
Probability Map 2 – More time has passed
Note that the probability map has become multimodal. That is because in this example, multiple scenarios are accounted for:
- The person is floating in the water – top-middle mode
- The person is in a life raft (more affected by the wind out of the North) – bottom 2 modes (split because of “jibing effects”)
Probability Map 3 – Search has been conducted along the rectangular paths in red
This image shows the optimal paths produced by the planner (another component of SAROPS). As you can see, those paths were searched and the probability map has been updated by the simulator.
You might be wondering why the areas that have been searched have not been reduced to zero probability. That’s because a probability of failure, P(fail), is factored in: there’s a non-negligible chance that the searcher will overlook the person in the water. Understandably, the probability of failure is much higher for a lone person afloat than for a person in a life raft (which is easier to see), which is why the probabilities in the top area did not go down very much.
Effects of an unsuccessful search
This is where Bayes’ Theorem comes into play. Once a search is conducted, the probability map gets updated accordingly so another search can be planned optimally.
After reviewing Bayes’ Theorem on Wikipedia and in the article An Intuitive (and Short) Explanation of Bayes’ Theorem on BetterExplained.com, I took Bayes’ equation:
$$P(A \mid X) = \frac{P(X \mid A)\,P(A)}{P(X)}$$
And defined A and X as follows…
Event A: The person is in this area (grid cell)
Test X: Unsuccessful search over that area (grid cell), i.e. the area was searched and nothing was seen
$$P(\text{person there} \mid \text{unsuccessful}) = \frac{P(\text{unsuccessful} \mid \text{person there}) \times P(\text{person there})}{P(\text{unsuccessful})}$$
I found in Search and Rescue Optimal Planning System that SAROPS calculates the probability of a failed search, P(fail), by taking into account the search paths and simulated drift paths. So for simplicity let’s assume that we know what the value of P(fail) is.
So now we have,
$$P(\text{person there} \mid \text{unsuccessful}) = \frac{P(\text{fail}) \times P(\text{person there})}{P(\text{unsuccessful})}$$
Is the Bayes’ equation applied correctly here?
How would the denominator, the probability of an unsuccessful search, be calculated?
Also in Search and Rescue Optimal Planning System, they say
The prior probabilities are “normalized in the usual Bayesian fashion” to produce the posterior probabilities
What does “normalized in the usual Bayesian fashion” mean?
Does it mean all probabilities are divided by P(unsuccessful), or just simply normalized to ensure the entire probability map adds up to one? Or, are these one and the same?
Lastly, what would be the correct way to normalize the gridded probability map after you’ve updated for an unsuccessful search, considering that since you haven’t searched ALL of the areas (grid cells) you’d have some cells equal to P(person there) and some equal to P(person there∣unsuccessful)?
Yet another simplification note – according to Search and Rescue Optimal Planning System the posterior distribution is actually calculated by updating the probabilities of the simulated drift paths, and THEN re-generating the gridded probability map. In order to keep this example simple enough, I chose to ignore the sim paths and focus on the grid cells.
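For reference, the path-based update described in that note can be sketched roughly as follows. This is only an illustration of the general idea (reweight each simulated path by the chance the search would have missed it, renormalize, then rebuild the grid), not SAROPS code. The `particles` list, the two object types, and the `P_FAIL` values are all invented.

```python
from collections import defaultdict

# Each simulated path ("particle") carries a weight, its current grid cell,
# and the object type it represents. All numbers are invented for illustration.
particles = [
    {"cell": (0, 0), "kind": "person_in_water", "weight": 0.25},
    {"cell": (0, 0), "kind": "life_raft",       "weight": 0.25},
    {"cell": (1, 0), "kind": "person_in_water", "weight": 0.25},
    {"cell": (1, 1), "kind": "life_raft",       "weight": 0.25},
]

# Hypothetical chance of overlooking each object type in a searched cell
P_FAIL = {"person_in_water": 0.8, "life_raft": 0.2}

def update_after_failed_search(particles, searched_cells, p_fail):
    """Reweight particles after an unsuccessful search, then renormalize."""
    updated = []
    for p in particles:
        # Likelihood of "nothing seen" if this particle were the truth:
        # P(fail) inside the searched area, 1 outside it.
        likelihood = p_fail[p["kind"]] if p["cell"] in searched_cells else 1.0
        updated.append({**p, "weight": p["weight"] * likelihood})
    total = sum(p["weight"] for p in updated)
    for p in updated:
        p["weight"] /= total
    return updated

def grid_from_particles(particles):
    """Sum particle weights per cell to regenerate the gridded map."""
    grid = defaultdict(float)
    for p in particles:
        grid[p["cell"]] += p["weight"]
    return dict(grid)

posterior_particles = update_after_failed_search(particles, {(0, 0)}, P_FAIL)
posterior_grid = grid_from_particles(posterior_particles)
```

In this toy setup the searched cell drops from 0.5 to about 0.33 rather than to zero, and within that cell the hard-to-see person in the water keeps more weight than the easy-to-see life raft.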
- Assuming independence between the grid cells, yes, it appears that Bayes’ Theorem has been applied properly.
- The denominator can be expanded, e.g.
$$P(X) = P(X \mid A)\,P(A) + P(X \mid A^c)\,P(A^c)$$
using the law of total probability, where $A^c$ is the complement of $A$, i.e. the person is not there. Likely you would assume $P(X \mid A^c) = 1$.
- I’m not really sure what “normalized in the usual Bayesian fashion” means, since I didn’t write the manual. But they are certainly talking about the fact that the following three relations are sufficient to find $P(A \mid X)$: $P(A \mid X) \propto P(X \mid A)\,P(A)$, $P(A^c \mid X) \propto P(X \mid A^c)\,P(A^c)$ (with the same proportionality constant), and $P(A \mid X) + P(A^c \mid X) = 1$. So you never have to calculate $P(X)$, i.e. the normalizing constant, separately. Whether they used this to update the probability for a single grid cell or for the entire map, I don’t know (probably both).
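As a quick numerical check of the points above, here is a small sketch for a single grid cell that computes the posterior both ways: once by expanding the denominator, and once by normalizing the two unnormalized terms. The function names and the values `prior = 0.3` and `p_fail = 0.5` are assumptions chosen only for illustration.

```python
def posterior_via_total_probability(prior, p_fail, p_nothing_if_absent=1.0):
    """P(A|X) with P(X) expanded by the law of total probability."""
    p_x = p_fail * prior + p_nothing_if_absent * (1.0 - prior)
    return p_fail * prior / p_x

def posterior_via_normalization(prior, p_fail, p_nothing_if_absent=1.0):
    """P(A|X) from the two unnormalized terms, rescaled to sum to 1."""
    unnorm_there = p_fail * prior                          # ∝ P(A|X)
    unnorm_absent = p_nothing_if_absent * (1.0 - prior)    # ∝ P(A^c|X)
    return unnorm_there / (unnorm_there + unnorm_absent)

prior, p_fail = 0.3, 0.5  # illustrative values only
a = posterior_via_total_probability(prior, p_fail)
b = posterior_via_normalization(prior, p_fail)
```

Both routes give the same number (0.15 / 0.85, roughly 0.176), because the sum of the unnormalized terms is exactly $P(X)$. In that sense, dividing by P(unsuccessful) and rescaling so that the probabilities add up to one are the same operation.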
Let’s expand the notation: let $i$ index the grid cells, let $A_i$ be the event that the individual is in grid cell $i$, and let $X_i$ be the event that grid cell $i$ was searched and nobody was found. With the new notation, $X$ is going to be the collection of searches that failed. We assume the following:
- $\sum_i P(A_i \mid X) = 1$, i.e. after performing the searches, the sum over all cells of the probability that the individual is in that cell is 1. This is the law of total probability again.
- If we assume that searching one cell does not tell us anything about any other cell, then for cells that were searched $P(A_i \mid X) \propto P(X_i \mid A_i)\,P(A_i)$, and for cells that were not searched $P(A_i \mid X) \propto P(A_i)$, with the same proportionality constant in both cases. If we don’t assume independence, the formulas will be more complicated but the intuition will be similar, i.e. calculating $P(A_i \mid X)$ up to a proportionality constant.
We can use these two assumptions to calculate P(Ai|X) and update the map accordingly.
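Under those two assumptions, the whole-map update can be sketched as below. The function name `update_map`, the three-cell toy map, and the failure probability of 0.4 are hypothetical; the sketch also assumes the person is certainly somewhere on the grid.

```python
def update_map(prior_map, p_fail_by_cell):
    """Posterior map after an unsuccessful search.

    prior_map:      {cell: P(A_i)}, summing to 1
    p_fail_by_cell: {cell: P(X_i | A_i)} for the searched cells only
    """
    # Unsearched cells get likelihood 1, so their unnormalized value
    # is just the prior.
    unnormalized = {
        cell: p * p_fail_by_cell.get(cell, 1.0)
        for cell, p in prior_map.items()
    }
    total = sum(unnormalized.values())  # under these assumptions, this is P(X)
    return {cell: v / total for cell, v in unnormalized.items()}

prior_map = {"a": 0.5, "b": 0.3, "c": 0.2}          # toy 3-cell map
posterior_map = update_map(prior_map, {"a": 0.4})   # only cell "a" was searched
```

Here the searched cell "a" falls from 0.5 to about 0.29, while the unsearched cells "b" and "c" both rise (to about 0.43 and 0.29) and keep their original 3:2 ratio. Searched and unsearched cells are divided by the same constant, so no special handling is needed for a partially searched map.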