I’m looking for the appropriate theoretical framework or speciality to help me understand and deal with the errors inherent in GPS data, especially when measuring routes.
Fundamentally, I’m looking for the requirements on the data, and the algorithms to apply to it, to establish the length of a trail. The answer needs to be trustworthy.
A friend of mine was the race director of a race billed as 160 km, but the Garmin watches everybody wears made it more like 190 km+. It caused quite some grief at the finish line, let me tell you!
So my friend went back to the course with various GPS devices in order to remap it and the results are interesting.
Using a handheld Garmin Oregon 300 she got 33.7 km for one leg; for the same leg, a wrist-worn Garmin Forerunner 310xt came out to 38.3 km.
When I got the data from the Oregon, it was obvious that it was only recording a point every 90 seconds or so, whereas the Forerunner records one every couple of seconds.
When I plotted the data from the Oregon I could see that it got confused by some switchbacks, drawing a straight line through them, and that it cut curves slightly short.
However, I suspect the difference in recording frequency is most of the explanation: by recording every couple of seconds, the Forerunner stays closer to the real route. Even so, there will be some error because of the way GPS works. If the recorded points are scattered randomly around the real route (because of that error), then the total measured distance will be larger than the real distance, since a line wiggling from side to side of a straight line is longer than the straight line itself.
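My theory about independent point-to-point error can be checked with a quick simulation. This is a minimal sketch with made-up numbers (a 1000 m straight line, a fix every 5 m, 5 m Gaussian noise), not real GPS behaviour:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical true route: a dead-straight 1000 m line with a fix every 5 m.
n = 201
true_pts = np.column_stack([np.linspace(0.0, 1000.0, n), np.zeros(n)])

# Perturb every fix with independent Gaussian noise (sigma = 5 m).
noisy_pts = true_pts + rng.normal(scale=5.0, size=true_pts.shape)

def path_length(pts):
    """Sum of the straight-line segment lengths between consecutive fixes."""
    return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))

true_len = path_length(true_pts)    # 1000 m by construction
noisy_len = path_length(noisy_pts)  # systematically longer
```

With independent errors of this size the inflation is dramatic; note this is a worst case, since real consecutive GPS errors tend to be correlated rather than independent.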
So, my questions:

1. Are there any techniques I can use on a single dataset to reduce the error in a valid way?
2. Does my theory about the difference in recording frequency hold water?
3. If I have multiple recordings of the same route, are there any valid techniques to combine them to get closer to the real route?
As I say, I don’t really know what to search for to find any useful science about this. I’m looking for ways to establish just how long a given piece of trail is, and it matters a great deal: an extra 30 km in a race is an extra 5+ hours we weren’t expecting.
As requested here is some sample data:
Thanks for any advice you can give.
This is a well-studied problem in geospatial science; you can find discussions of it on GIS forums.
First, note that the wiggles do not necessarily increase the route’s length, because many of them actually cut inside curves. (I have evaluated this by having an entire classroom of students digitize the same path and then comparing the results.) There really is a lot of cancellation. Also, we can expect readings taken just a few seconds apart to have strongly positively correlated errors, so the measured path should wiggle only gradually around the true path. Even large departures don’t affect the length much: for example, if you deviate by (say) 5 meters laterally in the middle of a 100 m straight stretch, your estimate of the length only goes up to 2√(50² + 5²) ≈ 100.5 m, a 0.5% error.
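A quick check of that arithmetic: the 5 m deviation turns the 100 m stretch into two legs of a triangle.

```python
import math

# Two legs of 50 m horizontal and 5 m lateral replace the 100 m straight.
measured = 2 * math.sqrt(50**2 + 5**2)
error_pct = 100 * (measured - 100) / 100  # relative error in percent
```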
It is difficult to compare two arbitrary paths in an objective way. One of the better methods, I believe, is a form of bootstrapping: subsample (or otherwise generalize) the most detailed path you have and plot its length as a function of the amount of subsampling. If you express the subsampling as a typical vertex-to-vertex distance, you can extrapolate the fit to zero distance, which can provide an excellent estimate of the path length.
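The subsample-and-extrapolate idea can be sketched as follows. Everything here is illustrative (a synthetic wiggly track, arbitrary subsampling steps, a plain linear fit): keep every k-th vertex, record mean vertex spacing against polyline length, fit length against spacing, and read off the intercept at zero spacing.

```python
import numpy as np

def path_length(pts):
    """Sum of segment lengths along a polyline."""
    return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))

def subsample_lengths(pts, steps):
    """For each step k, keep every k-th vertex (plus the endpoint) and
    return (mean vertex-to-vertex distance, resulting polyline length)."""
    out = []
    for k in steps:
        idx = np.unique(np.r_[np.arange(0, len(pts), k), len(pts) - 1])
        sub = pts[idx]
        segs = np.linalg.norm(np.diff(sub, axis=0), axis=1)
        out.append((segs.mean(), path_length(sub)))
    return np.array(out)

# Hypothetical dense track: a wiggly synthetic path, not real GPS data.
t = np.linspace(0.0, 20.0 * np.pi, 4000)
pts = np.column_stack([t, np.sin(t)])

data = subsample_lengths(pts, [1, 2, 4, 8, 16, 32])
spacing, lengths = data[:, 0], data[:, 1]

# Linear fit of length against spacing; the intercept is the
# extrapolated length at zero vertex-to-vertex distance.
slope, intercept = np.polyfit(spacing, lengths, 1)
```

Subsampled lengths can only shrink (chords cut curves short), so the fitted length decreases with spacing and the intercept recovers the fully resolved length.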
With multiple recordings, you can create a 2D kernel smooth of each, sum the smooths, and subject that to a topographic analysis to look for a “ridgeline.” You usually won’t get a single connected line, but you can often patch the ridges together into a continuous path. This method has been used to average hurricane tracks, for example.
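A crude NumPy-only version of the kernel-smooth idea, for illustration only (the track, noise level, bandwidth, and grid are all invented, and taking the densest cell per column is a stand-in for a proper topographic ridgeline analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three noisy recordings of the same gently curving trail.
x = np.linspace(0.0, 100.0, 200)
true_y = 10.0 * np.sin(x / 20.0)
tracks = [np.column_stack([x, true_y + rng.normal(scale=2.0, size=x.size)])
          for _ in range(3)]

# Sum a Gaussian kernel smooth of every recording on a common grid.
gx = np.linspace(0.0, 100.0, 101)
gy = np.linspace(-20.0, 20.0, 81)
density = np.zeros((gy.size, gx.size))
bw = 3.0  # kernel bandwidth (a tuning choice)
for track in tracks:
    for px, py in track:
        density += np.exp(-((gx[None, :] - px) ** 2 +
                            (gy[:, None] - py) ** 2) / (2.0 * bw ** 2))

# Crude "ridgeline": in each x column, take the y cell of maximum density.
ridge_y = gy[np.argmax(density, axis=0)]
```

The ridge hugs the underlying curve much more closely than any single noisy recording does; a real analysis would trace the ridge of the density surface rather than scan fixed columns.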