Determine an unknown number of real world locations from GPS-based reports

I’m working on some software that should determine real-world locations (e.g. speed cameras) from several GPS-based reports. A user will be driving when reporting a location, so the reports are very inaccurate. To solve that problem I have to cluster the reports that refer to the same location and calculate an average.

My question is about how to cluster those reports. I read about expectation-maximization algorithms and k-means clustering, but as far as I understand, I would need to determine the number of real locations in advance.

Are there any other algorithms that don’t need the exact number of real locations, but instead use some constraints (e.g. a minimum distance between locations)?

A report contains longitude, latitude and accuracy (in meters). There is no name or anything else which could be used to identify duplicates.

Another obstacle is that there will often be only one report for a real-world location, which makes it difficult to distinguish outliers from good data.


I have found some software that may help you. It looks like somebody had the same problem as you and was given a solution in this forum; that solution requires ArcGIS. If you are looking for an algorithm instead, they suggest this paper. I think the paper is detailed enough to be a good starting point for your algorithm.
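
For what it is worth, here is a minimal sketch of one way to cluster without knowing the number of locations in advance. It is not the method from the paper or the ArcGIS solution mentioned above, just a simple distance-threshold (single-linkage) grouping. The names `haversine_m` and `cluster_reports`, the 100 m default threshold, and the weighting by 1/accuracy² are all assumptions made for illustration.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, good enough for short distances


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def cluster_reports(reports, max_dist_m=100.0):
    """Group reports that are within max_dist_m of each other (transitively).

    reports: list of (lat, lon, accuracy_m) tuples.
    Returns a list of (lat, lon, report_count), one entry per estimated location.
    """
    parent = list(range(len(reports)))  # union-find forest over report indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of reports closer than the threshold (O(n^2) comparisons).
    for i in range(len(reports)):
        for j in range(i + 1, len(reports)):
            d = haversine_m(reports[i][0], reports[i][1], reports[j][0], reports[j][1])
            if d <= max_dist_m:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, report in enumerate(reports):
        groups[find(i)].append(report)

    locations = []
    for members in groups.values():
        # Assumed weighting: more accurate reports (smaller accuracy_m) count more.
        weights = [1.0 / max(acc, 1.0) ** 2 for _, _, acc in members]
        total = sum(weights)
        # Averaging raw degrees is fine for small clusters away from the antimeridian.
        lat = sum(w * m[0] for w, m in zip(weights, members)) / total
        lon = sum(w * m[1] for w, m in zip(weights, members)) / total
        locations.append((lat, lon, len(members)))
    return locations
```

A location backed by a single report comes back with a count of 1, so you can keep it but treat it as less trustworthy than a location confirmed by several reports. Two caveats: single-linkage grouping can chain nearby locations together if reports fill the gap between them, and the pairwise loop will be slow for large data sets, where a density-based method such as DBSCAN with a spatial index is probably the better choice.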

Source: Link, Question Author: Christian Strempfer, Answer Author: eyanquenb
