I have a servlet-based application in which I measure the time taken to complete each request to that servlet. I already compute simple statistics like the mean and maximum; however, I'd like to produce some more sophisticated analysis, and to do so I believe I need to properly model these response times.
Surely, I tell myself, response times follow some well-known distribution, and there should be good reasons to believe that distribution is the right model. However, I don't know what this distribution ought to be.
Log-normal and gamma come to mind, and you can make either one roughly fit real response-time data. Does anyone have a view on what distribution the response times ought to follow?
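One way to move beyond "sort of fits" is to fit both candidates to the same sample and compare their log-likelihoods; since both have two parameters, this is equivalent to comparing AIC. The sketch below uses only the Python standard library. The function names are hypothetical, the data is synthetic (drawn from a log-normal with an assumed median of 120 ms, so the outcome here is not evidence about real servers), and the gamma shape uses Minka's closed-form approximation to the MLE rather than an exact solver. With real data you would substitute your measured servlet times for `times_ms`.

```python
import math
import random
import statistics


def fit_lognormal(xs):
    """MLE for a log-normal: mu and sigma are the mean and population std of ln(x)."""
    logs = [math.log(x) for x in xs]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def lognormal_loglik(xs, mu, sigma):
    """Sum of log-normal log-densities over the sample."""
    return sum(
        -math.log(x * sigma * math.sqrt(2 * math.pi))
        - (math.log(x) - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )


def fit_gamma(xs):
    """Approximate gamma MLE (Minka's closed form for shape k); scale = mean / k."""
    mean = statistics.fmean(xs)
    s = math.log(mean) - statistics.fmean([math.log(x) for x in xs])
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    return k, mean / k


def gamma_loglik(xs, k, theta):
    """Sum of gamma log-densities (shape k, scale theta) over the sample."""
    return sum(
        (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)
        for x in xs
    )


random.seed(42)
# Synthetic stand-in for measured response times, in milliseconds.
times_ms = [random.lognormvariate(math.log(120), 0.8) for _ in range(5000)]

mu, sigma = fit_lognormal(times_ms)
k, theta = fit_gamma(times_ms)
ll_lognormal = lognormal_loglik(times_ms, mu, sigma)
ll_gamma = gamma_loglik(times_ms, k, theta)
print(f"log-normal: mu={mu:.3f} sigma={sigma:.3f} loglik={ll_lognormal:.1f}")
print(f"gamma:      k={k:.3f} theta={theta:.1f} loglik={ll_gamma:.1f}")
```

The higher log-likelihood indicates the better-fitting model for that sample. Because the synthetic data is log-normal by construction, only the comparison on your own measurements is informative.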
The log-normal distribution is the one I find best describes server response-time latencies across the entire user base over a period of time.
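Once a log-normal is fitted, it gives closed-form answers to the "more sophisticated" questions in the original post, such as the model mean, the median, and tail percentiles. The following is a minimal sketch, assuming the durations are positive values in milliseconds. The `lognormal_percentile` helper and the synthetic sample (median 80 ms, log-space spread 0.6) are hypothetical illustrations. It uses the standard identities: median = exp(mu), mean = exp(mu + sigma²/2), and the p-th quantile = exp(mu + sigma · z_p).

```python
import math
import random
import statistics


def lognormal_percentile(mu, sigma, p):
    """p-th quantile (0 < p < 1) of a log-normal: exp(mu + sigma * z_p)."""
    z = statistics.NormalDist().inv_cdf(p)
    return math.exp(mu + sigma * z)


random.seed(7)
# Hypothetical stand-in for measured servlet response times, in milliseconds.
times_ms = [random.lognormvariate(math.log(80), 0.6) for _ in range(10000)]

# Fit: mean and population std of the log-times.
logs = [math.log(t) for t in times_ms]
mu, sigma = statistics.fmean(logs), statistics.pstdev(logs)

model_mean = math.exp(mu + sigma**2 / 2)  # mean of a log-normal
model_median = math.exp(mu)               # median of a log-normal
model_p99 = lognormal_percentile(mu, sigma, 0.99)

# quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
empirical_p99 = statistics.quantiles(times_ms, n=100)[98]
print(f"mean: model={model_mean:.1f} sample={statistics.fmean(times_ms):.1f}")
print(f"p99:  model={model_p99:.1f} sample={empirical_p99:.1f}")
```

Comparing model percentiles against empirical ones, as in the last lines, is also a quick sanity check on whether the log-normal assumption holds for your traffic.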
You can see some examples at the aptly named site lognormal.com, which is in the business of measuring site latency distributions over time, among other things. I have no affiliation with the site other than being a happy user. Here's what the distribution looks like, plotting response time (e.g. web page load time) against the number of responses:
Note that in this chart the load-time (x-axis) scale is linear. If you switch the x-axis to a log scale, the long tail to the right of the peak is compressed and the distribution looks much closer to a normal (bell-shaped) curve.
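The log-scale observation can be checked numerically. If the times are log-normal, their logarithms are normally distributed, so the skewness should drop from strongly positive on the raw values to roughly zero on the log values. This is a rough sketch on synthetic data (an assumed median of 150 ms, log-space spread 0.7). The `skewness` helper is a hypothetical illustration using the population form of the statistic.

```python
import math
import random
import statistics


def skewness(xs):
    """Population skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])


random.seed(1)
# Synthetic log-normal response times (ms), for illustration only.
times_ms = [random.lognormvariate(math.log(150), 0.7) for _ in range(5000)]

raw_skew = skewness(times_ms)                       # long right tail -> large positive
log_skew = skewness([math.log(t) for t in times_ms])  # roughly symmetric -> near zero
print(f"skewness on a linear scale: {raw_skew:.2f}")
print(f"skewness on a log scale:    {log_skew:.2f}")
```

Running the same two lines on real measurements is a cheap first test. A log-scale skewness far from zero suggests the log-normal is not a good description of that data.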