Suppose I have an algorithm that classifies things into two categories. I can measure the accuracy of the algorithm on, say, 1000 test things; suppose 80% of the things are classified correctly.

Let's suppose I modify the algorithm somehow so that 81% of things are classified correctly.

Can statistics tell me anything about whether my improvement to the algorithm is statistically significant? Is the concept of statistical significance relevant in this situation? Please point me in the direction of some resources that might be relevant.

Many Thanks.

**Answer**

In short, yes. Statistical significance is relevant here. You are looking at the classification error (or, as you give it here, accuracy = 1 - classification error). If you compare the classifiers on two different sets of 1000 samples, you can simply use the binomial test; if both are evaluated on the same 1000 samples, you need to use McNemar’s test. Note that simply testing the classification error in this way is suboptimal, because you either assume the classification error is independent of the true class or that the proportion of the true classes is the same across your potential applications.
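For the same-samples case, here is a minimal sketch of an exact (binomial) McNemar test in plain Python. The function name `mcnemar_exact` and the per-item counts are my own assumptions for illustration: the question only gives the 80% and 81% totals, so the split into "only the old classifier was right" and "only the new one was right" is made up. The test only looks at the items on which the two classifiers disagree.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test for two classifiers scored on the same items.

    correct_a / correct_b are sequences of booleans: was item i classified
    correctly by classifier A / classifier B?
    Returns (b, c, p_value), where b counts items only A got right and
    c counts items only B got right.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both classifiers must be scored on the same items")

    b = sum(1 for a, bb in zip(correct_a, correct_b) if a and not bb)
    c = sum(1 for a, bb in zip(correct_a, correct_b) if not a and bb)
    n = b + c
    if n == 0:
        # The classifiers never disagree, so there is no evidence either way.
        return b, c, 1.0

    # Under the null hypothesis each disagreement is a fair coin flip,
    # so min(b, c) follows a Binomial(n, 0.5) lower tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical breakdown of 1000 test items (assumed, not from the question):
# 780 both right, 20 only old right, 30 only new right, 170 both wrong.
old_correct = [True] * 780 + [True] * 20 + [False] * 30 + [False] * 170
new_correct = [True] * 780 + [False] * 20 + [True] * 30 + [False] * 170

b, c, p_value = mcnemar_exact(old_correct, new_correct)
print(f"only old right: {b}, only new right: {c}, p-value: {p_value:.3f}")
```

Note that the same 80% versus 81% totals can give very different p-values depending on how many items the two classifiers disagree on, which is why the paired per-item results are needed and the two accuracies alone are not enough.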

This means you should take a look at measures like true positive rate, false positive rate, or AUC. Which measure to use and how to test it depends on the output of your classifier. It might be just a class label, or it might be a continuous number giving the probability of belonging to a certain class.
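To make those measures concrete, here is a small sketch in plain Python. The helper names `tpr_fpr` and `auc`, the toy labels and scores, and the 0.5 threshold are all assumptions chosen for illustration. The AUC is computed as the fraction of (positive, negative) pairs in which the positive item gets the higher score, with ties counted as half.

```python
def tpr_fpr(y_true, y_pred):
    """True positive rate and false positive rate for hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative item")
    return tp / (tp + fn), fp / (fp + tn)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    # Count pairs where the positive outranks the negative; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed): true classes and the classifier's predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

# If the classifier only outputs a class, you get a single (TPR, FPR) point.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tpr, fpr = tpr_fpr(y_true, y_pred)

# If it outputs a continuous score, AUC summarises all thresholds at once.
area = auc(y_true, scores)
print(f"TPR: {tpr}, FPR: {fpr}, AUC: {area}")
```

Unlike plain accuracy, the true positive rate and false positive rate are each computed within one true class, so they do not change when the class proportions change between applications.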

**Attribution**

*Source: Link, Question Author: Ben, Answer Author: Erik*