I would like to do a significance test to check whether the performance difference between a few multi-class classifiers is due to chance. These classifiers are currently measured by F1 and use the same data. I have a training set which I split into a train (80%) and validation (20%) set. I also have a very small set of test data from an unseen corpus which I have to use to perform the final test.

I don’t have much stats background, so I would like to verify the following.

Should I perform the significance test on the validation set or the small test set?

I believe the answer should be the small test set, since it is ultimately the one that matters for practical purposes, though I’m not sure whether 150 samples is enough for this.

I read about McNemar’s test [http://www.john-uebersax.com/stat/mcnemar.htm], but I understand it is mainly used for binary classifiers. It can be generalised to a K×K table with the Stuart-Maxwell test.

Thus far, this is what I have learned after some reading. I would like some pointers on whether it is OK.

Say there are 2 algorithms and 3 classes to classify. When I build the 3×3 table for McNemar’s test, I set it up this way.

Out of N test samples, a represents the number of times both Algo1 and Algo2 predicted class A, b represents the number of times Algo1 predicted class B but Algo2 predicted class A, and so on.
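To make the table layout concrete, here is a minimal sketch of how the cross-tabulation could be built from two lists of predictions. The function name `paired_table` and the six sample predictions are made up for illustration; rows are Algo2's predicted class and columns are Algo1's, as described above.

```python
from collections import Counter


def paired_table(pred_algo1, pred_algo2, classes):
    """Cross-tabulate two classifiers' predictions on the same samples.

    Rows index Algo2's predicted class, columns index Algo1's predicted class.
    """
    if len(pred_algo1) != len(pred_algo2):
        raise ValueError("both classifiers must be scored on the same samples")
    # Count each (Algo2 class, Algo1 class) pair once per sample.
    counts = Counter(zip(pred_algo2, pred_algo1))
    return [[counts[(row, col)] for col in classes] for row in classes]


# Hypothetical predictions for six test samples (illustration only).
algo1 = ["A", "B", "A", "C", "B", "A"]
algo2 = ["A", "A", "A", "C", "B", "B"]
table = paired_table(algo1, algo2, classes=["A", "B", "C"])
print(table)
```

Note that the true labels are not used here: the table only compares how the two classifiers distribute their predictions over the classes.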

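Let column vector d contain any K-1 of the marginal differences dA, dB, dC (row total minus column total for each class, as defined in the table). Note that this vector d is not the same as the cell count d.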

Let S denote the (K – 1) × (K – 1) matrix of the variances and covariances of the elements of d. The elements of S are equal to:

sAA = A. + .A – 2*(a)

sBB = B. + .B – 2*(e)

sCC = C. + .C – 2*(i)

sAB = -(b + d)

sBC = -(f + h)

sAC = -(c + g)

The Stuart-Maxwell statistic is calculated as:

X² = d’ S⁻¹ d

where d’ is the transpose of d and matrix S⁻¹ is the inverse of S.

X² is interpreted as a chi-squared value with degrees of freedom equal to K-1 = 2 (since K = 3 here).
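The calculation above can be sketched in plain Python for the 3-class case. This is only an illustration under stated assumptions: the function name `stuart_maxwell_3x3` and the counts in `example` are hypothetical, the vector d keeps dA and dB (any two of the three would do), and the p-value uses the closed form of the chi-squared tail for 2 degrees of freedom, exp(-X²/2).

```python
import math


def stuart_maxwell_3x3(table):
    """Stuart-Maxwell test of marginal homogeneity for a 3x3 paired table.

    table[r][c] = number of samples where Algo2 predicted class r
    and Algo1 predicted class c. Returns (X2, df, p_value).
    """
    (a, b, c), (d, e, f), (g, h, i) = table
    row = [a + b + c, d + e + f, g + h + i]  # A., B., C.
    col = [a + d + g, b + e + h, c + f + i]  # .A, .B, .C

    # Keep K-1 = 2 of the marginal differences (here dA and dB).
    dA = row[0] - col[0]
    dB = row[1] - col[1]

    # Elements of the 2x2 matrix S for (dA, dB).
    sAA = row[0] + col[0] - 2 * a
    sBB = row[1] + col[1] - 2 * e
    sAB = -(b + d)

    det = sAA * sBB - sAB * sAB
    if det == 0:
        raise ValueError("S is singular; the statistic is undefined for this table")

    # X2 = d' S^-1 d, with the 2x2 inverse written out explicitly.
    x2 = (sBB * dA * dA - 2 * sAB * dA * dB + sAA * dB * dB) / det
    df = 2
    p_value = math.exp(-x2 / 2)  # chi-squared upper tail, valid for df = 2 only
    return x2, df, p_value


# Hypothetical counts for N = 150 test samples (illustration only).
example = [
    [40, 5, 3],
    [12, 35, 4],
    [6, 8, 37],
]
x2, df, p_value = stuart_maxwell_3x3(example)
print(f"X2 = {x2:.3f}, df = {df}, p = {p_value:.3f}")
```

For more than 3 classes the same steps apply, but S has to be inverted with a general linear-algebra routine and the p-value taken from a chi-squared distribution with K-1 degrees of freedom.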

Finally, with the p-value, I can determine whether Algo1 and Algo2 are statistically different.

+---------------------+--------------+--------------+--------------+-----------+
|                     | Algo1 ClassA | Algo1 ClassB | Algo1 ClassC | Row Total |
+---------------------+--------------+--------------+--------------+-----------+
| Algo2 ClassA        | a            | b            | c            | A.=a+b+c  |
| Algo2 ClassB        | d            | e            | f            | B.=d+e+f  |
| Algo2 ClassC        | g            | h            | i            | C.=g+h+i  |
| Col Total           | .A=a+d+g     | .B=b+e+h     | .C=c+f+i     |           |
| vector d=(dA,dB,dC) | dA=A.-.A     | dB=B.-.B     | dC=C.-.C     |           |
+---------------------+--------------+--------------+--------------+-----------+

Thank you.

**Answer**

**Attribution**

*Source: Link, Question Author: Jax, Answer Author: Community*