# Jaccard Similarity – From Data Mining book – Homework problem

Exercise 3.1.3 : Suppose we have a universal set U of n elements, and
we choose two subsets S and T at random, each with m of the n
elements.

What is the expected value of the Jaccard similarity of S
and T ?

Each item in T has an $\frac{m}{n}$ chance of also being in S. The expected number of items common to S & T is therefore $\frac{m^2}{n}$.
Exp. $\text{Jaccard Similarity} = \dfrac{\text{No. of common items}}{\text{Size of T} + \text{Size of S} – \text{Number of common items}} = \dfrac{m}{2n – m}$ (after simplification.)