Jaccard Similarity – From Data Mining book – Homework problem

Exercise 3.1.3 : Suppose we have a universal set U of n elements, and
we choose two subsets S and T at random, each with m of the n
elements.

What is the expected value of the Jaccard similarity of S
and T ?

I am reading the book
http://infolab.stanford.edu/~ullman/mmds/ch3.pdf

Answer

Each item in T has an $\frac{m}{n}$ chance of also being in S. The expected number of items common to S & T is therefore $\frac{m^2}{n}$.

Exp. $\text{Jaccard Similarity} = \dfrac{\text{No. of common items}}{\text{Size of T} + \text{Size of S} – \text{Number of common items}} = \dfrac{m}{2n – m}$ (after simplification.)

Attribution
Source : Link , Question Author : delete me , Answer Author : Soroush

Leave a Comment