Why does a sufficient statistic contain all the information needed to compute any estimate of the parameter?

I’ve just started studying statistics and I can’t get an intuitive understanding of sufficiency. To be more precise, I can’t see how to show that the following two statements are equivalent:

Roughly, given a set X of independent identically distributed data conditioned on an unknown parameter θ, a sufficient statistic is a function T(X) whose value contains all the information needed to compute any estimate of the parameter.

A statistic T(X) is sufficient for underlying parameter θ precisely if the conditional probability distribution of the data X, given the statistic T(X), does not depend on the parameter θ.

(I’ve taken both quotes from the Wikipedia article “Sufficient statistic”.)
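As a concrete check of the second statement, here is a minimal Python sketch for one standard textbook case, chosen only as an illustration (it does not come from the quoted article): n i.i.d. Bernoulli(p) observations with T(X) = X_1 + X_2 + ... + X_n. It computes P(X = x | T(X) = t) exactly, using fractions, for several values of p. The function name `conditional_given_sum` is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_given_sum(n, p):
    """Exact P(X = x | T(X) = t) for n i.i.d. Bernoulli(p) draws, with T(X) = sum(X)."""
    # Joint pmf of each binary sequence x: p^t (1 - p)^(n - t), where t = sum(x)
    joint = {x: p ** sum(x) * (1 - p) ** (n - sum(x))
             for x in product((0, 1), repeat=n)}
    # Marginal pmf of T: add up the joint pmf over all sequences with the same sum
    marginal = {}
    for x, prob in joint.items():
        marginal[sum(x)] = marginal.get(sum(x), 0) + prob
    # Conditional pmf of the full sample given the value of the statistic
    return {x: prob / marginal[sum(x)] for x, prob in joint.items()}

n = 4
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    cond = conditional_given_sum(n, p)
    # Every sequence with sum t gets probability 1 / C(n, t), whatever p is
    assert all(cond[x] == Fraction(1, comb(n, sum(x))) for x in cond)
print("P(X = x | T = t) = 1 / C(n, t) for every p tested")
```

In this example, once the number of successes t is known, every arrangement of those successes is equally likely and p drops out entirely, which is exactly what the second quote requires.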

Although I understand the second statement, and I can use the factorization theorem to check whether a given statistic is sufficient, I can’t see why a statistic with this property also “contains all the information needed to compute any estimate of the parameter”. I am not looking for a formal proof, although one would still help refine my understanding; what I’d like is an intuitive explanation of why the two statements are equivalent.
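To illustrate the factorization-theorem side for the same hypothetical Bernoulli example, the sketch below assumes f(x | p) = g(T(x), p) · h(x) with g(t, p) = p^t (1 − p)^(n − t) and h(x) = 1. It checks that two different samples with the same sum have identical likelihoods at every p on a grid, so a likelihood-based estimate (here a grid-search maximum-likelihood estimate) cannot distinguish them. The function names `likelihood` and `grid_mle`, the two samples, and the grid are all made up for illustration.

```python
from fractions import Fraction

def likelihood(x, p):
    """Bernoulli likelihood of the sample x at parameter p."""
    t = sum(x)
    # Factorization: f(x | p) = g(T(x), p) * h(x), with g(t, p) = p^t (1 - p)^(n - t) and h(x) = 1
    return p ** t * (1 - p) ** (len(x) - t)

def grid_mle(x, grid):
    """Value of p on the grid that maximizes the likelihood of x."""
    return max(grid, key=lambda p: likelihood(x, p))

x1 = (1, 1, 0, 0, 1, 0)
x2 = (0, 0, 1, 1, 0, 1)  # a different sample with the same sum, T = 3
grid = [Fraction(k, 100) for k in range(1, 100)]

# The two samples have the same likelihood at every p on the grid...
assert all(likelihood(x1, p) == likelihood(x2, p) for p in grid)
# ...so the likelihood-based estimate is the same for both
print(grid_mle(x1, grid), grid_mle(x2, grid))  # prints: 1/2 1/2
```

This only covers estimates computed from the likelihood; it does not by itself show the stronger claim about “any estimate” that the first quote makes.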

To recap, my questions are: why are the two statements equivalent? Could someone provide an intuitive explanation of their equivalence?