# What is the origin of the name “conjugate prior”?

I know what a conjugate prior is. But I’m confused by the name itself. Why is it called “conjugate”? A complex conjugate $$z^\ast$$ has a reciprocal relationship with $$z$$, i.e., $${z^\ast}^\ast = z$$. But there isn’t such a reciprocal relationship between any two elements of the triad (prior, likelihood, posterior), or at least I’m not aware of one. So why “conjugate”? Is the term overloaded?

The Oxford English Dictionary defines “conjugate” as an adjective meaning “joined together, esp. in a pair, coupled; connected, related.” It’s not a huge stretch to imagine that a conjugate prior has a special and strong connection to its posterior.

It’s used in a similar sense in chemistry (conjugate acid/base; conjugate solution), botany (leaves that grow in pairs, especially when there’s only one pair), optics (conjugate foci), and linguistics (conjugations are forms of the same root word).

While some of these usages carry a “reciprocal” implication, others don’t, so I don’t think reciprocity is a necessary element of the meaning.

Wikipedia credits Raiffa and Schlaifer with coining the term (annoyingly, it’s not in the OED). Here’s the first mention of it in their 1961 book, which seems to use the “joined” sense of conjugate:

> We show that whenever (1) any possible experimental outcome can be
> described by a sufficient statistic of fixed dimensionality (i.e., an
> $$s$$-tuple $$(y_1, y_2, \ldots, y_s)$$ where $$s$$ does not depend on the
> “size” of the experiment), and (2) the likelihood of every outcome is
> given by a reasonably simple formula with $$y_1, y_2, \ldots, y_s$$ as
> its arguments, we can obtain a very tractable family of “conjugate”
> prior distributions simply by interchanging the roles of variables and
> parameters in the algebraic expression for the sample likelihood, and
> the posterior distribution will be a member of the same family as the
> prior.
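
To make the “interchanging the roles” recipe concrete, here is a small worked example. The Bernoulli setting and the primed symbols $$r'$$, $$n'$$ are my own choice for illustration and are not taken from the passage. For $$n$$ Bernoulli trials with $$r$$ successes, the sufficient statistic is the pair $$(r, n)$$, which has fixed dimension 2 regardless of the size of the experiment, and the likelihood is

$$\ell(p \mid r, n) \propto p^{r}(1-p)^{n-r}.$$

Reading the same algebraic expression as a function of $$p$$, with $$(r, n)$$ replaced by prior parameters $$(r', n')$$, gives the kernel of a beta density:

$$f(p \mid r', n') \propto p^{r'}(1-p)^{n'-r'}.$$

Multiplying prior by likelihood then gives

$$f(p \mid r', n', r, n) \propto p^{r'+r}(1-p)^{(n'+n)-(r'+r)},$$

which has the same form with parameters $$(r'+r,\ n'+n)$$. So the prior and posterior are “joined” as members of one family, and the update amounts to adding the observed sufficient statistic to the prior parameters.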