What is principal subspace in probabilistic PCA?

If X is the observed data matrix and Y is the latent variable, then

X = WY + μ + ϵ,

where μ is the mean of the observed data, ϵ is the Gaussian error/noise in the data, and W is called the principal subspace.

My question is: when normal PCA is used, we get a set of orthonormal eigenvectors E for which the following is true:

Y = EX

But in PPCA, W is neither orthonormal nor made of eigenvectors. So how can I get the principal components from W?

Following my instinct, I searched for ppca in MATLAB, where I came across this line:

At convergence, the columns of W span the subspace, but they are not orthonormal. ppca obtains the orthonormal coefficients, coeff, for the components by orthogonalization of W.

I modified the ppca code a little to get W, ran it, and after orthogonalization I did obtain the principal components from W.

Why does this orthogonalization give the eigenvectors along which most of the variance is seen?

I am assuming that orthogonalization gives me a set of orthogonal/orthonormal vectors that span the principal subspace, but why is this orthogonalized matrix equal to the eigenvector matrix (I know that the eigenvector matrix in PCA is also orthonormal)? Can I assume the principal subspace is spanned only by a unique set of orthonormal vectors? In that case, the two results would always coincide.

Answer

This is an excellent question.

Probabilistic PCA (PPCA) is the following latent variable model
\begin{align}
\mathbf z &\sim \mathcal N(\mathbf 0, \mathbf I) \\
\mathbf x &\sim \mathcal N(\mathbf W \mathbf z + \boldsymbol \mu, \sigma^2 \mathbf I),
\end{align}

where \mathbf x\in\mathbb R^p is one observation and \mathbf z\in\mathbb R^q is a latent variable vector; usually q\ll p. Note that this differs from factor analysis in only one little detail: the error covariance structure in PPCA is \sigma^2 \mathbf I, whereas in FA it is an arbitrary diagonal matrix \boldsymbol \Psi.
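To make this concrete, here is a minimal MATLAB sketch of sampling from this generative model (the sizes p, q, n, the variable names, and the parameter values are my own arbitrary choices, not anything prescribed by the model):

p = 5; q = 2; n = 1000;                    % observed dim, latent dim, sample size
Wtrue = randn(p, q);                       % some "true" loading matrix
mu    = randn(p, 1);                       % mean of the observed variables
sigma = 0.5;                               % noise standard deviation

Z = randn(q, n);                           % latent vectors, z ~ N(0, I)
X = Wtrue*Z + mu + sigma*randn(p, n);      % x ~ N(W z + mu, sigma^2 I); mu is added to every column via implicit expansion

The sketches further below reuse X, p and q from here.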

Tipping & Bishop, 1999, Probabilistic Principal Component Analysis prove the following theorem: the maximum likelihood solution for PPCA can be obtained analytically and is given by (Eq. 7): \mathbf W_\mathrm{ML} = \mathbf U_q (\boldsymbol \Lambda_q – \sigma_\mathrm{ML}^2 \mathbf I)^{1/2} \mathbf R, where \mathbf U_q is a matrix of q leading principal directions (eigenvectors of the covariance matrix), \boldsymbol \Lambda_q is the diagonal matrix of corresponding eigenvalues, \sigma_\mathrm{ML}^2 is also given by an explicit formula, and \mathbf R is an arbitrary q\times q rotation matrix (corresponding to rotations in the latent space).

The ppca() function implements an expectation-maximization (EM) algorithm to fit the model, but we know that it must converge to \mathbf W_\mathrm{ML} as given above (with some rotation \mathbf R).
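For intuition only (this is not MATLAB's actual implementation), here is a rough sketch of the EM iterations for PPCA, written in terms of the sample covariance C from the sketch above; the update formulas follow Tipping & Bishop, while the initialization and iteration count are arbitrary:

W  = randn(p, q);                          % random initialization of the loadings
s2 = 1;                                    % initial noise variance
for iter = 1:200
    M     = W'*W + s2*eye(q);                          % q x q
    W_new = C*W / (s2*eye(q) + M\(W'*C*W));            % W <- C W (s2 I + M^{-1} W' C W)^{-1}
    s2    = trace(C - C*W*(M\W_new')) / p;             % s2 <- tr(C - C W M^{-1} W_new') / p
    W     = W_new;
end
% At convergence, W should equal W_ML up to a rotation of the latent space.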

Your question is: how to get \mathbf U_q if you know \mathbf W_\mathrm{ML}.

The answer is that you can simply use the singular value decomposition of \mathbf W_\mathrm{ML}. The formula above is already of the form orthogonal matrix times diagonal matrix times orthogonal matrix, so it gives the SVD, and as the SVD is unique (up to the signs of the singular vectors, when the singular values are distinct), you will get \mathbf U_q as the left singular vectors of \mathbf W_\mathrm{ML}.
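Continuing the sketch above (with W_ML, Uq and q as defined there), here is a quick numerical check that the left singular vectors recover \mathbf U_q up to column signs, even after an arbitrary rotation of the latent space:

[Q_rot, ~] = qr(randn(q));                 % random q x q orthogonal matrix (plays the role of R)
W_rot      = W_ML * Q_rot;                 % same model, different latent basis

[U, ~] = svd(W_rot, 'econ');
disp(abs(U'*Uq))                           % ~ identity: U matches Uq up to column signs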

That is exactly what Matlab’s ppca() function is doing in line 305:

% Orthogonalize W to the standard PCA subspace
[coeff,~] = svd(W,'econ');

Can I assume principal subspace is spanned only by a unique set of orthonormal vectors?

No! There are infinitely many orthogonal bases spanning the same principal subspace. If you apply some arbitrary orthogonalization process to \mathbf W_\mathrm{ML}, you are not guaranteed to obtain \mathbf U_q. But if you use the SVD or something equivalent, then it will work.
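A small sketch of this last point, reusing W_rot and Uq from above: an economy-size QR decomposition also orthonormalizes the columns of W_rot, but it generally returns a different basis of the same subspace, whereas the SVD returns \mathbf U_q:

[Q_qr, ~]  = qr(W_rot, 0);                 % orthonormal basis via QR (Gram-Schmidt-like)
[U_svd, ~] = svd(W_rot, 'econ');           % orthonormal basis via SVD

disp(abs(Q_qr'*Uq))                        % generally NOT the identity
disp(abs(U_svd'*Uq))                       % ~ identity (up to column signs)

disp(norm(Q_qr*Q_qr' - U_svd*U_svd'))      % ~ 0: both bases span the same subspace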

Attribution
Source: Link, Question Author: user3086871, Answer Author: amoeba
