Performing PCA with only a distance matrix

I want to cluster a massive dataset for which I have only the pairwise distances. I implemented a k-medoids algorithm, but it’s taking too long to run so I would like to start by reducing the dimension of my problem by applying PCA. However, the only way I know to perform this method is using the covariance matrix which I don’t have in my situation.

Is there a way to apply PCA knowing the pairwise distances only?


Update: I entirely removed my original answer, because it was based on a confusion between Euclidean distances and scalar products. This is a new version of my answer. Apologies.

If by pairwise distances you mean Euclidean distances, then yes, there is a way to perform PCA and to find principal components. I describe the algorithm in my answer to the following question: What’s the difference between principal components analysis and multidimensional scaling?

Very briefly, the matrix of Euclidean distances can be converted into a centered Gram matrix, which can be directly used to perform PCA via eigendecomposition. This procedure is known as [classical] multidimensional scaling (MDS).

If your pairwise distances are not Euclidean, then you cannot perform PCA, but still can perform MDS, which is not going to be equivalent to PCA anymore. However, in this situation MDS is likely to be even better for your purposes.

Source : Link , Question Author : bigTree , Answer Author : Community

Leave a Comment