How to use SVD for dimensionality reduction to reduce the number of columns (features) of the data matrix? [duplicate]

My original data has many more columns (features) than rows (users). I am trying to reduce the number of features using SVD (I need to keep all of the rows). I found one method of doing so in the book “Machine Learning in Action”, but I don’t think it will work for the data I am using.

The method is as follows. Define SVD as A = USV^\top.

Set a threshold (e.g., 90%). Calculate the total sum of squares of the diagonal entries of S (the singular values). Count how many of the leading singular values it takes to reach 90% of that total. If that turns out to be 100, then I would take the first 100 columns of U, the first 100 rows of V^\top, and the top-left 100\times 100 block of S. I would then calculate A = USV^\top using the reduced matrices.
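The threshold rule described above can be sketched in NumPy as follows. This is only an illustration of the procedure as stated in the question, not the book's code: the helper name `choose_rank` and the random 20 x 200 toy matrix are assumptions made for the example.

```python
import numpy as np

def choose_rank(singular_values, threshold=0.90):
    """Smallest k whose leading singular values reach `threshold`
    of the total sum of squared singular values (hypothetical helper)."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    # searchsorted gives the first index where cumulative >= threshold
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # guard against floating-point round-off when threshold is 1.0
    return min(k, len(energy))

# Toy data (assumption): 20 users (rows), 200 features (columns)
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 200))

# np.linalg.svd returns the singular values as a 1-D array, largest first
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = choose_rank(s, 0.90)
print("number of singular values kept:", k)
```

Note that with more columns than rows there are at most as many nonzero singular values as rows, so `k` can never exceed the number of users.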

However, this method does not target the columns of my original data, since the dimensions of the resulting A matrix are the same as before. How would I target the columns of my original matrix?

Answer

What @davidhigh wrote is correct: if you multiply the reduced matrices \mathbf U_\mathrm{r}, \mathbf S_\mathrm{r}, and \mathbf V_\mathrm{r}^\top, as you describe in your question, then you will obtain a matrix \tilde{ \mathbf A}=\mathbf U_\mathrm{r}\mathbf S_\mathrm{r}\mathbf V_\mathrm{r}^\top that has exactly the same dimensions as \mathbf A, but a reduced rank.
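A quick numerical check of this point, as a minimal sketch: the random toy matrix and the choice k = 5 are arbitrary assumptions, used only to show that the truncated product keeps the original shape while its rank drops to k.

```python
import numpy as np

# Toy data (assumption): 20 rows, 200 columns
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 200))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5  # arbitrary number of singular values to keep
U_r = U[:, :k]          # first k columns of U
S_r = np.diag(s[:k])    # top-left k x k block of S
Vt_r = Vt[:k, :]        # first k rows of V^T

A_tilde = U_r @ S_r @ Vt_r
print(A.shape, A_tilde.shape)               # both shapes are identical
print(np.linalg.matrix_rank(A_tilde))       # rank is k, not min(A.shape)
```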

However, what @davidhigh did not add is that you can get what you want by multiplying \mathbf U_\mathrm{r} and \mathbf S_\mathrm{r} only, i.e. by computing \mathbf B=\mathbf U_\mathrm{r}\mathbf S_\mathrm{r}. This matrix has (in your example) only 100 columns, but the same number of rows as \mathbf A. Matrix \mathbf V is used only to map the data from this reduced 100-dimensional space back to your original p-dimensional space. If you don’t need to map it back, just leave \mathbf V out, and you are done.
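As a rough sketch of this step (toy data and k = 5 are assumptions), the reduced matrix \mathbf B can be formed directly from the SVD factors. Because the columns of \mathbf V are orthonormal, the same matrix is also obtained by projecting the data onto the first k right singular vectors, \mathbf B = \mathbf A \mathbf V_\mathrm{r}, which the code checks numerically.

```python
import numpy as np

# Toy data (assumption): 20 users, 200 features
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 200))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5  # arbitrary reduced dimension
# Scaling the columns of U_r by s_r is the same as U_r @ diag(s_r)
B = U[:, :k] * s[:k]
print(B.shape)  # same number of rows as A, only k columns

# Equivalent route: project A onto the first k right singular vectors
B_alt = A @ Vt[:k, :].T
print(np.allclose(B, B_alt))

# Mapping back to the original p-dimensional space needs V again
A_tilde = B @ Vt[:k, :]
```

The second form is handy when new rows arrive later: they can be reduced with the same `Vt[:k, :]` without recomputing the SVD.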

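By the way, the columns of matrix \mathbf B contain what are called the principal components of your data (strictly speaking, this holds when the columns of \mathbf A have been centered, i.e. each feature has had its mean subtracted).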
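The link to PCA can be checked numerically. The sketch below assumes a small random matrix and k = 3, centers the columns first, and then compares the variance of each column of \mathbf B with the largest eigenvalues of the sample covariance matrix. The two should agree, which is what one expects of principal component scores.

```python
import numpy as np

# Toy data (assumption): 20 rows, 50 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
Xc = X - X.mean(axis=0)          # center each column (feature)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3  # arbitrary number of components
B = U[:, :k] * s[:k]             # candidate principal component scores

# Largest k eigenvalues of the sample covariance matrix, descending
cov = np.cov(Xc, rowvar=False)
top_eigvals = np.linalg.eigvalsh(cov)[::-1][:k]

# Variance of each score column matches the corresponding eigenvalue,
# which also equals s**2 / (n - 1)
score_var = B.var(axis=0, ddof=1)
print(np.allclose(score_var, top_eigvals))
```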

Attribution
Source : Link , Question Author : covfefe , Answer Author : amoeba
