Making sense of independent component analysis

I have read and enjoyed the question Making sense of principal component analysis, and now I have the same question for independent component analysis: I would like to collect intuitive ways of understanding ICA.

I want to understand it. I want to get the purpose of it. I want to get the feel of it. I strongly believe that:

You do not really understand something unless you can explain it to your grandmother.
— Albert Einstein

Well, I can’t explain this concept to a layman or to grandma, so:

  1. Why ICA? What was the need for this concept?
  2. How would you explain this to a layman?


Here’s my attempt.


Consider the following two cases.

  1. You are a private eye at a party. Suddenly, you see one of your old clients talking to someone. You can make out some of the words, but not all of them, because someone standing next to him is taking part in an unrelated discussion about sports. You don’t want to move closer – he’ll spot you.
    So you borrow your partner’s phone (he’s busy convincing the bartender that non-alcoholic beer is great) and plant it about 10 meters away from you. The phone records both the old client’s talk and the interfering sports guy. You take your own phone and start recording as well, from where you’re standing. After about 15 minutes you go home with two recordings: one from your position, and one from about 10 meters away. Both recordings contain your old client and Mr. Sporty, but on each recording one of the speakers is at a slightly different volume relative to the other (and this relative volume stays constant throughout each recording, because, fortunately, none of the participants moved around the room).
  2. You take a picture of a cute Labrador Retriever you see outside the window. You check out the image and, unfortunately, you see a reflection from the window that’s between you and the dog. You can’t open the window (it’s one of those, yes) and you can’t go outside because you’re afraid he’ll run away. So you take (for some unclear reason) another picture, from a slightly different position. You still see the reflection and the dog, but they are in different positions now, since you’re shooting from a different place. Also note that the shift is uniform for every pixel in the image, because the window is flat, not concave/convex.

The question, in both cases, is how to restore the conversation (in 1.) or the image of the dog (in 2.), given two recordings/images that contain the same two “sources” but with slightly different relative contributions from each. Surely my educated grandchild can make sense of this!

Intuitive solution

How can we, at least in principle, get back the image of the dog from a mixture? Each pixel holds a value that is the sum of two values, so at first it seems hopeless. Indeed, if we were given a single pixel in isolation, it would be: we could not guess the relative contribution of each source to that one pixel.

However, we are given a whole set of pixels (or, in the recording case, points in time) that we know all share the same mixing ratios. For example, if on the first image the dog is always twice as strong as the reflection, and on the second image it is just the opposite, then we might be able to work out the correct contributions after all. Then we can find the right way to subtract the two images so that the reflection is exactly cancelled! [Mathematically, this means finding the inverse of the mixing matrix.]
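Here is a minimal numeric sketch of that cancellation idea, using made-up mixing weights matching the example above (dog twice as strong as the reflection in one image, and the opposite in the other):

```python
import numpy as np

rng = np.random.default_rng(0)
dog = rng.normal(size=1000)             # stand-in for the dog's pixel values
reflection = rng.uniform(-1, 1, 1000)   # stand-in for the reflection's pixel values

# Image 1: dog twice as strong as the reflection; image 2: the opposite.
img1 = 2.0 * dog + 1.0 * reflection
img2 = 1.0 * dog + 2.0 * reflection

# Knowing the relative weights, a scaled subtraction cancels the reflection:
# 2*img1 - img2 = (4*dog + 2*refl) - (dog + 2*refl) = 3*dog
recovered = (2.0 * img1 - img2) / 3.0

print(np.allclose(recovered, dog))  # True
```

The whole trick of ICA is that in practice the weights (2.0, 1.0, …) are unknown and must be discovered from the data.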

Diving into details

Let’s say you have a mixture of two signals:

Y1 = a11 S1 + a12 S2
Y2 = a21 S1 + a22 S2

and let’s say you would like to recover S1 as a function of the two mixtures Y1, Y2. Let’s also assume that a linear combination suffices: S1 = b11 Y1 + b12 Y2. So all you need to do is find the best vector (b11, b12), and there you have it. Similarly for S2 and (b21, b22).
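To make the linear-combination claim concrete: if we happened to know the mixing matrix {aij}, the coefficients {bij} would simply be the rows of its inverse. A small sketch (the matrix entries are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.uniform(-1, 1, size=(2, 500))   # rows: the independent sources S1, S2
A = np.array([[2.0, 1.0],               # mixing matrix {a_ij}
              [1.0, 3.0]])
Y = A @ S                               # the two mixtures Y1, Y2

B = np.linalg.inv(A)                    # unmixing matrix {b_ij}
X = B @ Y                               # X1 = b11*Y1 + b12*Y2, and likewise X2

print(np.allclose(X, S))  # True: with A known, inversion recovers the sources exactly
```

Of course, ICA is interesting precisely because A is unknown; the rest of the answer is about finding B without it.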

But how can you find it for general signals? They may look similar, have similar statistics, and so on. So let’s assume the sources are independent. That’s reasonable when the interfering signal is noise, or, in the image case, when it is a reflection of something unrelated (and you took two images from different angles).

Now, note that the mixtures Y1 and Y2 are dependent (each contains both sources). Since we may not recover S1, S2 exactly, denote our estimates of these signals by X1, X2, respectively.

How can we make X1, X2 as close as possible to S1, S2? Since we know the latter are independent, we might try to make X1, X2 as independent as possible by tweaking the values of bij. After all, if the matrix {aij} is invertible, there exists a matrix {bij} that inverts the mixing operation (and if it’s not invertible, we can get close), and if we make the outputs independent, there’s a good chance we restore our Si signals.

If you are convinced that we need to find a matrix {bij} that makes X1, X2 independent, the next question is how to do it.

So first consider this: if we sum up several independent, non-Gaussian signals, the sum is “more Gaussian” than its components. Why? Because of the central limit theorem; you can also think of the density of the sum of two independent variables, which is the convolution of their densities. If we sum several independent Bernoulli variables, the empirical distribution will look more and more like a Gaussian. Will it be a true Gaussian? Probably not (no pun intended), but we can measure the Gaussianity of a signal by how much it resembles a Gaussian distribution. For instance, we can measure its excess kurtosis, which is zero for a Gaussian. If the excess kurtosis is far from zero, the signal is probably less Gaussian than one with the same variance but with excess kurtosis close to zero.
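The Bernoulli example above is easy to check numerically. Below is a small demo (with a hand-rolled sample excess kurtosis, so that only numpy is needed): a single fair Bernoulli has excess kurtosis −2, and the sum of n of them drifts toward 0 (the Gaussian value) as n grows:

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: 0 for a Gaussian, -2 for a fair Bernoulli."""
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

rng = np.random.default_rng(2)
kurts = {}
for n in (1, 10, 100):
    # Sum of n independent Bernoulli(0.5) variables, 100k samples each.
    total = rng.binomial(1, 0.5, size=(100_000, n)).sum(axis=1)
    kurts[n] = excess_kurtosis(total)
    print(n, round(kurts[n], 3))
# The excess kurtosis climbs from about -2 toward 0 as n grows:
# the sum looks ever more Gaussian.
```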

Therefore, to find the unmixing weights {bij}, we can formulate an optimization problem that, at each iteration, makes the outputs X1, X2 slightly less Gaussian. Mind that they may never be truly Gaussian at any stage; we just want to reduce their Gaussianity. Hopefully, if we don’t get stuck at local minima, we will eventually obtain the unmixing matrix {bij} and get our independent signals back.
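This non-Gaussianity-maximizing search is essentially what the FastICA algorithm does, and scikit-learn ships an implementation. A sketch on two made-up sources (a sine and a square wave standing in for the client and Mr. Sporty; the mixing matrix is arbitrary and hidden from the algorithm):

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                      # "client" source
s2 = np.sign(np.sin(3 * t))             # "sports guy" source (square wave)
S = np.c_[s1, s2]                       # shape (n_samples, 2)

A = np.array([[1.0, 0.5],               # mixing matrix, unknown to ICA
              [0.4, 1.0]])
Y = S @ A.T                             # the two "recordings"

# FastICA searches for an unmixing matrix whose outputs are maximally
# non-Gaussian -- the optimization described above.
X = FastICA(n_components=2, random_state=0).fit_transform(Y)

# The sources come back only up to permutation, sign, and scale,
# so we check recovery via correlations rather than equality.
corr = np.corrcoef(np.c_[S, X].T)[:2, 2:]
print(np.abs(corr).max(axis=1))         # each entry close to 1
```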

Of course, this adds another assumption – the two signals need to be non-Gaussian to begin with.

Source: Link, Question Author: Sepideh Abadpour, Answer Author: Community
