I am doing a machine learning project using WEKA. It is a supervised classification task, and in my basic experiments I achieved a very poor level of accuracy. My intention was then to do feature selection, but then I heard about PCA.
In feature selection, what we do is we consider a subset of attributes which has the greatest impact towards our targeted classification. (If I am correct.)
In PCA, as far as I know, what we do is we generate a smaller amount of artificial set of attributes that will account for our target. (Please correct me if I am wrong.)
But I cannot understand what the exact difference between these two is. Which one is better? Does it depend on the particular study someone is doing?
And also, what about a combination of the above two methodologies (a PCA after a feature selection)? Does it make any sense?
we consider a subset of attributes which has the greatest impact
towards our targeted classification.
This understanding is perfectly correct.
we generate a smaller amount of artificial set of attributes that will
account for our target.
This is partially correct. PCA does not take the target into account at all. In layman's terms, we make some assumptions about the data and its distribution (chiefly that the directions in which the data varies the most are the ones that carry the information), and represent the high-dimensional data in a much smaller number of dimensions (say 3) that retain most of the information content of the original data. Thus, PCA transforms your attributes into an artificial set while retaining most of the information.
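To make the "artificial attributes" idea concrete, here is a minimal sketch of PCA on two attributes, written in plain Python rather than WEKA. The function names `first_principal_axis` and `project` and the data are made up for illustration, and the closed-form angle only works for the two-attribute case. Note that no class labels appear anywhere in it.

```python
import math

def first_principal_axis(points):
    """Return (mean, unit_direction) of the first principal component of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mean_x, mean_y), (math.cos(theta), math.sin(theta))

def project(points, mean, direction):
    """Replace each 2-D point by one artificial attribute: its coordinate along direction."""
    return [(p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
            for p in points]

# Made-up data: two attributes that move together. No class labels are used.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
mean, direction = first_principal_axis(points)
scores = project(points, mean, direction)
print(direction)  # close to the diagonal, since both attributes vary together
print(scores)     # one artificial attribute per original row
```

The single artificial attribute is a weighted mix of both original attributes, which is why it usually has no direct physical meaning of its own.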
Which one is better? Does it depend on the particular study someone
is doing?
Yes, it depends on the particular study. If the assumptions made by the PCA transformation hold, then after PCA you will have nearly the same information in a smaller number of attributes. If the assumptions fail badly, then doing PCA may ruin your classification.
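A small made-up example of the failure case, again as a sketch in plain Python rather than WEKA: the first attribute has a large spread but says nothing about the class, while the second has a small spread and separates the classes cleanly. The helper `separation` is a hypothetical filter-style score (gap between the class means divided by the overall standard deviation), chosen here only for illustration.

```python
import math
from statistics import mean, pstdev

def first_principal_axis(points):
    """Return (center, unit_direction) of the first principal component of 2-D points."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - cy) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / (n - 1)
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))

def separation(values, labels):
    """Hypothetical score: gap between the two class means, in units of overall std."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    return abs(mean(class1) - mean(class0)) / pstdev(values)

# Attribute 0: large variance, unrelated to the class.
# Attribute 1: small variance, determines the class.
rows = [(-10.0, 0.0), (-3.0, 0.1), (4.0, -0.1), (9.0, 0.0),
        (-9.0, 1.0), (-4.0, 1.1), (3.0, 0.9), (10.0, 1.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

center, direction = first_principal_axis(rows)
pc1 = [(x - center[0]) * direction[0] + (y - center[1]) * direction[1]
       for x, y in rows]

score_pc1 = separation(pc1, labels)                       # after PCA to 1 dimension
score_attr1 = separation([r[1] for r in rows], labels)    # keeping attribute 1 instead
print(direction, score_pc1, score_attr1)
```

Here the first principal component points almost entirely along attribute 0, so reducing to one dimension with PCA discards nearly everything that distinguishes the classes, while selecting attribute 1 keeps it.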
Does it make any sense?
It makes perfect sense.
With feature selection, you reduce the number of dimensions by throwing out irrelevant information.
With PCA, you reduce the number of dimensions by transforming to an artificial set while retaining most of the same information.
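The combination can be sketched as a two-step pipeline. This is an illustration in plain Python with made-up data, not WEKA code: the scoring helper `separation`, the choice to keep two attributes, and the two-attribute closed-form PCA are all assumptions made to keep the example short.

```python
import math
from statistics import mean, pstdev

def separation(values, labels):
    """Hypothetical score: gap between the two class means, in units of overall std."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    return abs(mean(class1) - mean(class0)) / pstdev(values)

def first_principal_axis(points):
    """Return (center, unit_direction) of the first principal component of 2-D points."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - cy) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / (n - 1)
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))

# Attribute 0 is irrelevant noise; attributes 1 and 2 both track the class
# and are strongly correlated with each other.
rows = [(5.0, 1.0, 1.2), (-4.0, 1.5, 1.4), (2.0, 2.0, 2.1), (-3.0, 1.2, 1.0),
        (4.0, 4.0, 4.1), (-5.0, 4.5, 4.6), (3.0, 5.0, 4.8), (-2.0, 4.2, 4.4)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Step 1, feature selection (uses the labels): keep the two best-scoring attributes.
n_attrs = len(rows[0])
scores = [separation([r[j] for r in rows], labels) for j in range(n_attrs)]
ranked = sorted(range(n_attrs), key=lambda j: scores[j], reverse=True)
selected = sorted(ranked[:2])

# Step 2, PCA (ignores the labels): merge the two kept attributes into one.
reduced = [(r[selected[0]], r[selected[1]]) for r in rows]
center, direction = first_principal_axis(reduced)
pc1 = [(a - center[0]) * direction[0] + (b - center[1]) * direction[1]
       for a, b in reduced]
print(selected, direction, pc1)
```

Selection removes the irrelevant attribute first, so the variance that PCA then compresses is variance that actually relates to the class.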