I know that this is a problem of solving a system of linear equations.
But my question is why it is a problem when the number of observations is lower than the number of predictors. How can that even happen?
Doesn't data collection come from a careful survey design or experimental design, so that the researchers would at least think about this?
If a researcher wants to collect 45 variables for a study, why would they collect fewer than 45 observations? Am I missing something? Also, the model selection step eliminates the variables that do not improve the fit to the response, so the collected variables will always be reduced to $45-(45-p)$, right?
So why would we face a non-unique solution in those cases?
This can occur in many scenarios. A few examples:
- Medical data analysis at hospitals. Medical researchers studying a particular cancer can mostly collect data only at their own hospital, and I think it is not a bad thing that they try to collect as many variables as possible from each patient, such as age, gender, tumour size, MRI, and CT volume.
- Micro platereader array studies in bioinformatics. It is often the case that you don’t have many species but you want to be able to test for as many effects as possible.
- Analysis with images. A single image often has 16 million pixels, and it is very difficult to collect and store that many images.
- MRI reconstructions are often similar problems that need sparse regression techniques, and improving them is a central question in MRI research.
The solution is really to look at the regression literature and find what works best for your application.
If you have domain knowledge, incorporate it into your prior distribution and take a Bayesian approach with Bayesian linear regression.
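As a minimal sketch of the Bayesian route, assuming a Gaussian prior on the weights and a known noise variance (both are illustrative assumptions, as are the simulated data and the helper name `bayes_linreg_posterior`), the posterior is available in closed form and is well defined even when $p > n$:

```python
import numpy as np

def bayes_linreg_posterior(X, y, prior_mean, prior_cov, noise_var):
    """Posterior mean and covariance of w for y = X w + Gaussian noise.

    Hypothetical helper: assumes w ~ N(prior_mean, prior_cov) and a known
    noise variance, which is a simplification.
    """
    prior_prec = np.linalg.inv(prior_cov)
    # The prior precision makes this matrix invertible even if X.T @ X is singular.
    post_prec = prior_prec + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

# Simulated data with fewer observations (n) than predictors (p).
rng = np.random.default_rng(0)
n, p = 20, 45
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.normal(size=n)

post_mean, post_cov = bayes_linreg_posterior(
    X, y, prior_mean=np.zeros(p), prior_cov=np.eye(p), noise_var=0.01
)
```

Here `X.T @ X` has rank at most `n`, so ordinary least squares has no unique solution, but the prior term regularises the problem and yields a single posterior distribution over the weights.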
If you want to find a sparse solution, automatic relevance determination’s empirical Bayes approach could be the way to go.
If you think that a notion of probability is inappropriate for your problem (as when solving a linear system of equations), it might be worth looking at the Moore-Penrose pseudoinverse.
You can also approach it from a feature selection perspective and reduce the number of predictors $p$ until the problem is well-posed.
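To illustrate both the non-uniqueness and what the pseudoinverse does about it, here is a small sketch on simulated data (the sizes and variable names are assumptions chosen for illustration). With $p > n$, many weight vectors fit the data exactly; the pseudoinverse picks the one with the smallest Euclidean norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 45                      # fewer observations than predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Minimum-norm solution via the Moore-Penrose pseudoinverse.
w_min = np.linalg.pinv(X) @ y

# Any vector in the null space of X can be added without changing the fit.
# The last rows of Vt from the full SVD span that null space.
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1]
w_other = w_min + 3.0 * null_vec

fits_min = np.allclose(X @ w_min, y)        # exact fit
fits_other = np.allclose(X @ w_other, y)    # also an exact fit
smaller_norm = np.linalg.norm(w_min) < np.linalg.norm(w_other)
print(fits_min, fits_other, smaller_norm)
```

Both `w_min` and `w_other` reproduce `y`, which is exactly the non-unique solution the question asks about; the pseudoinverse resolves the ambiguity by a norm criterion rather than a probabilistic one.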
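One very simple way to do this, shown only as an illustration, is marginal correlation screening: keep the `k < n` predictors most correlated with the response and fit ordinary least squares on those. The helper `screen_and_fit`, the choice of `k`, and the simulated data are all assumptions, and other selection methods may suit your application better.

```python
import numpy as np

def screen_and_fit(X, y, k):
    """Keep the k predictors with the largest |correlation| with y, then fit OLS.

    Hypothetical helper for illustration; requires k < number of observations.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Absolute Pearson correlation of each column with the response.
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    keep = np.sort(np.argsort(score)[-k:])
    # With k < n the reduced least-squares problem generally has a unique solution.
    coef, *_ = np.linalg.lstsq(Xc[:, keep], yc, rcond=None)
    return keep, coef

rng = np.random.default_rng(1)
n, p, k = 20, 45, 5
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.normal(size=n)

keep, coef = screen_and_fit(X, y, k)
print(keep, coef)
```

Screening on the same data used for fitting biases the resulting estimates, so in practice the selection step should be validated, for example with held-out data.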