Why is Functional Data Analysis (FDA) not as popular?

As someone from a pure mathematics background, I am interested in FDA (where data is perceived as functions), and I think it can help provide solutions to some major challenges in data analysis (and data science), compared to the ordinary perspective in which data is perceived as vectors. However, I have not found much literature about it on the web. Could someone help me understand why this is so? Also, where can I find some good reads on FDA?

Answer

As someone with almost zero knowledge of FDA but who has recently started reading and thinking about it, here are some of my thoughts on why FDA is not so popular these days. Please take them with a grain of salt, as I am far from being an expert:

  • Most of the data science problems people are interested in solving “do not care” about the nature of the data. What I mean is that, when tackling a task such as regression, classification, or clustering, statisticians will go for the methods that yield the best models at a minimal computational cost. Depending on the context, the definition of “best”, the available information, etc., one may choose different methods, which themselves may rely on different representations of the data, such as vectors, matrices, graphs, smooth manifolds, functions… Therefore, when a dataset can be nicely represented as a tensor that you can plug into a CNN with solid guarantees on model performance, why would you bother picking the functional representation?
    On the other hand, there are problems where you are trying to infer information about the sampled functions themselves; in that specific case there is no way around FDA (see here for some examples).
  • That brings me to my next point: functional data, in practice, is always stored as a (high-dimensional but) finite-dimensional object, so the FDA paradigm is never strictly needed. Indeed, even if the data is conceptually a set of realisations of a function of continuous parameter(s), typically space and/or time, what you actually have stored on your computer is a discretised version of it. It is true that when the “mesh size” is sufficiently small, you are close to dealing with “real” functional data such as random fields, stock prices, video recordings, etc., but in practice it works fine to treat that data as a high- (but finite-) dimensional object.
  • And here is, imo, the most crucial point: there are plenty of non-FDA-specific algorithms that perform very well on functional data. There are indeed countless examples of successful processing of all kinds of functional data, such as video recordings, audio recordings, satellite imagery and many more. On the other hand, there have not been, as far as I know, many (or any) big breakthrough results demonstrating the superiority of FDA-specific methods for functional data over more conventional ones outside of specific contexts. All hope is definitely not lost, as there are a few theoretical arguments here and there suggesting that the FDA framework can be far superior to the finite-dimensional one in some cases, such as this paper by Delaigle and Hall: Achieving near perfect classification for functional data (2012), which shows that “perfect classification” is possible for functional data under very mild assumptions, whereas this is definitely not the case in the finite-dimensional setting. In practice, however, it seems that dimensionality reduction + classical methods works just fine (see the sketch after this list).
  • Lastly, I think another factor is that the mathematical knowledge required to contribute to research in FDA tends to be outside the expertise of most statisticians. Indeed, the algorithms proposed in the literature often rely on rather deep results in Functional Analysis that many statisticians might not be too familiar with, and there are other issues, such as defining a meaningful measure on a space of functions, which are more profound and less likely to interest statisticians, who are generally more expert in things like Linear Algebra, Optimization, Concentration Inequalities, VC theory, etc. I think it is kind of the same with Topological Data Analysis (although it seems to be gaining a lot of traction recently): the ideas are very promising, but they require deep knowledge of pure maths concepts such as homology, Betti numbers or the Euler characteristic to be applied and further improved, which is knowledge many statisticians do not have.
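To make the second and third bullets a bit more concrete, here is a minimal sketch of the “discretise, reduce dimension, then apply a classical method” pipeline. Everything in it is my own illustrative assumption (simulated noisy sine curves, grid size, noise level, PCA with 5 components, logistic regression), not something taken from a specific FDA paper: each curve is just a vector of its values on a grid, plain PCA stands in for functional PCA, and an ordinary classifier does the rest.

```python
# Sketch: treat discretised curves as plain vectors, then apply
# PCA (dimensionality reduction) + logistic regression (a classical method).
# Data and model choices are illustrative assumptions only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_per_class, n_grid = 200, 100          # curves per class, "mesh size"
t = np.linspace(0, 1, n_grid)           # discretisation grid

# Two classes of noisy curves differing only in a smooth mean function.
X0 = np.sin(2 * np.pi * t) + 0.5 * rng.standard_normal((n_per_class, n_grid))
X1 = np.sin(2 * np.pi * t + 0.3) + 0.5 * rng.standard_normal((n_per_class, n_grid))
X = np.vstack([X0, X1])                 # each row = one curve sampled on the grid
y = np.repeat([0, 1], n_per_class)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA on the discretised curves plays the role of (unsmoothed) functional PCA.
clf = make_pipeline(PCA(n_components=5), LogisticRegression())
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

The point of the sketch is only that nothing in this pipeline “knows” the rows are functions: the functional structure enters implicitly through the smoothness of the curves, which is exactly why generic finite-dimensional methods often do well on functional data.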

Even though I said all that, I do believe there are plenty of interesting things to do in this subfield, and its full potential has not been reached at all. I have mostly read papers related to the problems I am interested in, so I don’t have much to recommend, but I have read some of Hsing and Eubank and find it pretty great so far. I also found this review by Wang, Chiou and Müller to be pretty comprehensive for getting a rough idea of the current state of the art.

Attribution
Source: Link, Question Author: KaRJ XEN, Answer Author: StratosFair
