Use of information theory in applied data science

Today I ran across the book “Information Theory: A Tutorial Introduction” by James Stone and started wondering how widely information theory is used in applied data science (if you’re not comfortable with this still somewhat fuzzy term, think data analysis, of which data science is, IMHO, a glorified version). I’m well aware that information theory-based approaches, methods and measures, especially entropy, are used extensively under the hood of various statistical techniques and data analysis methods.

However, I’m curious about the level of knowledge an applied social scientist needs in order to select and apply those concepts, measures and tools successfully without diving too deep into the mathematical origins of the theory. I look forward to your answers, which might address my question in the context of the above-mentioned book (or other similar books – feel free to recommend) or in general.

I would also appreciate some recommendations for print or online sources that discuss information theory and its concepts, approaches, methods and measures in the context of (in comparison with) other (more) traditional statistical approaches (frequentist and Bayesian).


So, the first part of the question: do data scientists need to know information theory? Until very recently I thought the answer was no. What changed my mind is one crucial component: noise.

Many machine learning models (stochastic or not) use noise as part of their encoding and transformation process, and in many of these models you need to infer, after decoding the transformed output of the model, the probabilities that the noise affected. I think that this is a core part of information theory. Beyond that, KL divergence, a very important measure in deep learning, also comes from information theory.
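To make the two measures mentioned so far a bit more concrete, here is a minimal sketch of Shannon entropy and KL divergence for discrete distributions, using only the standard library. The function names `entropy` and `kl_divergence` and the two coin distributions are my own illustrative choices, not taken from any of the books discussed here.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    # Terms with zero probability contribute nothing (0 * log 0 is taken as 0).
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits.

    Assumes p and q have the same length and q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions: a fair coin and a heavily biased coin.
fair = [0.5, 0.5]
biased = [0.9, 0.1]

print("H(fair)           =", entropy(fair))
print("H(biased)         =", entropy(biased))
print("D(biased || fair) =", kl_divergence(biased, fair))
print("D(fair || biased) =", kl_divergence(fair, biased))
```

The fair coin has exactly one bit of entropy and the biased coin has less, since its outcome is more predictable. The two KL values differ, which is a reminder that KL divergence is not symmetric and so is not a distance in the usual sense.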

Second part of the question: I think the best source is David MacKay’s Information Theory, Inference, and Learning Algorithms. He starts with information theory and carries those ideas into inference and even neural networks. The PDF is free on MacKay’s website, and the lectures, which are great, are available online.

Source: Link, Question Author: Aleksandr Blekh, Answer Author: Nick Cox
