I am trying to get a global perspective on some of the essential ideas in machine learning, and I was wondering if there is a comprehensive treatment of the different notions of loss (squared, log, hinge, proxy, etc.). I was thinking of something along the lines of a more formal, systematic presentation of John Langford’s excellent post on Loss Function Semantics.
The Tutorial on Energy-Based Learning by LeCun et al. might get you a good part of the way there. They describe a number of loss functions and discuss what makes them “good or bad” for energy-based models.
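For concreteness, here is a minimal sketch (my own, not taken from either reference) of three of the losses mentioned in the question, each written as a function of the classification margin m = y·f(x) with labels y ∈ {−1, +1}:

```python
import math

def squared_loss(m):
    # Squared loss on the margin: penalizes deviation from m = 1
    # on both sides, so even confidently correct predictions incur loss.
    return (1.0 - m) ** 2

def log_loss(m):
    # Logistic (log) loss: smooth and strictly positive everywhere,
    # which is what makes it compatible with probability estimates.
    return math.log(1.0 + math.exp(-m))

def hinge_loss(m):
    # Hinge loss (as in SVMs): exactly zero once the margin exceeds 1,
    # linear in the margin otherwise.
    return max(0.0, 1.0 - m)

if __name__ == "__main__":
    for m in (-1.0, 0.0, 1.0, 2.0):
        print(f"m={m:+.1f}  squared={squared_loss(m):.3f}  "
              f"log={log_loss(m):.3f}  hinge={hinge_loss(m):.3f}")
```

Comparing their values at a few margins makes the semantic differences Langford discusses concrete: only the hinge loss is exactly zero for well-classified points, and only the squared loss penalizes over-confident correct predictions.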