I came across a large body of literature which advocates using the Fisher information metric as a natural local metric on the space of probability distributions, and then integrating over it to define distances and volumes.
But are these "integrated" quantities actually useful for anything? I have found no theoretical justifications and very few practical applications. One is Guy Lebanon's work, where he uses "Fisher's distance" to classify documents; another is Rodriguez's ABC of Model Selection…, where "Fisher's volume" is used for model selection. Apparently, using "information volume" gives "orders of magnitude" improvement over AIC and BIC for model selection, but I haven't seen any follow-up on that work.
A theoretical justification might be a generalization bound that uses this measure of distance or volume and is better than bounds derived from MDL or asymptotic arguments, or a method relying on one of these quantities that is provably better in some reasonably practical situation. Are there any results of this kind?
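For concreteness, here is what one of these "integrated" quantities looks like in the simplest case. For the Bernoulli family the Fisher information is I(p) = 1/(p(1-p)), and integrating the line element √I(p) dp between two parameter values gives the closed form 2|arcsin√p − arcsin√q|. A minimal sketch (my own illustration, not taken from any of the papers mentioned) checking the closed form against direct numerical integration:

```python
import math

def fisher_rao_bernoulli(p, q):
    """Closed-form Fisher-Rao (geodesic) distance between Bernoulli(p) and Bernoulli(q)."""
    return 2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))

def fisher_rao_numeric(p, q, n=200000):
    """Numerically integrate ds = sqrt(I(t)) dt with I(t) = 1/(t(1-t)), midpoint rule."""
    lo, hi = min(p, q), max(p, q)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += math.sqrt(1.0 / (t * (1.0 - t))) * h
    return total
```

The two agree to high accuracy away from the boundary of the simplex, e.g. for (p, q) = (0.1, 0.5).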
There was a read paper last week at the Royal Statistical Society on MCMC techniques over Riemann manifolds, primarily using the Fisher information metric: http://www.rss.org.uk/main.asp?page=1836#Oct_13_2010_Meeting
The results seem promising, though as the authors point out, in many models of interest (such as mixture models) the Fisher information has no analytic form.
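Even without an analytic form, the Fisher information can be estimated by Monte Carlo, since I(θ) = E[(∂θ log p(x|θ))²] with the expectation taken under the model itself. A hedged sketch for the mixing weight of a two-component Gaussian mixture (the component means 0 and 3 and unit variances are my own arbitrary choices, just to make the example concrete):

```python
import math
import random

def score_mixture_weight(x, w, mu1=0.0, mu2=3.0):
    """d/dw log( w*N(x; mu1, 1) + (1-w)*N(x; mu2, 1) ).

    The Gaussian normalising constant cancels in the ratio, so it is omitted.
    """
    phi1 = math.exp(-0.5 * (x - mu1) ** 2)
    phi2 = math.exp(-0.5 * (x - mu2) ** 2)
    return (phi1 - phi2) / (w * phi1 + (1.0 - w) * phi2)

def fisher_info_mc(w, n=100000, seed=0):
    """Monte Carlo estimate of I(w) = E[score^2], sampling x from the mixture."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        mu = 0.0 if rng.random() < w else 3.0  # pick a component, then sample from it
        x = rng.gauss(mu, 1.0)
        total += score_mixture_weight(x, w) ** 2
    return total / n
```

For well-separated components the estimate approaches the multinomial value 1/(w(1-w)); component overlap pulls it below that bound.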
The reason there is "no follow-up" is that very few people understand Rodriguez's work on this, which goes back many years. It's important stuff, and I am sure we will see more of it in the future.
However, some would argue that the Fisher metric is only a second-order approximation to the true metric (e.g. Neumann's paper on establishing entropic priors), which is actually defined by the Kullback-Leibler divergence (or generalisations thereof) and which leads to Zellner's formulation of MDI priors.
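That second-order relationship is easy to verify numerically: expanding KL(p_θ || p_{θ+ε}) around ε = 0, the first-order term vanishes and the quadratic term is ½ I(θ) ε². A toy check of my own for the Bernoulli family (not from Neumann's paper):

```python
import math

def kl_bernoulli(p, q):
    """KL( Bern(p) || Bern(q) )."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def fisher_bernoulli(p):
    """Fisher information of the Bernoulli family at p."""
    return 1.0 / (p * (1.0 - p))

def kl_vs_fisher_ratio(p, eps):
    """Ratio of exact KL to its quadratic approximation 0.5 * I(p) * eps^2."""
    return kl_bernoulli(p, p + eps) / (0.5 * fisher_bernoulli(p) * eps ** 2)
```

The ratio tends to 1 as eps shrinks, with the discrepancy of order eps, which is the sense in which the Fisher metric is the second-order structure of KL.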
The best-known argument is that the Fisher metric, being invariant to coordinate transforms, can be used to formulate an uninformative prior (the Jeffreys prior). Not sure I buy it!
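For what it's worth, the standard worked example: for the Bernoulli family the Jeffreys prior is proportional to √I(p) = 1/√(p(1-p)), i.e. a Beta(1/2, 1/2) density, whose normalising constant is π. A quick numeric check (my own sketch):

```python
import math

def sqrt_fisher(p):
    """Square root of the Bernoulli Fisher information: 1/sqrt(p(1-p))."""
    return 1.0 / math.sqrt(p * (1.0 - p))

def jeffreys_normalizer(n=200000):
    """Midpoint quadrature of the integral of sqrt(I(p)) over (0, 1).

    The integrand has integrable endpoint singularities; midpoint evaluation
    avoids p = 0 and p = 1. The exact value is pi (Beta(1/2, 1/2)).
    """
    h = 1.0 / n
    return sum(sqrt_fisher((i + 0.5) * h) for i in range(n)) * h
```

So the normalised Jeffreys prior here is sqrt_fisher(p) / pi, which is exactly the Beta(1/2, 1/2) density.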
Less well known is that these "integrated quantities" sometimes turn out to be divergences, and as such, one may argue that the Fisher distances generate a generalised family of divergences (and properties thereof).
But still, I have yet to find a good intuitive description of the Fisher information and the quantities it generates. Please tell me if you find one.