I think it's true to say that most of us, when we are young, dream of growing up to become statisticians. Sadly, very few of us chase that dream, and instead we grow up to become, say, premiership footballers or Grand Prix drivers. Nevertheless, there are a select few who persist, and who consequently spend their working lives looking at random variables, probability density functions, and cumulative distribution functions. Still, my beating heart.
Needless to say, statistics doesn't really do it for me. The problem with statistics, as a branch of mathematics, is that its value derives largely from its practical utility, not the intrinsic interest of its mathematical structures. Whilst statistics has vital application in all branches of science, finance and commerce, it is precisely this universality and utility which divest it of intrinsic interest. Statistics performs a service for other sciences, rather than being of value in its own right.
Nevertheless, any subject can be interesting once you tunnel sufficiently deeply, so consider the following well-known statistical scientific truism:
If a number of independent random variables combine in an additive fashion, then the collective result is a normal distribution (i.e., a bell-shaped, Gaussian distribution).
If a number of independent random variables combine in a multiplicative fashion, then the collective result is a lognormal distribution (i.e., the distribution of a variable whose logarithm is normally distributed).
Now, I'm not sure whether this common wisdom is really correct. As far as I can make out, the first clause is a simplified statement of the central limit theorem. This asserts that the suitably normalised sum of a collection of n independent and identically distributed random variables (with finite variance) will converge to a normal distribution as n tends to infinity. The central limit theorem has one particularly important implication for measurement science and the estimation of measurement error:
Suppose that the variable to be measured has an arbitrary distribution, and suppose that one measures the value of the variable by taking a collection of samples, each sample consisting of n measurements; if one calculates the mean value from each sample, then the distribution of sample means will converge to a normal distribution, centred upon the true value of the measured variable, as the size of the sample, n, tends to infinity. Hence, whatever the distribution of the variable being measured, whether it is normal or not, the collection of sample means will have an approximately normal distribution, provided that n is large enough. This is crucial, because it enables one to estimate the 95% or 99% confidence interval (the uncertainty or measurement error) in a measurement estimate, using the simple formulae or tables of values associated with the normal distribution.
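The behaviour of sample means can be checked numerically. What follows is only a rough sketch, under assumptions of my own choosing: the measured variable is taken to have an exponential distribution with mean 1 (a decidedly non-normal choice), and the names sample_means and skewness are purely illustrative. Skewness is used as a crude indicator of normality, since a normal distribution has a skewness of zero.

```python
import random
import statistics


def sample_means(n, num_samples, rng):
    """Return the means of num_samples samples, each of n exponential draws."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


def skewness(values):
    """Sample skewness: the mean cubed deviation in units of standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


rng = random.Random(0)  # fixed seed so that the run is repeatable
for n in (1, 10, 100):
    means = sample_means(n, 5000, rng)
    # As n grows, the skewness of the sample means should shrink towards zero,
    # while their average stays close to the true mean of 1.
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 3))
```

With n = 1 the sample means are just the raw exponential measurements, so they should be strongly skewed; by n = 100 the skewness should be much closer to zero.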
So far, so good. But in general, will a sum of independent random variables give a normal distribution? The central limit theorem doesn't entail that it will, for the central limit theorem requires the contributing random variables to be identically distributed. So what happens when a sum of independent random variables with different distributions is taken?
The second assertion, that a lognormal distribution results from random variables combining in a multiplicative fashion, also needs to be tightly qualified. The assertion is largely based upon the following property of the logarithmic function:
log (A x B) = log A + log B
Thus, given a collection of independent, positive-valued random variables with identical distributions, the logarithm of their product is a sum of independent and identically distributed terms, and will therefore be well-approximated by a normal distribution (applying the central limit theorem again). The product itself is then approximately lognormal. However, what if the collection of variables is not identically distributed?
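The argument in the identically distributed case can be illustrated with a small simulation. This is a sketch under my own assumptions: each factor is drawn uniformly from the interval 0.5 to 1.5, there are 50 factors per product, and the function names are hypothetical. If the reasoning holds, the logarithms of the products should look roughly symmetric, whilst the products themselves should be heavily skewed to the right.

```python
import math
import random
import statistics


def product_of_factors(n, rng):
    """Multiply together n independent factors, each uniform on [0.5, 1.5]."""
    product = 1.0
    for _ in range(n):
        product *= rng.uniform(0.5, 1.5)
    return product


def skewness(values):
    """Sample skewness: the mean cubed deviation in units of standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


rng = random.Random(0)  # fixed seed so that the run is repeatable
products = [product_of_factors(50, rng) for _ in range(5000)]
logs = [math.log(p) for p in products]

# log(A x B) = log A + log B, so each log-product is a sum of 50 i.i.d. terms.
print("skewness of products:    ", round(skewness(products), 3))
print("skewness of log-products:", round(skewness(logs), 3))
```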
Moreover, it has been noted that if the number of steps in a multiplicative process is itself subject to a statistical distribution, then the result will not necessarily be a lognormal distribution. For example, if the number of steps in a multiplicative process is subject to a geometric distribution (a discrete version of the exponential distribution), then whilst the body of the distribution will be lognormal, the tails will exhibit power law behaviour. That's interesting.
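One can at least glimpse the effect of a random number of steps in a simulation. The following is a sketch under assumptions of my own, not a reconstruction of any published analysis: each multiplicative factor is exactly lognormal (the exponential of a normal variable with mean 0 and standard deviation 0.1), the geometric distribution has success probability 0.05 (an average of 20 steps), and the excess kurtosis of the log-product is used as a rough measure of tail weight. A normal distribution has an excess kurtosis of zero, so a clearly positive value for the logarithm indicates tails heavier than lognormal, which is consistent with, though not proof of, power law behaviour.

```python
import math
import random
import statistics


def geometric_steps(p, rng):
    """Number of trials up to and including the first success (1, 2, 3, ...)."""
    steps = 1
    while rng.random() >= p:
        steps += 1
    return steps


def log_product(num_steps, rng):
    """Logarithm of a product of num_steps lognormal factors."""
    product = 1.0
    for _ in range(num_steps):
        product *= math.exp(rng.gauss(0.0, 0.1))
    return math.log(product)


def excess_kurtosis(values):
    """Sample kurtosis minus 3, so that a normal distribution scores about 0."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 4 for v in values) - 3.0


rng = random.Random(0)  # fixed seed so that the run is repeatable
fixed = [log_product(20, rng) for _ in range(20000)]
randomised = [log_product(geometric_steps(0.05, rng), rng) for _ in range(20000)]

print("fixed 20 steps:  ", round(excess_kurtosis(fixed), 3))
print("geometric steps: ", round(excess_kurtosis(randomised), 3))
```

With a fixed number of steps the log-product is exactly normal here, so its excess kurtosis should hover around zero; with a geometrically distributed number of steps it should come out distinctly positive.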