# Central Limit Theorem Explained

Jarno Elonen <elonen@iki.fi>, 2002-12-01 | URN:NBN:fi-fe20031150

(The interactive demo requires a browser that supports Java and JavaScript; clicking the links should immediately show the examples in the applet. It is possible to understand the text even without the demo, though. The total area of the probability density function does not have to be 1 when using the applet.)

When you throw a die ten times, you rarely get only ones. The usual result is approximately the same number of each value between one and six. Of course, sometimes you may get five sixes, for example, but certainly not often.

If you sum the results of these ten throws, what you get is likely to be closer to 30-40 than to the maximum, 60 (all sixes), or, on the other hand, the minimum, 10 (all ones).

The reason for this is that you can get the middle values in many more different ways than the extremes. For example, when throwing two dice, 1+6 = 2+5 = 3+4 = 7, but only 1+1 = 2 and only 6+6 = 12.
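The counting argument for two dice can be checked by brute force. The following is a minimal sketch that lists every ordered pair of faces and counts how many of the 36 pairs give each total; the variable name `ways` is just an illustrative choice.

```python
from collections import Counter
from itertools import product

# Enumerate all 36 ordered outcomes of two fair dice and count
# how many of them produce each possible total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    print(f"{total:2d}: {'#' * ways[total]} ({ways[total]}/36)")
```

The total 7 can be reached in six ordered ways (1+6, 2+5, 3+4 and their mirror images), while 2 and 12 can each be reached in only one.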

That is, even though each of the six numbers is equally likely when throwing one die, the extremes are less probable than the middle values in sums of several dice.
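To put numbers on the ten-throw case, here is a small sketch that builds the exact distribution of the total by adding one die at a time. It assumes fair six-sided dice; the function name `sum_distribution` is made up for this illustration.

```python
from fractions import Fraction

def sum_distribution(n_dice, sides=6):
    """Exact probability of each total when throwing n_dice fair dice."""
    dist = {0: Fraction(1)}  # with zero dice the total is certainly 0
    for _ in range(n_dice):
        new_dist = {}
        for total, p in dist.items():
            for face in range(1, sides + 1):
                key = total + face
                new_dist[key] = new_dist.get(key, Fraction(0)) + p / sides
        dist = new_dist
    return dist

dist = sum_distribution(10)
p_middle = sum(p for total, p in dist.items() if 30 <= total <= 40)

print(f"P(30 <= sum <= 40) = {float(p_middle):.3f}")
print(f"P(sum = 60)        = {float(dist[60]):.2e}")
print(f"P(sum = 10)        = {float(dist[10]):.2e}")
```

Each extreme can occur in exactly one way, so its probability is 1/6^10, whereas the totals from 30 to 40 together cover more than half of all outcomes.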

The same applies to tossing coins. When you drop a handful of coins on the floor, it is extremely rare that all of them end up facing the same way. Again, it may happen sometimes, but the usual result is that there are about as many heads as there are tails. Let us agree that heads=0 and tails=1, so that the sum of a handful is simply the number of tails.
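For readers without the applet, a coin simulation along these lines can be sketched in a few lines of Python. The handful size (10 coins), the number of repetitions (10000) and the fixed seed are arbitrary choices for this illustration, not values taken from the demo.

```python
import random
from collections import Counter

def simulate_coin_sums(n_coins, n_trials, seed=0):
    """Drop n_coins fair coins n_trials times; heads=0, tails=1."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(n_trials):
        counts[sum(rng.randint(0, 1) for _ in range(n_coins))] += 1
    return counts

counts = simulate_coin_sums(n_coins=10, n_trials=10000)

# Crude text histogram: one '#' per 50 occurrences of a given sum.
for total in range(11):
    print(f"{total:2d} {'#' * (counts[total] // 50)}")
```

The bars pile up around 5, and sums of 0 or 10 (all coins facing the same way) barely register.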

Small variations from the average are of course more common than large ones. Thus, as you probably noticed if you tried the simulations above, the probabilities of the different sums roughly follow the normal distribution, or "Gauss bell curve", whose center is halfway between the smallest and largest possible sums. This halfway point is called the expected value.

The expected value (i.e. the center of the normal distribution) is halfway between the minimum and maximum values only if all the cases are equally likely, such as in throwing dice and coins (where the probabilities of all the cases are 1/6 and 1/2, respectively). For example, when throwing a dart at a dart board, it is easier to get one of the low scores than the bull's eye. Note, however, that even when the cases are not all equally likely, the extreme sums are still less common than the ones near the middle. The sum of several throws therefore still approximately follows the normal distribution, only this time the center (expected value) is closer to the low end than to the high end of the possible sums.
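The shift of the center can be illustrated with a made-up dart board. In the sketch below, the scores and their probabilities are hypothetical numbers chosen only so that low scores are more likely than high ones; the code computes the exact distribution of the sum of five throws and compares its mean with the midpoint of the possible sums.

```python
def sum_distribution(dist, n):
    """Exact distribution of the sum of n independent draws from dist."""
    total = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for s, ps in total.items():
            for v, pv in dist.items():
                nxt[s + v] = nxt.get(s + v, 0.0) + ps * pv
        total = nxt
    return total

# Hypothetical dart board (score: probability); the values are invented
# for illustration, with low scores easier to hit than high ones.
dart = {1: 0.40, 2: 0.30, 3: 0.15, 5: 0.10, 10: 0.05}

five = sum_distribution(dart, 5)
mean = sum(s * p for s, p in five.items())
midpoint = (min(five) + max(five)) / 2

print(f"expected value of five throws: {mean:.2f}")
print(f"midpoint of possible sums:     {midpoint:.2f}")
```

With these invented numbers a single throw has expected value 2.45, so five throws center on 12.25, well below the midpoint 27.5 of the possible sums 5 to 50.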

So, the distribution of the cases may just as well be symmetric or biased; the sum of several events still follows the normal distribution, and only the middle (expected value) and the "steepness" (variance) vary. In fact, when you sum enough events, the probability density function doesn't matter at all, as long as the number of different possible sums is finite and you don't get one and the same number all the time. Examples:

The sine wave distribution is far from symmetric:
One sine wave 10000 times | Two sine waves 10000 times | Five sine waves 10000 times

The tangent distribution is not particularly symmetric either and is, in fact, even discrete ("spiky"):
One tangent 10000 times | Three tangents 10000 times | Twelve tangents 10000 times

...and as a final touch:
completely random distribution | sum of five completely random distributions 30000 times
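The last experiment can be approximated without the applet. The sketch below assumes that a "completely random distribution" means a probability for each of 20 values drawn at random and then normalized; that reading, the number of values and the seed are assumptions of this example. It computes the exact distribution of the sum of five independent draws and compares it with a normal curve of the same mean and variance.

```python
import math
import random

def random_pmf(n_values, rng):
    """Random probabilities for the values 0 .. n_values-1 (assumed model)."""
    weights = [rng.random() for _ in range(n_values)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve(a, b):
    """Distribution of the sum of two independent variables with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out

def mean_and_variance(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

rng = random.Random(1)
single = random_pmf(20, rng)

summed = single
for _ in range(4):  # four more draws, five in total
    summed = convolve(summed, single)

mean, var = mean_and_variance(summed)
max_gap = max(abs(p - normal_pdf(k, mean, var)) for k, p in enumerate(summed))

print(f"mean = {mean:.2f}, variance = {var:.2f}")
print(f"largest gap to the matching normal curve = {max_gap:.5f}")
print(f"peak of the matching normal curve        = {normal_pdf(mean, mean, var):.5f}")
```

Comparing the largest gap with the height of the peak gives a rough sense of how closely the sum of five draws already follows the bell curve; increasing the number of draws should shrink the gap further.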

Because most naturally occurring measurable phenomena (such as human height) depend on more than one condition (in the case of height: nutrition, several genes, environment, personal history, ...), this all leads to an extremely important fact: almost all measurable "random" variables in the real world follow, at least approximately, some kind of normal distribution.

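The formal representation of the central limit theorem looks like this:

$$\lim_{n \to \infty} P\left( \frac{X_1 + X_2 + \dots + X_n - n\mu}{\sigma \sqrt{n}} \le x \right) = \Phi(x),$$

when $X_1, X_2, \dots$ are independent observations of a random variable $X$, to which applies:

$$E(X) = \mu, \qquad \operatorname{Var}(X) = \sigma^2, \qquad 0 < \sigma^2 < \infty,$$

and $\Phi$ is the cumulative distribution function of the standard normal distribution.

This work is dedicated to the Public Domain.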
You may also distribute and modify the source code of the Applet freely (see details in the code).
Applet's source code: highlighted version | non-formatted version