Standard deviation

#statistics

A Gaussian distribution is fully determined by its mean and SD, but a mean and SD can be computed for any sample, regardless of the distribution of the data.

For some fat-tailed distributions (meaning a fatter tail than the exponential distribution), notably the Cauchy distribution (the t-distribution with df=1), the mean of the distribution is not well-defined. (An intuition: what is the mean wage in a corporation? It depends a lot on whether there's a billionaire among them or not. A finite sample always has a mean, but it can be dominated by a single extreme value.) For the Cauchy the SD is not well-defined either, since its calculation involves the mean. But how about the binomial distribution? It is not (necessarily) Gaussian, yet it definitely has a mean, and it has an SD. Now what does the SD tell us about the data?
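To make the contrast above concrete, here is a minimal sketch using only the Python standard library. The function names, the choice of Binomial(20, 0.3), and the seed are my own assumptions for illustration. It computes the binomial mean and SD directly from the probability mass function, then draws standard Cauchy samples to show that the sample mean need not settle down as the sample grows, while the sample median does.

```python
import math
import random
import statistics

def binomial_mean_sd(n, p):
    """Mean and SD of Binomial(n, p), computed directly from the pmf."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, math.sqrt(var)

def cauchy_sample(size, rng):
    """Standard Cauchy draws via the inverse CDF: tan(pi * (U - 1/2))."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(size)]

# The binomial has a perfectly good mean and SD; they agree with the
# closed forms n*p and sqrt(n*p*(1-p)) up to floating-point error.
mean, sd = binomial_mean_sd(20, 0.3)
print("binomial:", mean, sd)
print("closed form:", 20 * 0.3, math.sqrt(20 * 0.3 * 0.7))

# For the Cauchy, the sample mean can jump around however large the
# sample gets, while the sample median stays close to 0.
rng = random.Random(0)  # arbitrary seed, only for repeatability
for size in (100, 10_000, 100_000):
    xs = cauchy_sample(size, rng)
    print(size, "mean:", statistics.fmean(xs), "median:", statistics.median(xs))
```

Changing the seed changes the Cauchy means a lot and the medians hardly at all, which is the point of the comparison.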

It tells us less than it would, were the data Gaussian-distributed. We cannot say, for example, that 68% of the observations fall within one SD of the mean, since that figure is a property of the Gaussian distribution specifically. So what is the SD, with this property removed?
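The point about the 68% figure can be checked exactly rather than by simulation. The sketch below (function names and the chosen example distributions are my own assumptions) computes the probability of landing within one SD of the mean for a Gaussian, an exponential with rate 1, a uniform on (0, 1), and a binomial.

```python
import math

def gaussian_within_one_sd():
    # P(|Z| <= 1) for a standard normal Z, via the error function.
    return math.erf(1 / math.sqrt(2))

def exponential_within_one_sd():
    # Rate 1: mean = SD = 1, so "within one SD" is the interval [0, 2].
    return 1 - math.exp(-2)

def uniform_within_one_sd():
    # Uniform(0, 1): mean 1/2, SD 1/sqrt(12); the interval has width 2*SD.
    return 2 / math.sqrt(12)

def binomial_within_one_sd(n, p):
    # Sum the pmf over the integers k with |k - mean| <= SD.
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k - mean) <= sd
    )

print("gaussian   ", round(gaussian_within_one_sd(), 4))     # 0.6827
print("exponential", round(exponential_within_one_sd(), 4))  # 0.8647
print("uniform    ", round(uniform_within_one_sd(), 4))      # 0.5774
print("binomial   ", round(binomial_within_one_sd(10, 0.5), 4))  # 0.6562
```

The fraction depends on the shape of the distribution, so "one SD" only translates into "68%" when the data are (close to) Gaussian.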

It is a measure of spread around the mean. It is not special if the data is not Gaussian. There are a lot of alternatives, like Mean Absolute Deviation, which are just as informative (or not), much like there are several ways of taking the "average" (mean, median, mode). In non-Gaussian data, it is less useful to know the SD, but it is not meaningless.
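As a small illustration of SD and mean absolute deviation being two different summaries of the same spread, here is a sketch on made-up numbers (the data and the helper name `mean_abs_deviation` are assumptions for illustration, not from any dataset). It uses the population SD, `statistics.pstdev`, so that both measures divide by the number of observations.

```python
import statistics

def mean_abs_deviation(xs):
    """Average absolute distance of the observations from their mean."""
    m = statistics.fmean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

plain = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]  # same data with one extreme value

for name, xs in (("plain", plain), ("with outlier", with_outlier)):
    print(
        name,
        "mean:", statistics.fmean(xs),
        "SD:", statistics.pstdev(xs),
        "mean abs dev:", mean_abs_deviation(xs),
    )
```

Both numbers grow when the outlier is added, but the SD squares the deviations and so reacts more strongly to the single extreme value.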

But suppose I am told simply that "the SD is 3". What have I been told?

Created 2022-Oct-02