Older Showing 405 to 408 Newer

Standard deviation

A Gaussian distribution is defined by the mean and SD, but the mean and SD exist regardless of the distribution of the data.

In the case of fat-tail distributions (meaning fatter tail than the exponential dist), or at least in the case of the Cauchy distribution (t-dist with df=1), the mean is not well-defined (What is the mean wage in a corporation? Depends a lot on whether there's a billionaire among them or not). Then the SD might not be well-defined either, since its calculation involves the mean. But how about the binomial distribution? It is not (necessarily) Gaussian, yet it definitely has a mean, and it has a SD. Now what does the SD tell us about the data?

It tells us less than it would, were the data Gaussian-distributed. We cannot say, for example, that 68% of the observations fall within one SD of the mean, since that property arises only from the definition of the Gaussian distribution. So what is the SD, with this property removed?

It is a measure of spread around the mean. It is not special if the data is not Gaussian. There are a lot of alternatives, like Mean Absolute Deviation, which are just as informative (or not), much like there are several ways of taking the "average" (mean, median, mode). In non-Gaussian data, it is less useful to know the SD, but it is not meaningless.

But suppose I am told simply that "the SD is 3". What have I been told?

Created 2022-Oct-02 (3 years ago)

Runs test

Time series analysis

The runs test simplifies a time series to two values (e.g. positive and negative, but you can pick any suitble y, not just 0) and basically checks how long the series stays on one of the two values – or how many times it changes.

+ + + + − − − + + + − − + + + + + + − − − −

When the residuals are white noise, you'd expect to see the sign changing rapidly. If you have long runs of one kind, there's something wrong.

Created 2022-Oct-02 (3 years ago)

Tobit regression

#statistics

The name Tobit is a pun on both Tobin, their creator, and their similarities to probit models.

The Tobit model is distinct from the truncated regression model, which is in general different and requires a different estimator.

According to Wikipedia, a Tobit model is simply any of a class of regression models in which the observed range of the dependent variable is censored in some way. Because Tobin's method can be easily extended to handle truncated and other non-randomly selected samples, some authors adopt a broader definition of the tobit model that includes these cases.

I get the sense that this was the earliest censoring model. It is a multiple linear regression but with censored responses if it is above or below certain cutpoints.

Tobin's idea was to modify the likelihood function so that it reflects the unequal sampling probability for each observation depending on whether the latent dependent variable fell above or below the determined threshold.[5] For a sample that, as in Tobin's original case, was censored from below at zero, the sampling probability for each non-limit observation is simply height of the appropriate density function. For any limit observation, it is the cumulative distribution, i.e. the integral below zero of the appropriate density function. The tobit likelihood function thus is a mixture of densities and cumulative distribution functions.