Truncated data

#statistics

Truncation means that the observations from part of a distribution are missing altogether:

[Figure: a histogram that looks Gaussian except that one part is cut off; the superimposed Gaussian curve fails to match it.]

[Figure: a histogram that looks Gaussian except that one part is cut off and the missing bars are collected into one tall bar.]

Censoring is when the data is not missing, but the true value below a certain level is unknown, so every observation below that level is recorded as the level itself (e.g. everything below -2 takes on the value -2). This inflates the count of that one value, which looks odd in the following histogram.

[Figure: histogram of censored data, with one tall bar at the censoring level.]

The difference between censoring and truncation, then, is that in censoring we still have access to whole observations, being blind only to the dependent variable, whereas in truncation those observations are dropped altogether so we don't even have access to the independent variables.
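To make the distinction concrete, here is a minimal simulation sketch. The linear model, the cut-off of -1 and all variable names are assumptions made up for illustration; none of them come from a real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)              # independent variable
y = 0.5 * x + rng.normal(size=n)    # dependent variable (illustrative model)
limit = -1.0                        # hypothetical cut-off level

# Truncation: rows with y below the limit are dropped entirely,
# so the matching x values are lost as well.
keep = y >= limit
x_trunc, y_trunc = x[keep], y[keep]

# Censoring: every row is kept, but y below the limit is recorded as the limit.
y_cens = np.maximum(y, limit)
n_at_limit = int(np.sum(y_cens == limit))

print(len(y), len(y_trunc), len(y_cens), n_at_limit)
```

The truncated sample is shorter than the original, while the censored sample has the full length, with the "missing" rows piled up at the limit. That pile is the tall bar in the histogram.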

Tobit?

Sometimes you also hear the term Tobit regression. How does it differ? A sentence in the Stata blog post caught my eye:

For truncated linear regression, we can use the truncreg command, and for censored linear regression, we can use the intreg or tobit command.

Bam. Tobit is just one of the methods for censored data. Neither Wikipedia nor almost any other source says so plainly.
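For intuition, a Tobit-style fit for left-censored data can be sketched as a maximum-likelihood problem: uncensored points contribute the normal density, censored points contribute the probability of falling at or below the limit. This is a rough sketch using NumPy and SciPy, not how Stata implements tobit; the function name `tobit_negloglik`, the simulated coefficients and the limit of 0 are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, x, y, limit):
    """Negative log-likelihood for y* = a + b*x + e, e ~ N(0, sigma^2),
    where we only observe y = max(y*, limit)."""
    a, b, log_sigma = params
    sigma = np.exp(log_sigma)        # optimise log(sigma) so sigma stays positive
    mu = a + b * x
    censored = y <= limit
    # Uncensored rows: ordinary normal log-density.
    ll_obs = norm.logpdf(y[~censored], loc=mu[~censored], scale=sigma)
    # Censored rows: log-probability that the latent value is at or below the limit.
    ll_cens = norm.logcdf((limit - mu[censored]) / sigma)
    return -(ll_obs.sum() + ll_cens.sum())

# Simulated data (made-up coefficients: intercept 1, slope 2, sigma 1).
rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=n)
y_star = 1.0 + 2.0 * x + rng.normal(size=n)   # latent, never fully observed
limit = 0.0
y = np.maximum(y_star, limit)                 # what we actually record

# Naive least squares on the censored y, for comparison.
slope_ols = np.polyfit(x, y, 1)[0]

fit = minimize(tobit_negloglik, x0=[0.0, 1.0, 0.0], args=(x, y, limit), method="BFGS")
a_hat, b_hat, sigma_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(slope_ols, a_hat, b_hat, sigma_hat)
```

In this simulation the naive slope is pulled towards zero by the pile of censored values, while the likelihood-based estimate should land close to the slope used to generate the data.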
