Truncated data

Truncation deals with data missing from part of a distribution: observations whose values fall beyond some threshold never make it into the sample at all.
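Here is a minimal sketch of left truncation. It assumes standard normal data and a hypothetical cutoff of -2, neither of which comes from a specific dataset. The dropped observations simply vanish, so the sample shrinks and its mean drifts upward.

```python
import random
import statistics

# Assumed cutoff for illustration: anything below it is never observed.
THRESHOLD = -2.0

def truncate_below(values, threshold):
    """Left truncation: observations below the threshold are dropped entirely."""
    return [v for v in values if v >= threshold]

random.seed(0)  # fixed seed so the sketch is reproducible
full = [random.gauss(0.0, 1.0) for _ in range(10_000)]
truncated = truncate_below(full, THRESHOLD)

print(len(full), len(truncated))  # the truncated sample is smaller
print(statistics.mean(full), statistics.mean(truncated))  # truncated mean is pulled upward
```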

Censoring is different: the data is not missing, but the true value below a certain level is unknown, so every observation below that level is recorded as the same value (e.g. everything below -2 is recorded as -2). This piles extra counts onto that one value, which shows up as an odd spike at the edge of a histogram.
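The same kind of simulated data, censored instead of truncated. The standard normal draws are again an assumption, while the -2 floor follows the example in the text. Every observation survives, but all the low ones collapse onto -2.

```python
import random
from collections import Counter

# Censoring point taken from the -2 example above.
THRESHOLD = -2.0

def censor_below(values, threshold):
    """Left censoring: values below the threshold are recorded as the threshold."""
    return [max(v, threshold) for v in values]

random.seed(0)  # fixed seed so the sketch is reproducible
full = [random.gauss(0.0, 1.0) for _ in range(10_000)]
censored = censor_below(full, THRESHOLD)

counts = Counter(censored)
print(len(censored) == len(full))  # True: no observation is lost
print(min(censored), counts[THRESHOLD])  # the floor value and the size of its spike
```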

The difference between censoring and truncation, then, is that with censoring we still have every observation, including its independent variables, and are blind only to the exact value of the dependent variable (we know only that it lies beyond the threshold). With truncation, those observations are dropped altogether, so we don't even have their independent variables.
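The same contrast with a regressor in the picture. This is a sketch under made-up assumptions (a linear relationship y = 0.5x + noise and a hypothetical cutoff of -2 on y): censoring keeps every row and its x, flagging the ones whose y is only known to lie below the cutoff, while truncation deletes those rows, x included.

```python
import random

# Hypothetical cutoff on the dependent variable y.
THRESHOLD = -2.0

random.seed(1)  # fixed seed so the sketch is reproducible
# Each observation is an (x, y) pair; y depends linearly on x plus normal noise.
rows = []
for _ in range(1_000):
    x = random.uniform(-3.0, 3.0)
    y = 0.5 * x + random.gauss(0.0, 1.0)
    rows.append((x, y))

# Censoring: every row is kept with its x intact; y is floored at the threshold,
# and a flag records that the true y is only known to lie below it.
censored_rows = [(x, max(y, THRESHOLD), y < THRESHOLD) for x, y in rows]

# Truncation: rows whose y falls below the threshold vanish, x and all.
truncated_rows = [(x, y) for x, y in rows if y >= THRESHOLD]

n_flagged = sum(1 for _, _, flag in censored_rows if flag)
print(len(rows), len(censored_rows), len(truncated_rows), n_flagged)
```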

Tobit?

Sometimes you also hear the term Tobit regression. How does it differ? A sentence in a Stata blog post caught my eye:

For truncated linear regression, we can use the truncreg command, and for censored linear regression, we can use the intreg or tobit command.
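To see why the two kinds of data need different estimators, here is a sketch of the textbook log-likelihoods for a Tobit model left-censored at a known point c and for a linear regression left-truncated at c. It assumes one regressor, normal errors, and a known c; the function names are hypothetical, and this is not how Stata's tobit or truncreg are implemented internally. Censored rows still contribute, through the probability mass below c, while the truncated model rescales each observed row's density by the probability of being observed at all.

```python
import math

def norm_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(b0, b1, sigma, data, c):
    """Log-likelihood of a Tobit model left-censored at c.

    data: (x, y, censored) triples; censored rows have y recorded as c.
    """
    total = 0.0
    for x, y, censored in data:
        mu = b0 + b1 * x
        if censored:
            # We only know the latent outcome fell at or below c.
            total += math.log(norm_cdf((c - mu) / sigma))
        else:
            # Ordinary normal density for a fully observed outcome.
            total += norm_logpdf((y - mu) / sigma) - math.log(sigma)
    return total

def truncreg_loglik(b0, b1, sigma, data, c):
    """Log-likelihood of a linear regression left-truncated at c.

    data: (x, y) pairs that survived truncation (y >= c); dropped rows are absent.
    """
    total = 0.0
    for x, y in data:
        mu = b0 + b1 * x
        # Normal density, renormalized by the probability of being observed at all.
        total += norm_logpdf((y - mu) / sigma) - math.log(sigma)
        total -= math.log(1.0 - norm_cdf((c - mu) / sigma))
    return total

# Tiny made-up example: the same underlying rows, censored vs truncated at c = 0.
c = 0.0
censored_data = [(1.0, 0.8, False), (2.0, 1.9, False), (-1.0, c, True)]
truncated_data = [(1.0, 0.8), (2.0, 1.9)]  # the third row is simply gone
print(tobit_loglik(0.0, 1.0, 1.0, censored_data, c))
print(truncreg_loglik(0.0, 1.0, 1.0, truncated_data, c))
```

With no censored rows, and with c far below the data, both functions reduce to the ordinary normal linear-regression log-likelihood.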

Bam. Tobit is just one kind of censored regression. Neither Wikipedia nor most other sources I found say that plainly.
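The standard textbook statement of the Tobit model makes the "subset" point concrete. It is written here with a general left-censoring point c (Tobin's original setup censors at zero). A latent outcome follows an ordinary linear regression with normal errors, and we only observe it through a censoring rule:

```latex
\begin{aligned}
y_i^* &= x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),\\
y_i   &= \max(y_i^*,\, c).
\end{aligned}
```

Other censored regression models keep the censoring rule but change the error distribution or the form of the mean. A truncated regression keeps the same latent model but observes the pair (x_i, y_i^*) only when y_i^* \ge c.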

Created 2019-Jan-12