Sampling

#statistics

In the ideal case, each set is a subset of the next:

Sampling unit < Sample < Sampling frame < Population

Terms:

The target population
all subjects of interest (to which you want to be able to apply a conclusion) e.g. "all cows in the world".
The sampling frame
the subset of the population from which you draw your sample, e.g. "all cows in Skane county".
Sampling design or sampling procedure
the overall plan for how the sample is drawn from the frame; it may involve techniques such as stratified sampling.

Population definition

Be specific!

  • All current customers -> Everyone who's bought from us in the last year.
  • SDSU students loyal to the football team -> SDSU students who registered in Fall 2019 and have attended at least one game while a student.
  • Unsatisfied customers -> Customers who have an active paid account and have either scored us a 3 or lower in the last year or registered a complaint through e-mail.

A good population definition uses clear, unambiguous, objective criteria to distinguish between members and non-members of the population.
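One way to check that a definition is objective is to see whether it can be written as a yes/no test. Here is a minimal sketch for the "unsatisfied customers" definition above. The `Customer` record and its field names are made up for illustration; a real customer database will look different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    # Hypothetical record layout, invented for this illustration.
    has_active_paid_account: bool
    lowest_score_last_year: Optional[int]  # None if they gave no score in the last year
    complained_by_email: bool


def is_unsatisfied(c: Customer) -> bool:
    """Membership test for the 'unsatisfied customers' population."""
    scored_low = (
        c.lowest_score_last_year is not None and c.lowest_score_last_year <= 3
    )
    # Active paid account AND (low score OR e-mail complaint).
    return c.has_active_paid_account and (scored_low or c.complained_by_email)


print(is_unsatisfied(Customer(True, 2, False)))
print(is_unsatisfied(Customer(False, 1, True)))
```

If a criterion cannot be turned into a test like this (e.g. "loyal to the football team"), it is probably still too vague.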

A good population definition suits the needs of the audience requesting the research.

  • Clients often have only a hazy idea of what makes someone a member of the population of interest. Before you undertake the study, get verification from them so that all parties agree on what the population definition is!

A good population definition matches previous research, so that the present study can be directly compared with it.

Sampling frame

Leslie Kish posited four basic problems of sampling frames:[7]

  1. "Errors of omission": Missing elements: Some members of the population are not included in the frame.
  2. "Errors of inclusion": Foreign elements: Non-members of the population are included in the frame.
  3. Duplicate entries: A member of the population appears more than once in the frame, and so may be surveyed more than once.
  4. Groups or clusters: The frame lists clusters instead of individuals.
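The first three problems can be checked mechanically in the (rare) case where you have an ID list for both the population and the frame. A minimal sketch, assuming both are available as plain collections of IDs; the function name `audit_frame` and the example IDs are invented here. The fourth problem (clusters) is about the unit the frame lists, so it is not covered.

```python
from collections import Counter


def audit_frame(population_ids, frame_ids):
    """Compare a frame (a list that may repeat IDs) with the population IDs."""
    population = set(population_ids)
    counts = Counter(frame_ids)  # how many times each ID is listed in the frame
    listed = set(counts)
    return {
        "missing": population - listed,   # errors of omission
        "foreign": listed - population,   # errors of inclusion
        "duplicates": {i for i, n in counts.items() if n > 1},
    }


report = audit_frame(population_ids={1, 2, 3, 4}, frame_ids=[2, 3, 3, 9])
print(report)
```

Here IDs 1 and 4 are missing from the frame, 9 is a foreign element, and 3 is listed twice.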

Example 1: errors of omission

Pop. of interest: all customers who have been on our tele-therapy platform in the last 30 days

Sampling frame: A list of 5,000 customer phone numbers, which were collected from customers after their 90-day free trial expired.

There will be some overlap between these two sets, probably a sizable one (many individuals are in both). However, there are some sources of error:

  • Maybe the company didn't provide a complete customer list, just a list of 5,000. This kind of error of omission is okay: if the 5,000 were selected randomly, it should not bias your conclusions.
  • People still in the trial period haven't had to submit phone numbers yet. This kind of error of omission poses a problem: the cause of the omission systematically differentiates the in-frame population from the out-of-frame population.
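The difference between the two kinds of omission can be seen in a small simulation. This is only a sketch: the population size, the 30% trial share, and the assumption that trial users have more sessions per week are all invented numbers, not data from the example.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: about 30% are still in the free trial, and trial
# users are ASSUMED to use the platform more (all numbers invented).
population = []
for _ in range(20_000):
    in_trial = rng.random() < 0.30
    sessions = rng.gauss(6.0 if in_trial else 3.0, 1.0)
    population.append((in_trial, sessions))

true_mean = mean(s for _, s in population)

# Omission at random: the frame is a random 5,000 out of all customers.
random_frame = rng.sample(population, 5_000)

# Systematic omission: the frame only contains customers past the trial.
post_trial = [p for p in population if not p[0]]
systematic_frame = rng.sample(post_trial, 5_000)

random_error = abs(mean(s for _, s in random_frame) - true_mean)
systematic_error = abs(mean(s for _, s in systematic_frame) - true_mean)

print(f"true mean                 {true_mean:.2f}")
print(f"random omission error     {random_error:.2f}")
print(f"systematic omission error {systematic_error:.2f}")
```

Under these assumptions the randomly thinned frame lands close to the true mean, while the post-trial-only frame is off by a wide margin no matter how many people are in it.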

Potential remedies

  • If some population characteristics are known, measure the same characteristics in your sample and compare. If they are approximately equal, that suggests the sample is representative.
    • E.g. if you KNEW from other research that 65% of the target population is 60+ years old, but only 5% of your frame is 60+ years old, there's an issue!
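The comparison above is simple arithmetic. A minimal sketch using the 65% vs. 5% figures from the example; the function name `share_gap` and the sample of 200 respondents are made up for illustration.

```python
def share_gap(sample_flags, known_population_share):
    """Return (sample share, sample share minus the known population share)."""
    sample_share = sum(sample_flags) / len(sample_flags)
    return sample_share, sample_share - known_population_share


# Hypothetical sample of 200 respondents: True means aged 60+.
sample_is_60_plus = [True] * 10 + [False] * 190  # 10 of 200 are 60+

share, gap = share_gap(sample_is_60_plus, known_population_share=0.65)
print(f"sample share {share:.0%}, gap {gap:+.0%}")
```

How large a gap is acceptable is a judgment call that depends on the study; a gap of this size clearly is not.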