When is a model good?
Parsimony
statmodeling.stat.columbia.edu/2009/05/07/bayes_jeffreys/
There is a body of work behind why model parsimony is good, but the reasoning doesn't apply everywhere. When does it not?
Components of a trustworthy data analysis
From simplystatistics.org/2018/06/04/trustworthy-data-analysis/
- How were the data gathered?
- How were the data processed?
- Sampling frame?
- Is there a reason why the variables might be causally related?
Rubin's basic questions
Donald Rubin has two questions he likes to ask any researcher:
- What would you do if you had all the data?
- What were you doing before you had any data?
Search engines
- millionshort.com/
- crawlcrawler.com/
- wiby.me/
- boardreader.com/
- search.marginalia.nu/
- metager.org/
- andisearch.com/
As an alternative, simply filtering results can remove lots of noise:
- addons.mozilla.org/en-US/firefox/addon/hohser/
- addons.mozilla.org/en-US/firefox/addon/ddg-hide-unwanted-results/
- greasyfork.org/en/scripts/1682-google-hit-hider-by-domain-search-filter-block-sites
Frequentist "probability" means frequency
In the "classical" (frequentist) approach, the probability of an event is the limit (the stable value converged on) of its long-run relative frequency: the number of occurrences of the event relative to the number of trials. For an event A, one's uncertainty about its occurrence is calculated as in elementary-school probability, as the ratio of the number of times the event occurred to the number of trials.
If we roll a die many times, it will come up showing the number two approximately a sixth of the time, thus the probability of showing that number will be 1/6.
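The long-run frequency idea can be illustrated with a small simulation. This is only a sketch: the function name `running_frequency`, the fixed seed, and the use of a pseudo-random generator as a stand-in for a fair die are all assumptions made for illustration.

```python
import random

def running_frequency(target, n_rolls, seed=0):
    """Return the relative frequency of `target` after each of `n_rolls` rolls.

    Assumption: a pseudo-random generator stands in for a fair six-sided die.
    """
    rng = random.Random(seed)
    hits = 0
    freqs = []
    for i in range(1, n_rolls + 1):
        if rng.randint(1, 6) == target:  # randint includes both endpoints
            hits += 1
        freqs.append(hits / i)  # Freq[target] after i trials
    return freqs

freqs = running_frequency(target=2, n_rolls=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>6} rolls: Freq[two] = {freqs[n - 1]:.4f}")
```

The early values can be far from 1/6; only as the number of rolls grows does the frequency settle near it, which is the "limit" the frequentist definition refers to.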
Straightforward in the case of dice, but…
There are some concerns with this definition of probability. First, considering the probability of event A as a frequency means that we are only able to calculate it if we know the entire sample space Ω. Second, this definition is based on the concept of repeatability, which is not necessarily a characteristic of the event of interest: for instance, the events "Caesar crossed the Rubicon" or "The next US president will be a woman" do not satisfy this assumption, as they can only happen once. You see why, if Bayesian probability theory gives you the tools to quantify your guess about such events, it can be used to fuel decisions in your life where frequentist probability cannot.
I think it may be a good idea, whenever you write papers and articles, to use the term "probability" and the notation "Pr[]" only when using the Bayesian definition thereof. Since objective probability does not exist to a Bayesian, it is confusing for a Bayesian to be asked to calculate Pr[Pr[a] > 0.10], a probability of a probability. Better to write Pr[Freq[a] > 0.10], keeping track of what we are talking about.
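To make a quantity like Pr[Freq[a] > 0.10] concrete, here is a minimal sketch of one way a Bayesian might compute it. Everything specific here is an assumption for illustration: the event was observed `k` times in `n` independent trials, the prior on Freq[a] is uniform on [0, 1], and the function name `prob_freq_exceeds` is made up. Under those assumptions the posterior is Beta(k + 1, n - k + 1), and for integer parameters its upper tail equals a binomial sum, so only the standard library is needed.

```python
from math import comb

def prob_freq_exceeds(threshold, k, n):
    """Pr[Freq > threshold | k occurrences in n trials, uniform prior on Freq].

    Assumption: the posterior is Beta(k + 1, n - k + 1). For integer
    parameters its upper tail equals Pr[Binomial(n + 1, threshold) <= k].
    """
    m = n + 1
    return sum(
        comb(m, j) * threshold**j * (1 - threshold) ** (m - j)
        for j in range(k + 1)
    )

# Hypothetical data: the event occurred 3 times in 20 trials.
print(prob_freq_exceeds(0.10, k=3, n=20))
```

With no data at all (`k = 0`, `n = 0`) the function returns `1 - threshold`, which is just the uniform prior's own answer.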
Technically, you can call limits of long-run frequencies a probability, since a Bayesian can produce the same number in special cases such as throwing dice, when he has no other information to go on and uses something called a uniform prior. Thus the notation Freq[a] is a renamed Pr[a] that meets specific conditions. Perhaps you could write that Freq[a] == Pr[a | uniform prior ∩ repeatable event ∩ trust in sources (like the provider of your dice) ∩ no knowledge ∩ whatever else], making it clear that Freq[a] is at most a shorthand and that it is an abuse of the reader to merely say Pr[a] when you mean the aforementioned things. That said, there may be no way to rigorously define Freq[a] as any version of Pr[a]; look up Lindley's paradox to be sure.