May 3, 2026
Twice in the last week, I’ve had computer scientists laugh about the insanity of asymptotics in Statistics[^1]. The insanity comes from an assumption that is never true: \(n = \infty\). As someone working on asymptotic approximations in Statistics, I feel the need to defend their usefulness by showing how people have justified them in the past, but also to think through ways to improve the justification for using normal models.
In Statistics, we often assume \(n = \infty\), and this makes people uncomfortable. Outside of statistics, engineers linearize systems by assuming that \(\Delta = 0\) in the Taylor expansion: if you have some complex function \(f(x)\), you just analyze its second-order expansion. PID controllers, for instance, are second-order controllers and are incredibly popular.
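As a toy illustration of how little is lost when \(\Delta\) is small (the function \(\exp\) and the expansion point \(0\) are my choices here, not from the post):

```r
# Compare exp(x) with its second-order Taylor expansion around 0: 1 + x + x^2/2.
# The function and expansion point are illustrative choices only.
f      <- exp
taylor <- function(delta) 1 + delta + delta^2 / 2

deltas <- c(1, 0.5, 0.1, 0.01)
data.frame(
  delta     = deltas,
  truth     = f(deltas),
  expansion = taylor(deltas),
  error     = abs(f(deltas) - taylor(deltas))
)
```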
The assumptions are essentially the same: assuming \(\Delta = 0\) is the same as assuming \(n = \infty\). The proof of the CLT I learned (Tao 2015) is the Lindeberg swapping trick: you take two derivatives (hence the need for two finite moments) and write all higher-order terms as \(o(1)\). In some regard, the Gaussian is the first-order random-variable approximation.
There is interesting psychology at work that brings about complete terror when you assume \(n = \infty\) but not when you assume \(\Delta = 0\). I suspect this is because humans often see numbers near \(0\) in daily life but will never see a number near infinity. Moreover, we can imagine an infinitely small amount of time, but we can never imagine an infinitely large amount of time.
If \(n \neq \infty\), how can we ever use a normal approximation? In other words, what is the approximation error? The answer to this question is a very classical result in probability called the Berry-Esseen bound.
Recall the statement of the CLT: let \(X_1, X_2, \dots \sim \mathsf P\) be i.i.d. with zero mean and variance \(\sigma^2 < \infty\). Define \(S_n = \sum_{i=1}^{n} X_i\) to be the partial sum. Then \[ \lim_{n \to \infty} \left|\mathsf P\left(\frac{S_n}{\sigma\sqrt{n}} \leq z_{\alpha}\right) - \alpha \right| = 0, \] where \(z_\alpha\) is the \(\alpha\) quantile of a standard normal; that is, \(\mathsf P(Z \leq z_\alpha) = \alpha\) where \(Z \sim \mathcal N(0,1)\). The theorem says that if you look at large enough \(n\), your scaled partial sums are essentially Gaussian.
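Here is a quick simulation sketch of that claim (not from the post; the sample size, number of replications, and seed are my choices), using centered Bernoulli(0.5) draws:

```r
# Simulate scaled partial sums of centered Bernoulli(0.5) variables and check
# that P(S_n / (sigma * sqrt(n)) <= z_alpha) is close to alpha.
set.seed(1)            # illustrative seed
n       <- 100         # illustrative sample size
sigma   <- 0.5
alpha   <- 0.95
z_alpha <- qnorm(alpha)

scaled_sums <- replicate(10000, {
  x <- rbinom(n, size = 1, prob = 0.5) - 0.5
  sum(x) / (sigma * sqrt(n))
})
mean(scaled_sums <= z_alpha)   # should be close to alpha = 0.95
```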
The obvious question is what counts as “large enough”. In my first STAT class at Duke, I had a z-score table and a t-table, and I was told that if \(n > 30\) we could use a z-test.
To motivate the approximation error, assume you have \(20\) Bernoulli data points: \(X_1, \dots, X_{20}\). You want to test whether \(p = 0.5\) or \(p > 0.5\). You have not learned about Neyman-Pearson and the likelihood ratio test, so you instead compute a p-value using the Gaussian approximation. That is, you write \[ T_{20} = \frac{\sum_{i=1}^{20} (X_i - 0.5)}{\sqrt{20 \times 0.25}} \overset{H_0}{\approx } \mathcal N(0, 1). \] You observe \(15\) ones and \(5\) zeroes, so you can calculate \(T_{20}\).
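A minimal R sketch of the calculation (the 0/1 vector is just the observed sample; everything else follows the formula above):

```r
# Observed Bernoulli sample: 15 ones and 5 zeroes
x <- c(1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1)
print(x)

# Test statistic under H_0: p = 0.5
T_20 <- sum(x - 0.5) / sqrt(20 * 0.25)
print(paste0("T_20: ", round(T_20, 4)))
```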
[1] 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 0 1
[1] "T_20: 2.2361"
You can then find the p-value by looking at the tail of the normal distribution:
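In R this is one line (a sketch; \(T_{20}\) is recomputed so the snippet stands alone):

```r
# One-sided p-value: upper tail of the standard normal at T_20
T_20  <- 5 / sqrt(20 * 0.25)    # = 2.2361, as computed above
p_val <- 1 - pnorm(T_20)
print(paste0("p-val: ", round(p_val, 3)))
```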
[1] "p-val: 0.013"
Great! The p-value is less than 0.05, so we can reject![^2] However, we remember that this is an approximate p-value that becomes exact only when \(n = \infty\). But \(n = 20\), which I would claim is much smaller than \(\infty\). Should we be worried? What is the approximation error?
What we are asking is how close \(T_{20}\) is to a standard normal random variable. The Berry-Esseen bound (Berry 1941; Esseen 1942) gives us an answer. It states that if you additionally assume a finite third moment \(\rho = \mathbb{E}(|X_1 - \mathbb E(X_1)|^3) < \infty\), then you get a uniform error bound: \[ \sup_{x \in \mathbb R}\left | \mathsf P\left(\frac{S_n}{\sigma\sqrt{n}} \leq x\right) - \mathsf P\left(Z \leq x\right) \right | \leq \frac{C \rho}{\sigma^3\sqrt{n}}, \] where \(C\) is some universal constant around \(0.5\). Basically, the theorem says that if you take the scaled partial sums \(\frac{S_n}{\sigma\sqrt{n}}\), then you expect them to be Gaussian when \(n\) is large. How Gaussian? The distance from a true Gaussian \(Z\) decreases at a \(\frac{1}{\sqrt n}\) rate and depends on the variance and the third moment.
Return to the Binomial Example
We found the p-value to be \(0.013\); define \(z^\star\) to be the normal quantile corresponding to this p-value. We want to know how big the following quantity is:
\[ \left | \mathsf P\left (\frac{S_n}{\sigma\sqrt{n}} > z^\star \right) - \mathsf P\left (Z > z^\star \right) \right |. \]
We report the number on the right side of the absolute value, but what we really care about is the number on the left side. Recall the statement of the Berry-Esseen bound: for any quantile value \(x \in \mathbb R\), the distance between the true probability and the approximate one is upper bounded at a \(1/\sqrt{n}\) rate.
The statement is true for any \(x \in \mathbb R\), so it is certainly true for \(z^\star\), since \(z^\star\) is in \(\mathbb R\). The ratio \(\rho/\sigma^3\) captures the asymmetry of the random variables, and here it is exactly \(1\): for centered Bernoulli(0.5) variables, \(|X_i - 0.5| = 0.5\) always, so \(\rho = 0.5^3 = \sigma^3\). Taking \(C \approx 0.47\) gives a maximum error of \(0.47/\sqrt{20} \approx 0.105\).
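As a quick R check of those numbers (using the \(C \approx 0.47\) quoted above):

```r
# Centered Bernoulli(0.5): |X_i - 0.5| = 0.5 always, so
# sigma^2 = 0.25 and rho = E|X_i - 0.5|^3 = 0.125
sigma <- 0.5
rho   <- 0.125
C     <- 0.47    # approximate universal constant
n     <- 20

rho / sigma^3                    # exactly 1
C * rho / (sigma^3 * sqrt(n))    # uniform Berry-Esseen bound, ~0.105
```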
Therefore \[ \left | \mathsf P\left (\frac{S_n}{\sigma\sqrt{n}} > z^\star \right) - \mathsf P\left (Z > z^\star \right) \right | \leq 0.105, \] and your p-value of \(0.013\), which before was less than 0.05, could now be as large as \(0.013 + 0.105 \approx 0.12\), which is certainly greater than 0.05!
The usual Berry-Esseen bound controls the deviation of the distribution of the scaled partial sums from that of a Gaussian random variable uniformly over all quantiles. We used the following logic: the Berry-Esseen bound is true for any \(x \in \mathbb R\), so it must be true for \(z^\star\). This jump loses a lot of information, and hence the bound becomes quite a bit looser.
The intuition is that the tails of a distribution function are a lot less variable than the middle of a distribution function[^3]. I’ve wondered about this gap in the Berry-Esseen bound this year and finally took the time to search the literature on what are called non-uniform Berry-Esseen bounds (Nagaev 1965). They are strictly stronger than their uniform counterparts, which they imply.
The non-uniform bound assumes the exact same setup, but now we have \[ \left | \mathsf P\left(\frac{S_n}{\sigma\sqrt{n}} \leq x\right) - \mathsf P\left(Z \leq x\right) \right | \leq \frac{C' \rho}{\sigma^3\sqrt{n}}\left(\frac{1}{1 + |x|^3}\right). \] This formulation attaches a correction factor to the original rate. Notice that if \(x\) is extreme, the fraction is much less than \(1\) and thus shrinks the bound. Moreover, this statement implies the uniform one (with constant \(C'\) in place of \(C\)), since taking the supremum over \(x\) on both sides replaces the correction factor with its maximum value of one.
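To get a feel for how much the correction factor shrinks the bound at tail quantiles, here is a small R check (only the factor \(1/(1 + |x|^3)\) is computed; the quantile \(2.2361\) is \(z^\star\) from the Bernoulli example):

```r
# Correction factor 1 / (1 + |x|^3) at a few quantiles,
# including z_star ~ 2.2361 from the Bernoulli example
correction <- function(x) 1 / (1 + abs(x)^3)
sapply(c(0, 1, 2.2361, 3), correction)
```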
The tight constant for the uniform bound is a bit less than \(0.5\). And for the non-uniform bound? \(30\)! See Pinelis (2013), Section 3.1, for an entertaining historical account of this constant, which includes some algebraic mistakes along the way.
If we apply the non-uniform bound to our Bernoulli example, we get a distance of \(0.4\), whereas the uniform bound gives us a maximum distance of about \(0.1\). We’re in a worse situation using the non-uniform bound because of the constant. Intuitively, we’re OK using CLT-based hypothesis testing because it is approximately correct, but the uniform Berry-Esseen bound is too conservative, and the non-uniform version has too large a constant, one that you would expect could be tightened, though the tools aren’t there.
[^1]: I remark that computer scientists are apparently very against normal approximations and enjoy their finite-sample-correct concentration inequalities. This tracks with my earlier post on asymptotics.

[^2]: Unfortunately, I’ve asked friends outside of statistics what a p-value is, and they’ve responded that it needs to be less than 0.05.

[^3]: This is because the empirical distribution function at a point is a (scaled) binomial random variable, and more extreme \(p\) has smaller variance.