What’s wrong with early stopping in hypothesis testing? R simulations show that stopping early and testing naively leads to under-coverage. Let \(\tau\) denote a stopping time (an unfortunate notation clash, since \(\tau\) is also commonly used for the ATE).
\(X_\tau\) is random for two reasons: \(X_n\) is random, and \(\tau\) itself is random. The core issue:
Power calculations \(\Rightarrow\) the sample size must be fixed beforehand.
How do you pick it? Just guess.
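To make “fix sample size beforehand” concrete, here is a minimal sketch of the standard fixed-\(n\) power calculation for a two-sided one-sample \(z\)-test with known \(\sigma = 1\): \(n = \lceil ((z_{1-\alpha/2} + z_{1-\beta})/\delta)^2 \rceil\). The function name and the example effect size \(\delta = 0.5\) are illustrative, not from the notes.

```python
from math import ceil
from statistics import NormalDist

def z_test_sample_size(delta, alpha=0.05, power=0.80):
    """Smallest n so a two-sided one-sample z-test (sigma = 1 known)
    detects a standardized effect `delta` with the given power.
    Illustrative helper; the notes do not name this function."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_b = nd.inv_cdf(power)           # ~0.84 for power = 0.80
    return ceil(((z_a + z_b) / delta) ** 2)

print(z_test_sample_size(0.5))  # -> 32
```

The guesswork lives in \(\delta\): pick a different effect size and \(n\) changes dramatically, which is exactly why fixed-\(n\) planning feels arbitrary.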
Simulation: Repeated \(z\)-Tests Under \(H_0\)
Setup. Under \(H_0\), draw \(X_i \overset{\text{iid}}{\sim} N(0,1)\) and compute the running \(z\)-statistic \(Z_t = \frac{1}{\sqrt{t}} \sum_{i=1}^{t} X_i\) after each new observation. Stop and reject as soon as \(|Z_t| > z_{0.975} \approx 1.96\). Even though \(H_0\) is true, this “peek-and-stop” strategy rejects far more than 5% of the time.
The peek-and-stop strategy rejects \(H_0\) roughly 41% of the time despite the nominal \(\alpha = 0.05\): a large inflation of the Type I error rate. This gets worse as \(n_{\max}\) grows. By the law of the iterated logarithm, \(|Z_t|\) crosses any fixed threshold infinitely often, so with unlimited peeks you eventually reject with probability 1 under the null.
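The simulation above can be sketched as follows. The notes mention R; this is a Python translation, with an assumed horizon `n_max = 200` and 4000 replications (the exact rejection rate depends on these choices, so the printed number will not match 41% exactly).

```python
import numpy as np

def peek_and_stop_rejection_rate(n_max=200, n_sims=4000, seed=0):
    """Estimate the Type I error of a z-test that peeks after every
    observation and rejects at the first crossing of |Z_t| > 1.96.
    Data are drawn under H0, so the nominal rate should be 0.05."""
    rng = np.random.default_rng(seed)
    crit = 1.959963984540054                     # z_{0.975}
    x = rng.standard_normal((n_sims, n_max))     # H0: X_i ~ N(0, 1)
    t = np.arange(1, n_max + 1)
    z = np.cumsum(x, axis=1) / np.sqrt(t)        # running z-statistic Z_t
    return (np.abs(z) > crit).any(axis=1).mean() # rejected at some peek?

print(peek_and_stop_rejection_rate())  # far above the nominal 0.05
```

Increasing `n_max` pushes the estimated rate higher still, matching the claim that unlimited peeking drives the Type I error toward 1.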