🗒️ Ben's Notes

Concentration Inequalities

The goal of concentration inequalities is to provide bounds on the probability of a random variable taking values in its tail (regions farthest away from the mean).

This is especially useful when we don’t know the distribution of a random variable, or we have a combination of other random variables (sample mean, quicksort, multi-arm bandit…).

Markov’s Inequality #

If XX is a non-negative random variable with expectation μ\mu, then P(Xt)μtP(X \ge t) \le \frac{\mu}{t}

Chebyshev’s Inequality #

If we know the mean μ\mu and the variance σ2\sigma^2 of a random variable XX, then the probability of a result being tt standard deviations away from the mean can be expressed as P(Xμtσ)1t2P(|X - \mu| \ge t \sigma) \le \frac{1}{t^2} Alternatively, P(Xμc)σ2c2P(|X - \mu| \ge c) \le \frac{\sigma^2}{c^2}

Chernoff Bound #

P(Xt)=P(eλXeλt)E[eλx]eλtP(X \ge t) = P(e^{\lambda X} \ge e^{\lambda t}) \le \frac{E[e^{\lambda x}]}{e^{\lambda t}} Optimized form: P(Xc)mint>0MX(t)etcP(X \ge c) \le min_{t > 0} M_X(t) e^{-tc}

where MX(t)=E(etX)M_X(t) = E(e^{tX}) (moment generating function)

Hoeffding’s Inequality #

Hoeffding’s inquality is a special case of the Chernoff bound where the bounds of a random variable are known.

Hoeffding’s Lemma: if XX is a bounded random variable between aa and bb with mean μ\mu, then MX(t)exp((ba)28λ2+μλ)M_X(t) \le \exp(\frac{(b-a)^2}{8} \lambda^2 + \mu \lambda)

Hoeffding’s Inequality: If we have nn independent (may not be identically distributed) bounded random variables between aa and bb, P(1ni=1n(XiE(Xi))t)exp(2nt2(ab)2)P(\frac{1}{n} \sum_{i=1}^n (X_i - E(X_i)) \ge t) \le \exp(-\frac{2nt^2}{(a-b)^2})

Proof #

Let YY be 1ni=1n(XiE(Xi))\frac{1}{n} \sum_{i=1}^n (X_i - E(X_i)). Then, the MGF E[eλY]E[e^{\lambda Y}] = E[exp(λ/ni=1nXi)]E[\exp(\lambda/n \sum_{i=1}^n X_i)].

Using the properties of exponentials (exponential of sum is product of exponentials), E[eλY]=E[i=1nexp(λ/nXi)]E[e^{\lambda Y}] = E[\prod^n_{i=1} exp(\lambda/n X_i)].

Using independence rules, E[i=1nexp(λ/nXi)]=i=1nE[exp(λnXi)]E[\prod^n_{i=1} exp(\lambda/n X_i)] = \prod_{i=1}^n E[exp(\frac{\lambda}{n} X_i)]

Using Hoefding’s Lemma, the MGF must then be bounded by i=1nexp((ba)28λ2+μλ)\prod_{i=1}^n\exp(\frac{(b-a)^2}{8} \lambda^2 + \mu \lambda).