$
\newcommand{\ud}{\mathrm{d} }
\newcommand{\calX}{\mathcal{X}}
\newcommand{\ve}{\varepsilon}
\newcommand{\N}{\mathcal{N}}
\newcommand{\bX}{\mathbf{X}}
\newcommand{\bA}{\mathbf{A}}
\newcommand{\bb}{\mathbf{b}}
\newcommand{\bY}{\mathbf{Y}}
\newcommand{\by}{\mathbf{y}}
\newcommand{\bI}{\mathbf{I}}
\newcommand{\bbeta}{\pmb{\beta}}
\newcommand{\bve}{\pmb{\varepsilon}}
\newcommand{\bzero}{\pmb{0}}
\newcommand{\ds}{\displaystyle}
\newcommand{\tL}{\tilde{L}}
\newcommand{\pt}{\partial}
\newcommand{\E}{\mathbb{E}}
\newcommand{\P}{\mathbb{P}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Cov}{\mathrm{Cov}}
\newcommand{\Var}{\mathrm{Var}}
\newcommand{\raw}{\rightarrow}
\newcommand{\gt}{>}
\newcommand{\lt}{<}
\newcommand{\sm}{\setminus}
$
1. Introduction
The Law of Large Numbers (LLN) describes the tendency of the average of a sequence of random variables to approach its expected value.
Consider a sequence of random variables, denoted as $X_1$, $X_2$, $\cdots$ with $\E(X_i) = \mu$.
The Law of Large Numbers asserts, under certain assumptions, that the expression
\[\frac{X_1+X_2+\cdots+X_n}{n}\tag{1}\]
converges to $\mu$ as $n$ approaches infinity.
LLN is termed the weak law if (1) converges in probability and termed the strong law if (1) converges almost surely. For simplicity, from now we denote
\[S_n = X_1+X_2+\cdots+X_n.\]
We say that a sequence of random variables $\{X_n\}$ converges to $X$ in probability if for all $\ve>0$, we have
\[\lim_{n\raw\infty}\P(|X_n-X|\geq\ve)=0.\]
And $\{X_n\}$ is said to converge to $X$ almost surely (a.s.) if
\[\lim_{n\raw\infty}X_n(\omega)=X(\omega)\]
for all $\omega\in\Omega\sm A$, where $\P(A) = 0$.
If $\{X_n\}$ converges to $X$ a.s., it also converges to $X$ in probability.
Suppose $X_n\raw X$ a.s., and let $A$ be the null set on which $X_n$ does not converge to $X$.
Fix $\ve\gt0$ and let $E_n$ be the event $\{\omega:|X_n(\omega)-X(\omega)|\geq \ve\}$.
If $\omega$ belongs to infinitely many of the $E_n$, then $X_n(\omega)$ does not converge to $X(\omega)$, so $\limsup_{n\raw\infty}E_n\subseteq A$ and $\P(\limsup_{n\raw\infty}E_n)\leq\P(A)=0$.
Since $\bigcup_{k\geq n}E_k$ decreases to $\limsup_{n\raw\infty}E_n$, continuity from above gives
\[0=\P(E_n\;\text{i.o.}) = \P(\limsup\limits_{n\raw\infty}E_n)=\lim_{n\raw\infty}\P(\bigcup_{k\geq n}E_k).\]
Since $\ds E_n\subseteq\bigcup_{k\geq n}E_k$, we have $\P(E_n)\leq \P(\bigcup_{k\geq n}E_k)$,
therefore
\[\lim_{n\raw\infty}\P(E_n)\leq\lim_{n\raw\infty}\P(\bigcup_{k\geq n}E_k) =0,\]
which means $X_n\raw X$ in probability.
\[\tag*{$\blacksquare$}\]
From this lemma, we can see that if $\{X_i\}$ follows the strong law, then it also follows the weak law. Also from the proof,
we obtain an equivalent characterization of almost sure convergence: $X_n\raw X$ a.s. if and only if for all $\ve\gt 0$,
\[\P(|X_n-X|\geq \ve\;\text{i.o.})=0.\tag{2}\]
(The converse direction follows by intersecting the events in (2) over $\ve = 1/k$ for $k=1,2,\dots$.)
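Before moving on, here is a quick numerical illustration (not a proof) of what the LLN predicts: sample averages of i.i.d. draws settle near the common mean as $n$ grows. This is a minimal sketch; the helper name `running_average` and the fair-die example are chosen here for illustration.

```python
import random

random.seed(0)

def running_average(n, sampler):
    """Return S_n / n for one sample path of length n."""
    return sum(sampler() for _ in range(n)) / n

# Fair die: E(X) = 3.5.  Averages over longer and longer samples
# should cluster around 3.5, as the LLN predicts.
die = lambda: random.randint(1, 6)
for n in [10, 1000, 100000]:
    print(n, running_average(n, die))
```

Running this a few times with different seeds shows the averages for large $n$ staying close to $3.5$, while the short averages fluctuate noticeably.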
2. Chebyshev weak law
Weak LLN is easily established under a strong assumption:
the $\{X_i\}$ are uncorrelated and their variances $\Var(X_i)$ have a common bound. $\{X_i\}$ is said to be uncorrelated if $\E(X_i^2)\lt\infty$ for all $i$ and
\[\E(X_iX_j)=\E(X_i)\E(X_j),\quad \forall i\neq j.\]
The condition $\E(X_i^2)\lt\infty$ is necessary in the above definition, as it guarantees the finiteness of $\E(X_iX_j)$ (and of $\E(X_i)$) by the Cauchy-Schwarz inequality
\[[\E(X_iX_j)]^2\leq \E(X_i^2)\E(X_j^2).\]
Let $\{X_i\}$ be uncorrelated, then
\[\Var(S_n)=\sum^n_{i=1}\Var(X_i).\]
First, we prove the case where $\E(X_i) = 0$ for all $i$; we have
\[\begin{split}\Var(S_n) &= \E[S_n^2] = \E[(\sum^n_{i=1}X_i)^2] \\
&= \E[\sum^n_{i=1}X_i^2+2\sum_{1\leq i\lt j \leq n}X_iX_j] \\
&= \sum^n_{i=1}\E(X_i^2)+2\sum_{1\leq i\lt j \leq n}\E(X_iX_j)\\
&= \sum^n_{i=1}\E(X_i^2)+2\sum_{1\leq i\lt j \leq n}\E(X_i)\E(X_j)\\
&=\sum^n_{i=1}\E(X_i^2)\\
&=\sum^n_{i=1}\Var(X_i).
\end{split} \]
Then, for the general case, let $Y_i = X_i-\E(X_i)$; then $\E(Y_i)=0$, $\Var(Y_i) = \Var(X_i)$, and the $\{Y_i\}$ are still uncorrelated. Applying the above result, we have
\[\Var(S_n) = \E[(S_n-\E(S_n))^2] =\E[(\sum^n_{i=1}Y_i)^2] =\sum^n_{i=1}\Var(Y_i)= \sum^n_{i=1}\Var(X_i).\]
\[\tag*{$\blacksquare$}\]
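The additivity of variance for uncorrelated variables can be sanity-checked by exact enumeration over small discrete distributions. The sketch below (with `var` a helper defined here, and independence used as a convenient special case of uncorrelatedness) computes the exact distribution of the sum and compares the two sides of the lemma.

```python
from itertools import product

def var(dist):
    """Variance of a finite distribution given as (value, probability) pairs."""
    mean = sum(v * p for v, p in dist)
    return sum((v - mean) ** 2 * p for v, p in dist)

# Three independent (hence uncorrelated) variables, each uniform on its support.
supports = [[0, 1, 3], [1, 2], [-1, 0, 2, 5]]

# Exact distribution of S = X1 + X2 + X3 by enumerating all outcomes.
dist_of_sum = {}
for combo in product(*supports):
    p = 1.0
    for s in supports:
        p /= len(s)
    total = sum(combo)
    dist_of_sum[total] = dist_of_sum.get(total, 0.0) + p

var_of_sum = var(dist_of_sum.items())
sum_of_vars = sum(var([(v, 1 / len(s)) for v in s]) for s in supports)
print(var_of_sum, sum_of_vars)  # the two quantities agree (up to float rounding)
```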
If $X_n$ converges to $0$ in $L^p$, then it converges to $0$ in probability.
By Chebyshev's inequality, we have
\[\P(|X_n|\geq \ve )\leq \frac{\E(|X_n|^p)}{\ve^p};\]
if $X_n\raw0$ in $L^p$, the right-hand side tends to $0$ as $n\raw\infty$, which means $X_n\raw 0$ in probability.
\[\tag*{$\blacksquare$}\]
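The moment bound driving this lemma can itself be checked numerically. The sketch below (an illustration under assumptions chosen here: $X\sim\text{Uniform}(0,1)$ and $p=2$) compares the empirical tail probability with the Chebyshev bound $\E(|X|^p)/\ve^p$ for a few thresholds.

```python
import random

random.seed(3)

# Check P(|X| >= eps) <= E(|X|^p) / eps^p by Monte Carlo,
# for X ~ Uniform(0,1) and p = 2.
samples = [random.random() for _ in range(100000)]
p = 2
moment = sum(x ** p for x in samples) / len(samples)  # ~ E(X^2) = 1/3
for eps in [0.5, 0.7, 0.9]:
    tail = sum(x >= eps for x in samples) / len(samples)  # ~ P(X >= eps) = 1 - eps
    print(eps, tail, moment / eps ** p)  # tail stays below the bound
```

The bound is loose for small $\ve$; what matters for the lemma is only that it forces the tail to $0$ whenever the moment does.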
Let $\{X_i\}$ be random variables with $\E(X_i) = \mu$. Suppose they are uncorrelated and $\Var(X_i)\lt C\lt\infty$ for all $i$,
then\[\frac{S_n}{n}\raw\mu\]
in $L^2$ and in probability.
\[\E\left[\left(\frac{S_n}{n}-\mu\right)^2\right] = \Var\left(\frac{S_n}{n}\right)=\frac{\sum^n_{i=1}\Var(X_i)}{n^2}\leq\frac{nC}{n^2}=\frac{C}{n}\raw 0,\]
thus convergence in $L^2$ is proved. Then by the lemma above, convergence in probability is also obtained.
\[\tag*{$\blacksquare$}\]
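The $C/n$ rate from the proof can be observed empirically. The following sketch (assumptions chosen here: i.i.d. $\text{Uniform}(0,1)$ variables, so $\mu = 1/2$ and $\Var(X_i) = 1/12$ serves as the bound $C$) estimates $\E[(S_n/n-\mu)^2]$ by Monte Carlo and compares it with $C/n$.

```python
import random

random.seed(1)

def mse_of_average(n, trials=1000):
    """Monte Carlo estimate of E[(S_n/n - mu)^2] for i.i.d. Uniform(0,1)."""
    mu = 0.5
    total = 0.0
    for _ in range(trials):
        avg = sum(random.random() for _ in range(n)) / n
        total += (avg - mu) ** 2
    return total / trials

# Var(X_i) = 1/12, so the proof gives E[(S_n/n - mu)^2] <= (1/12)/n.
for n in [10, 100, 1000]:
    print(n, mse_of_average(n), 1 / (12 * n))
```

For i.i.d. variables the inequality is in fact an equality, so the two printed columns should roughly match at each $n$.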
3. Fourth moment strong law
Let $\{X_i\}$ be independent and identically distributed (i.i.d.) random variables with $\E(X_i)=\mu$ and $\E(X_i^4) \lt \infty$, then
\[\frac{S_n}{n}\raw\mu\quad a.s.\]
Denote $Y_i=X_i-\mu$, then \[\frac{S_n}{n}\raw \mu\quad a.s. \Longleftrightarrow\frac{\sum^n_{i=1}Y_i}{n}\raw 0\quad a.s.,\]
so we only need to prove the case when $\mu = 0$.
Expanding $S_n^4$, by independence and $\E(X_i)=0$ every cross term of the form $\E(X_i^3X_j)$, $\E(X_i^2X_jX_k)$, or $\E(X_iX_jX_kX_l)$ (distinct indices) vanishes, leaving
\[\E(S_n^4) = n\E(X_i^4) + 3n(n-1)[\E(X_i^2)]^2 = O(n^2),\]
where $\E(X_i^2)\lt\infty$ follows from $\E(X_i^4)\lt\infty$ by the Cauchy-Schwarz inequality. Then for all $\ve\gt 0$,
\[\P(|\frac{S_n}{n}|\geq\ve) = \P(|S_n|\geq n\ve)\leq \frac{\E(S_n^4)}{(n\ve)^4} = O(\frac{1}{n^2}). \]
Since $\sum^\infty_{n=1}1/n^2\lt \infty$, we have
\[\sum_{n}\P(|\frac{S_n}{n}|\geq\ve)\lt \infty,\]
by the first Borel-Cantelli lemma,
\[\P(|\frac{S_n}{n}|\geq\ve\;\text{i.o.}) = 0,\]
by (2), this implies $\ds\frac{S_n}{n}\raw 0$ a.s.
\[\tag*{$\blacksquare$}\]
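The key estimate $\E(S_n^4) = O(n^2)$ can also be seen numerically. The sketch below (assumptions chosen here: centered $\text{Uniform}(-1/2,1/2)$ variables, for which $\E(X^2)=1/12$ and $\E(X^4)=1/80$) estimates $\E(S_n^4)/n^2$ by Monte Carlo and compares it with the limit $3[\E(X^2)]^2 = 1/48$ predicted by the formula above.

```python
import random

random.seed(2)

def fourth_moment_ratio(n, trials=2000):
    """Monte Carlo estimate of E(S_n^4) / n^2 for centered Uniform(-1/2, 1/2)."""
    total = 0.0
    for _ in range(trials):
        s = sum(random.random() - 0.5 for _ in range(n))
        total += s ** 4
    return total / trials / n ** 2

# E(X^2) = 1/12 and E(X^4) = 1/80, so
# E(S_n^4) = n/80 + 3n(n-1)/144, hence E(S_n^4)/n^2 -> 3*(1/12)^2 = 1/48.
for n in [10, 100, 1000]:
    print(n, fourth_moment_ratio(n), 1 / 48)
```

Boundedness of this ratio is exactly what makes the tail probabilities $O(1/n^2)$ and therefore summable.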
Bibliography
- Kai Lai Chung, A Course in Probability Theory, 3rd edition (2001)
- Rick Durrett, Probability: Theory and Examples, 5th edition (2019)