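Statistical Inference Notes: Chapters 1 to 4

These are UncleBob's statistical inference notes covering Chapters 1 through 4. These chapters mainly cover background material from probability theory.

The notes were converted from $\LaTeX$ source, so some of the typesetting may have changed.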

Chapter 1. Basics of Probability Theory

1.1 Inequalities

Theorem [Bonferroni inequality]

$$P(A \cap B) \ge P(A) + P(B) - 1,$$

$$P\left(\bigcap_{i=1}^n A_i\right) \ge \sum_{i=1}^n P(A_i) - (n - 1).$$
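A short derivation sketch of both bounds. It uses only inclusion-exclusion and $P(A \cup B) \le 1$; the induction step for $n$ events is the standard argument and is not spelled out in the notes themselves:

```latex
\begin{align*}
P(A \cap B) &= P(A) + P(B) - P(A \cup B) \ge P(A) + P(B) - 1, \\
P\Bigl(\bigcap_{i=1}^{n} A_i\Bigr)
  &\ge P\Bigl(\bigcap_{i=1}^{n-1} A_i\Bigr) + P(A_n) - 1 \\
  &\ge \sum_{i=1}^{n-1} P(A_i) - (n-2) + P(A_n) - 1
   = \sum_{i=1}^{n} P(A_i) - (n-1).
\end{align*}
```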

1.2 pdf and pmf

Omitted.

1.3 The Fundamental Theorem of Statistics

Theorem
Let $X_1, \dots, X_n \stackrel{\mathrm{i.i.d.}}{\sim} F(x)$, where $F$ is a cdf, and let $F_n$ denote the empirical cdf of the sample. Then

$$P\left\{\lim_{n \to \infty} \sup_x |F_n(x) - F(x)| = 0\right\} = 1.$$
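The uniform convergence of the empirical cdf can be observed numerically. The following is a minimal sketch, assuming draws from $U(0,1)$ so that $F(x) = x$; the function name `sup_distance_uniform`, the seed, and the sample sizes are illustrative choices, not part of the notes.

```python
import random

def sup_distance_uniform(n, rng):
    """Return sup_x |F_n(x) - F(x)| for n draws from U(0,1), where F(x) = x."""
    xs = sorted(rng.random() for _ in range(n))
    # The supremum is attained at a jump of F_n, so at each order statistic
    # x_(i) compare F(x_(i)) with the ECDF just before the jump, (i-1)/n,
    # and at the jump, i/n.
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))

rng = random.Random(0)  # fixed seed so the run is reproducible
distances = {n: sup_distance_uniform(n, rng) for n in (100, 10_000, 100_000)}
for n, d in distances.items():
    print(n, round(d, 4))
```

The printed distances should shrink roughly like $1/\sqrt{n}$, which is consistent with (but of course does not prove) the almost-sure convergence stated in the theorem.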

Definition [family of distributions]
Let the parameter $\theta$ be unknown, with $\theta \in \Theta$. The set

$$\{F(x \mid \theta) : \theta \in \Theta\}$$

is called a family of distributions.

More generally, suppose $F(x) \in \mathcal{F} = \{F : F \text{ is a distribution function satisfying certain conditions}\}$. Then

$$P\left\{\lim_{n \to \infty} \sup_{F \in \mathcal{F}} \sup_x |F_n(x) - F(x)| = 0\right\} = 1.$$

Chapter 2. Transformations and Expectations

2.1 Transformations

If $X \sim U(0,1)$, then $-\log X \sim \mathrm{Exp}(1)$.
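This can be verified directly from the cdf. A one-line derivation, using only that $P(X \ge u) = 1 - u$ for $u \in (0,1)$ when $X \sim U(0,1)$:

```latex
\begin{align*}
P(-\log X \le y) &= P\bigl(X \ge e^{-y}\bigr) = 1 - e^{-y}, \qquad y > 0,
\end{align*}
```

which is the cdf of the $\mathrm{Exp}(1)$ distribution.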


If $X \sim N(0,1)$, then $Y = X^2 \sim \chi_1^2$.

Theorem [probability integral transform]
If $X$ has a continuous cdf $F_X$, then $Y = F_X(X) \sim U(0,1)$.
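A quick simulation can illustrate the probability integral transform. This sketch assumes $X \sim \mathrm{Exp}(1)$, whose cdf is $F_X(x) = 1 - e^{-x}$; the distribution, seed, and sample size are illustrative choices only.

```python
import math
import random

rng = random.Random(1)  # fixed seed for reproducibility
n = 50_000

# X ~ Exp(1) has the continuous cdf F_X(x) = 1 - exp(-x) for x >= 0.
u = [1.0 - math.exp(-rng.expovariate(1.0)) for _ in range(n)]

# If F_X(X) ~ U(0,1), the fraction of transformed values <= q should be near q.
quantiles = (0.1, 0.25, 0.5, 0.75, 0.9)
fractions = {q: sum(v <= q for v in u) / n for q in quantiles}
for q, frac in fractions.items():
    print(q, round(frac, 3))
```

Each printed fraction should sit close to its quantile level, as expected if the transformed sample is uniform on $(0,1)$.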

2.2 Differentiating Under the Integral Sign

Theorem [Leibniz rule]
Under suitable regularity conditions,

$$\frac{d}{d\theta} \int_{a(\theta)}^{b(\theta)} f(x,\theta)\,dx = f(b(\theta),\theta)\,b'(\theta) - f(a(\theta),\theta)\,a'(\theta) + \int_{a(\theta)}^{b(\theta)} \frac{\partial f(x,\theta)}{\partial \theta}\,dx.$$
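As a sanity check on the formula, take the illustrative choice $f(x,\theta) = x + \theta$, $a(\theta) = 0$, $b(\theta) = \theta$ (an example picked here, not taken from the notes). Both sides can be computed by hand and agree:

```latex
\begin{align*}
\frac{d}{d\theta}\int_0^{\theta} (x+\theta)\,dx
  &= \frac{d}{d\theta}\left(\frac{3\theta^2}{2}\right) = 3\theta, \\
f(\theta,\theta)\cdot 1 - f(0,\theta)\cdot 0 + \int_0^{\theta} 1\,dx
  &= 2\theta + \theta = 3\theta.
\end{align*}
```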

Chapter 3. Families of Distributions

3.1 Exponential Families

Definition [exponential family]
A family of distributions $\{P_\theta : \theta \in \Theta\}$ is called a $k$-dimensional exponential family if its pdf or pmf can be written as

$$f(x \mid \theta) = h(x)\,c(\theta) \exp\left\{\sum_{i=1}^k w_i(\theta)\,t_i(x)\right\}, \quad x \in \mathbb{R},$$

where $h(x) \ge 0$ and $c(\theta) > 0$, the $w_i(\theta)$ depend only on $\theta$, and the $t_i(x)$ depend only on $x$.


Let $X \sim N(\mu, \sigma^2)$ with $\mu \in \mathbb{R}$ and $\sigma^2 > 0$, and set $\theta = (\mu, \sigma^2)^{\mathrm{T}}$. Then

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{\mu^2}{2\sigma^2}\right\} \exp\left\{-\frac{x^2}{2\sigma^2} + \frac{\mu}{\sigma^2}x\right\}.$$


Let $X \sim \mathrm{Bernoulli}(p)$ with $0 < p < 1$. Then the pmf is

$$f(x \mid p) = \begin{cases} p^x (1-p)^{1-x}, & x \in \{0,1\},\\[0.3em] 0, & \text{otherwise}, \end{cases} = \begin{cases} (1-p)\exp\left\{x \log\frac{p}{1-p}\right\}, & x \in \{0,1\},\\[0.3em] 0, & \text{otherwise}. \end{cases}$$


Binomial families:

- $X \sim \mathrm{Binomial}(n,p)$ with $n$ known and $0 < p < 1$ unknown (an exponential family);
- $X \sim \mathrm{Binomial}(n,p)$ with $n \in \{1,2,\dots\}$ unknown and $p$ known;
- $X \sim \mathrm{Binomial}(n,p)$ with both $n$ and $p$ unknown.


Cauchy families:

$$f(x \mid \theta) = \frac{1}{\pi}\frac{1}{1+(x-\theta)^2}, \quad \theta \in \mathbb{R},$$

$$f(x \mid \theta, \sigma) = \frac{1}{\pi\sigma} \frac{1}{1+\left(\frac{x-\theta}{\sigma}\right)^2}, \quad \sigma > 0,$$

are not exponential families.


The family

$$f(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} \exp\left\{1 - \dfrac{x}{\theta}\right\}, & \theta < x < +\infty,\\[0.5em] 0, & \text{otherwise}, \end{cases} = \dfrac{1}{\theta} \exp\left\{1 - \dfrac{x}{\theta}\right\} I_{(x > \theta)},$$

is not an exponential family, because its support $\{x : x > \theta\}$ depends on $\theta$.

3.2 Properties of Exponential Families

Theorem
If $X$ has a pdf or pmf of the exponential-family form above, then

$$E\left(\sum_{i=1}^k \frac{\partial w_i(\theta)}{\partial \theta_j} t_i(X)\right) = -\frac{\partial}{\partial \theta_j} \log c(\theta), \quad j = 1, 2, \dots, d = \dim(\Theta),$$

$$\mathrm{Var}\left(\sum_{i=1}^k \frac{\partial w_i(\theta)}{\partial \theta_j} t_i(X)\right) = -\frac{\partial^2}{\partial \theta_j^2} \log c(\theta) - E\left(\sum_{i=1}^k \frac{\partial^2 w_i(\theta)}{\partial \theta_j^2} t_i(X)\right), \quad j = 1, 2, \dots, d.$$
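A worked example may help here. Take $X \sim \mathrm{Binomial}(n,p)$ with $n$ known (chosen for illustration), so that $k = d = 1$. The two identities then recover the familiar mean and variance:

```latex
\begin{align*}
f(x \mid p) &= \binom{n}{x}(1-p)^n \exp\left\{x \log\frac{p}{1-p}\right\},
  \qquad c(p) = (1-p)^n,\quad w(p) = \log\frac{p}{1-p},\quad t(x) = x, \\
E\left(\frac{X}{p(1-p)}\right) &= -\frac{d}{dp}\, n\log(1-p) = \frac{n}{1-p}
  \;\Longrightarrow\; E[X] = np, \\
\mathrm{Var}\left(\frac{X}{p(1-p)}\right)
  &= \frac{n}{(1-p)^2} - \frac{(2p-1)\,np}{p^2(1-p)^2}
   = \frac{n}{p(1-p)}
  \;\Longrightarrow\; \mathrm{Var}(X) = np(1-p).
\end{align*}
```

Here $w'(p) = \frac{1}{p(1-p)}$ and $w''(p) = \frac{2p-1}{p^2(1-p)^2}$.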

Definition [natural parameter]
Rewrite the exponential family as

$$f(x \mid \eta) = h(x)\,c^*(\eta) \exp\left\{\sum_{i=1}^k \eta_i t_i(x)\right\},$$

where the parameter $\eta = (\eta_1, \eta_2, \dots, \eta_k)$ is called the natural parameter, and the expression above is called the natural-parameter form of the exponential family.
The natural parameter space is

$$H = \left\{ (\eta_1, \eta_2, \dots, \eta_k) : \int h(x)\exp\left\{\sum_{i=1}^k \eta_i t_i(x)\right\}dx < \infty \right\}.$$

Property
The natural parameter space $H$ is a convex set.

Property
Define

$$a(\eta) = \log \int h(x) \exp\bigl(\eta^{\mathrm{T}} t(x)\bigr)\,dx, \quad \eta \in H,$$

where $t(x) = (t_1(x), t_2(x), \dots, t_k(x))^{\mathrm{T}}$. Then the function $a(\eta) : H \to \mathbb{R}$ is convex.
If $H$ has interior points, then $a(\eta)$ is infinitely differentiable with respect to $\eta$ on the interior of $H$, and

$$\nabla a(\eta) = \frac{\partial a(\eta)}{\partial \eta} = E_\eta[t(X)],$$

$$\nabla^2 a(\eta) = \frac{\partial^2 a(\eta)}{\partial \eta\, \partial \eta^{\mathrm{T}}} = \mathrm{Cov}_\eta(t(X)) = E_\eta\left[ (t(X) - E_\eta[t(X)])(t(X) - E_\eta[t(X)])^{\mathrm{T}} \right],$$

where $X \sim f(x \mid \eta)$.
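The gradient and Hessian identities can be checked numerically in a simple case. The sketch below assumes the Poisson family written in natural form, with $h(x) = 1/x!$, $t(x) = x$, and $\eta = \log\lambda$ (the integral becomes a sum over $x = 0, 1, 2, \dots$). The truncation at 200 terms and the finite-difference step are arbitrary illustrative choices.

```python
import math

def log_partition(eta, terms=200):
    """a(eta) = log sum_x h(x) exp(eta * x) with h(x) = 1/x!, t(x) = x (Poisson)."""
    # Work with log-terms and subtract the maximum for numerical stability.
    logs = [eta * x - math.lgamma(x + 1) for x in range(terms)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

lam = 3.0
eta = math.log(lam)  # natural parameter of Poisson(lam)
step = 1e-4          # finite-difference step (illustrative choice)

# Central differences approximate the first and second derivatives of a(eta).
grad = (log_partition(eta + step) - log_partition(eta - step)) / (2 * step)
hess = (log_partition(eta + step) - 2 * log_partition(eta)
        + log_partition(eta - step)) / step**2

print(grad, hess)  # both should be close to lam, the Poisson mean and variance
```

For this family $a(\eta) = e^\eta$, so both derivatives equal $\lambda$, matching $E_\eta[X] = \mathrm{Var}_\eta(X) = \lambda$.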

3.3 Curved Exponential Families

Let $X \sim N(\mu, \sigma^2)$ with the constraint $\mu^2 = \sigma^2$. Then

$$f(x \mid \mu) = \frac{1}{\sqrt{2\pi \mu^2}} \exp\left(-\frac{1}{2}\right) \exp\left(-\frac{x^2}{2\mu^2} + \frac{x}{\mu}\right),$$

whose natural parameter is $(-\tfrac{1}{2\mu^2}, \tfrac{1}{\mu})$ with $\mu \in \mathbb{R} \setminus \{0\}$. The two-dimensional natural parameter therefore traces out a curve indexed by the single parameter $\mu$.


Let $X_1, X_2, \dots, X_n \stackrel{\mathrm{i.i.d.}}{\sim} \mathrm{Poisson}(\lambda)$. Then

$$\frac{\sum_{i=1}^n X_i - n\lambda}{\sqrt{n\lambda}} \xrightarrow{d} N(0,1),$$

so for large $n$ the sample mean $\tfrac{1}{n}\sum_{i=1}^n X_i$ is approximately distributed as $N(\lambda, \tfrac{\lambda}{n})$, which belongs to a curved exponential family of normal distributions.

3.4 Location and Scale Families

Definition [location and scale families]
Let the random variable $Z$ have pdf or pmf $f_Z(z)$. Then:

- the distributions of $Z + \mu$, $\mu \in \mathbb{R}$, form a location family,
$\{f_{Z+\mu}(x) = f_Z(x - \mu) : \mu \in \mathbb{R}\}$;
- the distributions of $\sigma Z$, $\sigma > 0$, form a scale family,
$\{f_{\sigma Z}(x) = \tfrac{1}{\sigma} f_Z(\tfrac{x}{\sigma}) : \sigma > 0\}$;
- the distributions of $\mu + \sigma Z$, $\mu \in \mathbb{R}$, $\sigma > 0$, form a location-scale family,
$\{f_{\sigma Z + \mu}(x) = \tfrac{1}{\sigma} f_Z(\tfrac{x - \mu}{\sigma}) : \mu \in \mathbb{R}, \sigma > 0\}$.

The factor $\tfrac{1}{\sigma}$ applies in the pdf case.
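The location-scale construction translates directly into code. This is a minimal sketch, assuming a standard normal $Z$ purely for illustration; `location_scale_pdf` and `integrate` are hypothetical helper names introduced here, and the midpoint rule is just a simple way to check that the result is still a density.

```python
import math

def std_normal_pdf(z):
    """Standard pdf f_Z; the normal is only an illustrative choice of Z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def location_scale_pdf(f_z, mu, sigma):
    """Return the pdf of mu + sigma * Z, i.e. x -> f_Z((x - mu) / sigma) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return lambda x: f_z((x - mu) / sigma) / sigma

def integrate(f, lo, hi, steps=20_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

mu, sigma = 2.0, 3.0
pdf = location_scale_pdf(std_normal_pdf, mu, sigma)
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # wide enough to capture almost all mass
total = integrate(pdf, lo, hi)
mean = integrate(lambda x: x * pdf(x), lo, hi)
print(round(total, 6), round(mean, 6))
```

The total mass should be close to 1 and the mean close to $\mu$, reflecting that $\mu$ shifts the density and $\sigma$ stretches it while the factor $1/\sigma$ keeps it normalized.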

3.5 Identities and Inequalities

Theorem [Chebyshev inequality]
Let $g(x) \ge 0$ and $r > 0$. Then

$$P(g(X) \ge r) \le \frac{E[g(X)]}{r}.$$

Theorem [Hoeffding inequality]
Let $X_1, X_2, \dots, X_n$ be independent with mean 0 and $a_i \le X_i \le b_i$. Then for every $\varepsilon > 0$,

$$P\left( \left|\sum_{i=1}^n X_i\right| \ge \varepsilon \right) \le 2 \exp\left\{ -\frac{2\varepsilon^2}{\sum_{i=1}^n (b_i - a_i)^2} \right\}.$$
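To see how the Hoeffding bound compares with an actual tail probability, the sketch below simulates sums of Rademacher variables ($X_i = \pm 1$ with probability $1/2$ each, so $a_i = -1$, $b_i = 1$). The distribution, $n$, $\varepsilon$, the number of trials, and the seed are all illustrative assumptions.

```python
import math
import random

rng = random.Random(2)  # fixed seed for reproducibility
n, eps, trials = 100, 20.0, 5_000

# Rademacher variables: mean 0 and bounded in [a_i, b_i] = [-1, 1].
hits = 0
for _ in range(trials):
    s = sum(rng.choice((-1, 1)) for _ in range(n))
    if abs(s) >= eps:
        hits += 1

empirical = hits / trials
# Hoeffding bound: 2 * exp(-2 eps^2 / sum_i (b_i - a_i)^2), with (b_i - a_i)^2 = 4.
bound = 2.0 * math.exp(-2.0 * eps**2 / (n * 4.0))
print(empirical, bound)
```

The empirical frequency should fall well below the bound, which is expected since Hoeffding's inequality holds for every bounded distribution and is not tight for any particular one.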

Theorem [Jensen inequality]
Let the random variable $X$ satisfy $E|X| < \infty$, and let $f:\mathbb{R} \to \mathbb{R}$ be convex. Then

$$E[f(X)] \ge f(E[X]).$$

Theorem [Stein identity]
Let $X \sim N(\mu, \sigma^2)$. If $g:\mathbb{R} \to \mathbb{R}$ is differentiable with $E|g'(X)| < \infty$, then

$$E[g(X)(X - \mu)] = \sigma^2 E[g'(X)].$$
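One use of the identity is computing higher moments of the normal distribution. As an illustration (the choice $g(x) = x^2$ is made here, not in the notes), the third moment follows in two lines:

```latex
\begin{align*}
E\bigl[X^2 (X - \mu)\bigr] &= \sigma^2 E[2X] = 2\mu\sigma^2, \\
E[X^3] &= \mu E[X^2] + 2\mu\sigma^2
        = \mu(\mu^2 + \sigma^2) + 2\mu\sigma^2
        = \mu^3 + 3\mu\sigma^2.
\end{align*}
```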

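Theorem [Fubini's theorem and integration by parts]
Let $f, g$ be continuously differentiable on $\mathbb{R}$ with $g(\pm\infty) = 0$ (more precisely, $f(x)g(x) \to 0$ as $x \to \pm\infty$). Then

$$\int_{-\infty}^{+\infty} f'(x) g(x)\,dx = - \int_{-\infty}^{+\infty} f(x) g'(x)\,dx.$$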

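3.6 Other Characterizations of the Normal Distribution

Theorem [Cramér–Lévy theorem]
Let $n \ge 2$ and let $X_1, \dots, X_n$ be mutually independent. If $S_n = X_1 + \dots + X_n$ is normally distributed, then each $X_i$ is normally distributed.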

Theorem
Let $X_1, \dots, X_n \stackrel{\mathrm{i.i.d.}}{\sim} N(\mu, \sigma^2)$. Then

$$\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} \sim \chi_{n-1}^2.$$

Conversely, if $n \ge 2$ and $X_1, \dots, X_n$ are i.i.d., symmetric about their mean $\mu$, with finite variance, and satisfy $\sum_{i=1}^n (X_i - \bar{X})^2 / \sigma^2 \sim \chi_{n-1}^2$, then $X_i \sim N(\mu, \sigma^2)$.

Theorem
Let $X_1, \dots, X_n \stackrel{\mathrm{i.i.d.}}{\sim} N(\mu, \sigma^2)$. Then $\bar{X} \sim N(\mu, \sigma^2/n)$, $\sum_{i=1}^n (X_i - \bar{X})^2 / \sigma^2 \sim \chi_{n-1}^2$, and $\bar{X}$ and $\sum_{i=1}^n (X_i - \bar{X})^2$ are independent.

Conversely, if continuous i.i.d. random variables $X_1, \dots, X_n$ are such that $\bar{X}$ and $\sum_{i=1}^n (X_i - \bar{X})^2$ are independent, then $X_i \sim N(\mu, \sigma^2)$.
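A simulation can hint at why independence of $\bar{X}$ and $\sum_{i=1}^n (X_i - \bar{X})^2$ singles out the normal distribution. The sketch below compares their sample correlation for normal and for $\mathrm{Exp}(1)$ data. Zero correlation is only a necessary condition for independence, so this is an illustration and not a proof; the helper names, sample size $n = 5$, and seed are assumptions made here.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mean_vs_spread_correlation(draw, n=5, reps=20_000):
    """Correlation between X-bar and sum (X_i - X-bar)^2 over many samples of size n."""
    means, spreads = [], []
    for _ in range(reps):
        xs = [draw() for _ in range(n)]
        xbar = sum(xs) / n
        means.append(xbar)
        spreads.append(sum((x - xbar) ** 2 for x in xs))
    return pearson(means, spreads)

rng = random.Random(3)  # fixed seed for reproducibility
corr_normal = mean_vs_spread_correlation(lambda: rng.gauss(0.0, 1.0))
corr_expo = mean_vs_spread_correlation(lambda: rng.expovariate(1.0))
print(round(corr_normal, 3), round(corr_expo, 3))
```

For normal data the correlation should be near zero, while for the skewed exponential data it should be clearly positive, since large observations raise both the mean and the spread.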

Chapter 4. Multivariate Random Vectors

Let $0 < \mathrm{Var}(X) < \infty$ and $0 < \mathrm{Var}(Y) < \infty$. The Pearson correlation coefficient is defined as

$$\rho_{XY} = \frac{E[(X - E[X])(Y - E[Y])]} {\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}}.$$

- $-1 \le \rho_{XY} \le 1$;
- $\rho_{XY} = 0$ if and only if $E[(X - E[X])(Y - E[Y])] = 0$;
- $|\rho_{XY}| = 1$ if and only if $X$ and $Y$ are almost surely linearly related,
that is, there exist constants $a, b$, not both zero, such that $P(a(X - E[X]) + b(Y - E[Y]) = 0) = 1$.
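The three properties can be seen on small data sets. The sketch below computes the sample analogue of $\rho_{XY}$ (population moments replaced by averages); the function name `pearson` and the data are illustrative choices made here.

```python
import math

def pearson(xs, ys):
    """Sample analogue of rho_XY: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("both variances must be positive")
    return cov / math.sqrt(var_x * var_y)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
r_up = pearson(xs, [2 * x + 1 for x in xs])     # exact increasing linear relation
r_down = pearson(xs, [-3 * x + 5 for x in xs])  # exact decreasing linear relation

# y = x^2 on points symmetric about 0: zero covariance although y depends on x.
sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_zero = pearson(sym, [x * x for x in sym])
print(r_up, r_down, r_zero)
```

The last case is a reminder that $\rho_{XY} = 0$ only says the covariance vanishes; it does not imply that $X$ and $Y$ are independent.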


http://imtdof.github.io/2025/11/11/统计推断笔记-一至四章/
Author: UncleBob
Published: November 11, 2025