Statistical Thinking

References:
*All of Statistics*, *Principles of Statistical Inference*, *Computer Age Statistical Inference*

History:
1. Classical statistics
2. Statistical machine learning
3. Data learning

Key figures: Fisher, Neyman

Estimating the distribution function

Plug-in principle

Linear statistical functionals

A short monograph:
*The Jackknife, the Bootstrap and Other Resampling Plans*

simulation

bootstrap

By the law of large numbers,

Hence, we can use the sample mean and sample variance of the simulated values to approximate E(Y) and V(Y).
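A minimal Monte Carlo sketch of this idea in Python, using the made-up target $Y=X^2$ with $X\sim \text{Uniform}(0,1)$ (so $E(Y)=1/3$ and $V(Y)=4/45$; the target is an assumption for illustration, not from the notes):

```python
import random

# Monte Carlo sketch: approximate E(Y) and V(Y) by the sample mean and
# sample variance of simulated draws.  Hypothetical target: Y = X^2,
# X ~ Uniform(0,1), so E(Y) = 1/3 and V(Y) = 4/45.
random.seed(0)
n = 100_000
ys = [random.random() ** 2 for _ in range(n)]

mean_hat = sum(ys) / n                                     # approximates E(Y)
var_hat = sum((y - mean_hat) ** 2 for y in ys) / (n - 1)   # approximates V(Y)
```

Both estimates converge as n grows, which is exactly the law-of-large-numbers argument above.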

SGD (stochastic gradient descent)

Importance Sampling

$X_1,\cdots,X_n\sim F$, with CDF $F$ and pdf $f$.

$$M=\int h(x)f(x)\,dx=\int h \frac{f}{g}\,g\, dx$$

Here we may choose $g$ to be a well-behaved density such as the normal, so this can be viewed as sampling from $g$.

$$\int h \frac{f}{g}\,g\, dx \approx\frac{1}{n}\sum h(x_i)\frac{f(x_i)}{g(x_i)}$$
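A sketch of this estimator, with an illustrative target not taken from the notes: $h(x)=x^2$, $f$ the $N(0,1)$ density, and proposal $g = N(0,2^2)$ (heavier-tailed than $f$, keeping the weights $f/g$ bounded):

```python
import math
import random

# Importance sampling: M = ∫ h(x) f(x) dx ≈ (1/n) Σ h(x_i) f(x_i)/g(x_i),
# with x_i drawn from g.  Here h(x) = x², f = N(0,1), g = N(0,4),
# so the true value is M = E[X²] = 1 under N(0,1).
def norm_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

random.seed(0)
n = 200_000
draws = [random.gauss(0, 2) for _ in range(n)]  # x_i ~ g
M_hat = sum(x * x * norm_pdf(x, 0, 1) / norm_pdf(x, 0, 2) for x in draws) / n
```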

Sampling Importance Resampling (SIR)

  • Sample candidates $Y_1,Y_2,\cdots, Y_n \sim g$
  • Calculate the importance weights $w(y_i)$
  • Resample $X_1,X_2,\cdots, X_n$ from $Y_1,Y_2,\cdots, Y_n$

$$P(X\in A\mid Y_1,Y_2, \cdots, Y_n)=\sum_i I_{Y_i \in A}\frac{w^*_i}{\sum_j w^*_j}$$

By the strong law of large numbers, the expression above tends to

$$\int_A w^*(y)\, g(y)\,dy=\int_A f(y)\,dy$$

The argument above shows that, when the sample size is large enough, the $X_i$ are approximately drawn from $f$.
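The three SIR steps might look like this sketch, with a hypothetical target $f = N(2,1)$ and proposal $g = N(0,3^2)$ (both chosen for illustration):

```python
import math
import random

# SIR: sample candidates from g, weight by w = f/g, then resample with
# probability proportional to the weights; the resampled points are
# approximately distributed according to f.
def norm_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

random.seed(0)
n = 50_000
ys = [random.gauss(0, 3) for _ in range(n)]                # candidates Y_i ~ g
ws = [norm_pdf(y, 2, 1) / norm_pdf(y, 0, 3) for y in ys]   # importance weights
xs = random.choices(ys, weights=ws, k=n)                   # resampled X_i

mean_hat = sum(xs) / n  # should approach E_f[X] = 2
```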

bootstrapping

We have $\theta=T(F)$ and $\hat{\theta}=T(\hat{F})$.
We wish to estimate

$$R(x,F)=\frac{T(\hat{F})-T(F)}{se(T(\hat{F}))}$$

The $x$ in $R$ comes from $F$; we simulate $R(x^*,\hat{F})$ to approximate $R(x,F)$.

  • Step 1: Estimate $R(x,F)$ with $R(x^*,\hat{F})$.
    Usually it is difficult to calculate $R(x^*,\hat{F})$ analytically, so we have step 2.
  • Step 2: Approximate $R(x^*,\hat{F})$ by simulation.

Example: $n=3$, $\{x_1,x_2,x_3\}=\{1,2,6\}\sim$ i.i.d. $F$; estimate the mean $\theta$.
The vector $x^*$ has $3^3=27$ possible values.

| $x^*$ | $\hat{\theta}^*$ | $p^*(\hat{\theta}^*\mid\hat{F})$ |
| --- | --- | --- |
| 111 | 1 | $\frac{1}{27}$ |
| 112 | $\frac{4}{3}$ | $\frac{3}{27}$ |

When $n$ is large, enumeration like this is impractical; instead we simply draw resamples at random.
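The $\{1,2,6\}$ example can be checked directly: enumerate all $3^3=27$ resamples for the exact bootstrap distribution of the mean, then compare with the random-resampling approach used when $n$ is large (a sketch):

```python
import itertools
import random

data = [1, 2, 6]
n = len(data)

# Exact bootstrap distribution: all 27 equally likely resamples.
exact = [sum(r) / n for r in itertools.product(data, repeat=n)]
exact_mean = sum(exact) / len(exact)   # equals the sample mean, 3

# Monte Carlo approximation by random resampling.
random.seed(0)
B = 20_000
sim = [sum(random.choices(data, k=n)) / n for _ in range(B)]
sim_mean = sum(sim) / B
```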

Bootstrap Variance Estimation

Simulate $V_F(T_n)$ with $V_{\hat{F}}(T_n)$.
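A sketch with made-up data, taking $T_n$ to be the sample mean; the bootstrap variance is just the sample variance of $T$ over $B$ resamples drawn from $\hat{F}$:

```python
import random

random.seed(0)
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]  # hypothetical sample
n, B = len(data), 5_000

# Draw B resamples from the empirical distribution and recompute T_n = mean.
reps = [sum(random.choices(data, k=n)) / n for _ in range(B)]
rep_mean = sum(reps) / B
v_boot = sum((t - rep_mean) ** 2 for t in reps) / B   # ≈ V_F(T_n)
```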



March 9


Given data $X \sim F$, we want to study the statistic $R(x,F)$; we use $R(x^*,\hat{F})$ to estimate $R(x,F)$, where

$$X^*=(X_1^*,\cdots,X_n^*)$$

Parametric bootstrap

$$X=(X_1,\cdots,X_n)\sim F(x,\theta)$$

What we do is

$$X=(X_1,\cdots,X_n)\rightarrow\hat{\theta}\rightarrow X^*\sim F(x,\hat{\theta})$$
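A sketch of this chain under an assumed Exponential(rate $\theta$) model with made-up data (the model choice is illustrative only):

```python
import random

random.seed(0)
data = [0.8, 1.3, 0.4, 2.1, 0.9, 1.7, 0.6, 1.1]  # hypothetical sample
n, B = len(data), 3_000

theta_hat = n / sum(data)  # MLE of the exponential rate

# Simulate X* from the fitted model F(x, θ̂) and refit θ* each time.
boot = []
for _ in range(B):
    x_star = [random.expovariate(theta_hat) for _ in range(n)]
    boot.append(n / sum(x_star))

boot_mean = sum(boot) / B
se_theta = (sum((t - boot_mean) ** 2 for t in boot) / B) ** 0.5
```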

bootstrapping regression

Yi=XiTβ+ϵi,i=1,,n Y_i=X_i^T\beta+\epsilon_i,i=1,\cdots,n

The $\epsilon_i$ are assumed to be i.i.d. with mean zero and constant variance.

$$(X_i,Y_i)\rightarrow\hat{\beta}\rightarrow \hat{Y_i} \rightarrow \hat{\epsilon}_i=Y_i-\hat{Y_i}\rightarrow\epsilon^*\rightarrow Y^*\rightarrow\beta^*$$
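The residual-bootstrap chain above, sketched for simple linear regression with simulated data (the data and the `ols` helper are made up for illustration):

```python
import random

random.seed(0)
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + random.gauss(0, 1) for x in xs]  # hypothetical data
n = len(xs)

def ols(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1

b0, b1 = ols(xs, ys)
fitted = [b0 + b1 * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]   # residuals ε̂_i = Y_i − Ŷ_i

# Resample residuals, rebuild Y*, refit β* — one bootstrap replication each.
boot_b1 = []
for _ in range(1_000):
    eps_star = random.choices(resid, k=n)
    y_star = [f + e for f, e in zip(fitted, eps_star)]
    boot_b1.append(ols(xs, y_star)[1])

m = sum(boot_b1) / len(boot_b1)
se_b1 = (sum((b - m) ** 2 for b in boot_b1) / len(boot_b1)) ** 0.5
```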

Bootstrap Confidence Interval

Method 1. The Normal Intervals

$$v_{boot}=\frac{1}{B}\sum_{b=1}^{B}\big(T^*_{n,b}-\bar{T^*_{n}}\big)^2$$

Normal interval: $(T_n-z_{\frac{\alpha}{2}}\,\hat{se}_{boot},\; T_n+z_{\frac{\alpha}{2}}\,\hat{se}_{boot})$, where $\hat{se}_{boot}=\sqrt{v_{boot}}$.

Method 2

Let $\theta=T(F)$ and $\hat{\theta}_n=T(\hat{F})$.
Pivot: $R_n = \hat{\theta}_n-\theta$.
Let $\hat{\theta}_{n,1},\cdots,\hat{\theta}_{n,B}$ be the bootstrap replications of $\hat{\theta}$.

Let $\hat{\theta}_{\beta}$ denote the $\beta$ sample quantile of $\hat{\theta}_{n,1},\cdots,\hat{\theta}_{n,B}$.

Then the $1-\alpha$ bootstrap pivotal confidence interval is

$$C_n=\big(2\hat{\theta}_n-\hat{\theta}_{1-\frac{\alpha}{2}},\; 2\hat{\theta}_n-\hat{\theta}_{\frac{\alpha}{2}}\big)$$
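A sketch of the pivotal interval, taking $T$ to be the median of a made-up sample (both the statistic and the data are illustrative assumptions):

```python
import random

def median(xs):
    s = sorted(xs)
    m = len(s) // 2
    return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

random.seed(0)
data = [1.2, 2.9, 3.1, 4.5, 5.0, 5.3, 6.8, 7.7, 9.4, 11.2]
n, B, alpha = len(data), 4_000, 0.05

theta_hat = median(data)
reps = sorted(median(random.choices(data, k=n)) for _ in range(B))
q_lo = reps[int((alpha / 2) * B)]           # bootstrap quantile at α/2
q_hi = reps[int((1 - alpha / 2) * B) - 1]   # bootstrap quantile at 1−α/2

# Pivotal interval: (2θ̂ − quantile at 1−α/2, 2θ̂ − quantile at α/2)
c_lo, c_hi = 2 * theta_hat - q_hi, 2 * theta_hat - q_lo
```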

(Proof attached as images.)

More generally, we could define the pivot as

$$R_n = \frac{\phi(\hat{\theta}_n)-\phi(\theta)}{1+a\phi(\theta)}+b$$

Method 3 Percentile Interval

$$C_n=\big(\theta^*_{\frac{\alpha}{2}},\,\theta^*_{1-\frac{\alpha}{2}}\big)$$
assuming there exists a $\phi$, continuous and strictly increasing, whose distribution function $H$ is symmetric.

Purpose: transform the distribution $F$ into $G$.

$$1-\alpha = P^*\Big(h_{\frac{\alpha}{2}}\le \phi(\hat{\theta}^*)-\phi(\hat{\theta})\le h_{1-\frac{\alpha}{2}}\Big) = P^*\Big(\phi^{-1}\big(h_{\frac{\alpha}{2}}+\phi(\hat{\theta})\big)\le \hat{\theta}^* \le \phi^{-1}\big(h_{1-\frac{\alpha}{2}}+\phi(\hat{\theta})\big)\Big)$$
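In contrast to Method 2, the percentile interval just reads off the bootstrap quantiles directly; a sketch with made-up data, taking $\theta$ to be the mean:

```python
import random

random.seed(0)
data = [1.2, 2.9, 3.1, 4.5, 5.0, 5.3, 6.8, 7.7, 9.4, 11.2]  # hypothetical
n, B, alpha = len(data), 4_000, 0.05

reps = sorted(sum(random.choices(data, k=n)) / n for _ in range(B))
c_lo = reps[int((alpha / 2) * B)]            # bootstrap quantile at α/2
c_hi = reps[int((1 - alpha / 2) * B) - 1]    # bootstrap quantile at 1−α/2

theta_hat = sum(data) / n
```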

Jackknife

$T_n=T(X_1,\cdots,X_n)$, and let $T_{(-i)}$ denote the statistic with the $i$-th observation removed. Let

$$\bar{T_n}=\frac{1}{n}\sum_{i=1}^{n} T_{(-i)}$$

then

$$Var(T_n)\approx\frac{n-1}{n}\sum_{i=1}^{n}\big(T_{(-i)}-\bar{T_n}\big)^2$$
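The jackknife formula is deterministic, so it can be verified exactly; for the sample mean it reduces to $s^2/n$. A sketch with made-up data:

```python
data = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical sample
n = len(data)

# Leave-one-out replications T_{(-i)} of the mean.
loo = [(sum(data) - x) / (n - 1) for x in data]
loo_bar = sum(loo) / n
var_jack = (n - 1) / n * sum((t - loo_bar) ** 2 for t in loo)
# Here var_jack == 2.0, matching s²/n = 10/5 for this sample.
```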

Exponential Families

Applications: statistical physics, information geometry.
Definition:

Let $M$ be a measure on $R^n$, let $h:R^n\rightarrow R$ be a nonnegative function, and let $T_i,\ i=1,\cdots,s$ be measurable functions $R^n\rightarrow R$. For $\eta \in R^s$, define

$$A(\eta)=\log \int \exp\Big(\sum_{i=1}^{s}\eta_iT_i(x)\Big)\,h(x)\,dM(x)$$
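As a standard sanity check on this definition (a textbook example, not from the notes), the $N(\mu,\sigma^2)$ family can be written in this form with $s=2$:

```latex
% N(\mu,\sigma^2) as a two-parameter exponential family,
% with base measure M = Lebesgue measure on R and h(x) = 1/\sqrt{2\pi}:
f(x;\eta) = h(x)\exp\big(\eta_1 T_1(x) + \eta_2 T_2(x) - A(\eta)\big),\quad
T_1(x) = x,\quad T_2(x) = x^2,
\eta_1 = \frac{\mu}{\sigma^2},\quad \eta_2 = -\frac{1}{2\sigma^2},\quad
A(\eta) = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2}\log(-2\eta_2)
         = \frac{\mu^2}{2\sigma^2} + \log\sigma.
```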