Population: the smoking status of all adult men. Sample: the 5,000 men surveyed in total by the 50 students. Population distribution: Bernoulli.
Population: all capacitors produced by the factory. Sample: the $n$ items drawn. Sample distribution: assume the observations are i.i.d. and exponentially distributed, so the joint density is $p(x_1,x_2,\dots,x_n)=\prod_{i=1}^{n}\lambda e^{-\lambda x_i}$ for $x_i>0$.
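I think this conclusion is unreasonable. The population is all graduates, but the sample consists only of graduates who returned to campus. Graduates with low salaries or less successful careers are less willing to return, so the sampling is not random. The actual average salary of the graduates is below US$50,000.
Sample data such as salaries and ages are usually skewed, so the sample mean is not a good representative of the typical level.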
3+4+8+3+2=20
The distribution function must be right-continuous:

$$F_{20}(x)=\begin{cases}
0, & x<38\\
\frac{3}{20}, & 38\leq x<48\\
\frac{7}{20}, & 48\leq x<58\\
\frac{3}{4}, & 58\leq x<68\\
\frac{9}{10}, & 68\leq x<78\\
1, & x\geq 78
\end{cases}$$
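The step values of $F_{20}$ are the cumulative relative frequencies of the five groups. A minimal Python sketch of that construction follows; the breakpoints and frequencies are the ones in the solution, while the function name `ecdf` is only an illustrative choice.

```python
from bisect import bisect_right
from fractions import Fraction

# Breakpoints and group frequencies as they appear in the solution.
breakpoints = [38, 48, 58, 68, 78]
frequencies = [3, 4, 8, 3, 2]

n = sum(frequencies)  # total sample size

# Cumulative relative frequencies: the value of F_20 on each step.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(Fraction(running, n))


def ecdf(x):
    """Right-continuous step function: the jump at a breakpoint is included."""
    k = bisect_right(breakpoints, x)  # number of breakpoints <= x
    return Fraction(0) if k == 0 else cumulative[k - 1]
```

Using `bisect_right` rather than `bisect_left` is what makes the function right-continuous: at $x=38$ the jump is already counted.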
$$\bar{y}=3\bar{x}-4$$

$$s_y^2=\frac{1}{n-1}\sum_{i}(y_i-\bar{y})^2=\frac{1}{n-1}\sum_{i}\bigl(3x_i-4-(3\bar{x}-4)\bigr)^2=\frac{1}{n-1}\sum_{i}9(x_i-\bar{x})^2=9s_x^2$$
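The two identities for $y_i=3x_i-4$ can be checked exactly on a small data set. The sketch below uses hypothetical data and exact rational arithmetic, so the comparison involves no rounding.

```python
from fractions import Fraction
from statistics import mean, variance

x = [Fraction(v) for v in (2, 5, 7, 11)]  # hypothetical data
y = [3 * v - 4 for v in x]                # the transformation y = 3x - 4

mean_x, mean_y = mean(x), mean(y)
var_x, var_y = variance(x), variance(y)   # sample variances (divisor n - 1)

print(mean_y == 3 * mean_x - 4)  # shift and scale carry over to the mean
print(var_y == 9 * var_x)        # only the scale factor affects the variance
```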
pf: Since $(n+1)\bar{x}_{n+1}=n\bar{x}_n+x_{n+1}$, we have $(n+1)\bar{x}_{n+1}-(n+1)\bar{x}_n=x_{n+1}-\bar{x}_n$. Dividing both sides by $n+1$ gives $\bar{x}_{n+1}=\bar{x}_n+\frac{1}{n+1}(x_{n+1}-\bar{x}_n)$, which is the claim.
pf:

$$\begin{aligned}
n s_{n+1}^2-(n-1)s_n^2 &= \sum_{i=1}^{n+1}(x_i-\bar{x}_{n+1})^2-\sum_{i=1}^{n}(x_i-\bar{x}_n)^2\\
&= x_{n+1}^2-2\Bigl(\sum_{i=1}^{n+1}x_i\bar{x}_{n+1}-\sum_{i=1}^{n}x_i\bar{x}_n\Bigr)+\bigl((n+1)\bar{x}_{n+1}^2-n\bar{x}_n^2\bigr)\\
&= x_{n+1}^2-2\Bigl[x_{n+1}\bar{x}_{n+1}+\sum_{i=1}^{n}x_i(\bar{x}_{n+1}-\bar{x}_n)\Bigr]+\bigl((n+1)\bar{x}_{n+1}^2-n\bar{x}_n^2\bigr)\\
&= x_{n+1}^2-2\Bigl[x_{n+1}\bar{x}_{n+1}+\frac{n}{n+1}(x_{n+1}-\bar{x}_n)\bar{x}_n\Bigr]+\bigl((n+1)\bar{x}_{n+1}^2-n\bar{x}_n^2\bigr)
\end{aligned}$$

Substituting $\bar{x}_{n+1}=\bar{x}_n+\frac{1}{n+1}(x_{n+1}-\bar{x}_n)$ from the previous proof gives

$$n s_{n+1}^2-(n-1)s_n^2=\frac{n}{n+1}(x_{n+1}-\bar{x}_n)^2.$$

Dividing both sides by $n$ gives the result.
remark: This exercise shows that the sample mean and sample variance can be updated one observation at a time as the sample grows.
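The two recurrences translate directly into a streaming update. The sketch below is one possible arrangement, not a reference implementation; the function name `update` and the data are hypothetical, and the update is started from $n=2$ so that $s_n^2$ is defined.

```python
from statistics import mean, variance


def update(n, mean_n, var_n, x_new):
    """Fold one new observation into the mean and variance of n >= 2 points."""
    delta = x_new - mean_n
    mean_next = mean_n + delta / (n + 1)
    var_next = (n - 1) / n * var_n + delta ** 2 / (n + 1)
    return n + 1, mean_next, var_next


data = [4.0, 7.0, 13.0, 16.0, 10.0]  # hypothetical observations

# Start from the first two points, then stream in the rest one by one.
n, m, v = 2, mean(data[:2]), variance(data[:2])
for x_new in data[2:]:
    n, m, v = update(n, m, v, x_new)

print(m, v)  # should agree with mean(data) and variance(data)
```

Only three numbers are kept between steps, so the full sample never has to be stored.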
pf:

$$\bar{x}=\frac{1}{m+n}\Bigl(\sum_{i=1}^{n}x_i^{(1)}+\sum_{j=1}^{m}x_j^{(2)}\Bigr)=\frac{n\bar{x}_1+m\bar{x}_2}{m+n}$$

where $x_i^{(1)}$ denotes the observations in the sample of size $n$ and $x_j^{(2)}$ the observations in the sample of size $m$. The superscript labels the sample; it is not a power.
pf:

$$\begin{aligned}
s^2 &= \frac{\sum_{i=1}^{n}\bigl(x_i^{(1)}-\bar{x}\bigr)^2+\sum_{j=1}^{m}\bigl(x_j^{(2)}-\bar{x}\bigr)^2}{m+n-1}\\
&= \frac{\sum_{i=1}^{n}\Bigl(x_i^{(1)}-\frac{n\bar{x}_1+m\bar{x}_2}{m+n}\Bigr)^2+\sum_{j=1}^{m}\Bigl(x_j^{(2)}-\frac{n\bar{x}_1+m\bar{x}_2}{m+n}\Bigr)^2}{m+n-1}\\
&= \frac{\sum_{i=1}^{n}\Bigl(x_i^{(1)}-\bar{x}_1+\frac{m(\bar{x}_1-\bar{x}_2)}{m+n}\Bigr)^2}{m+n-1}+\frac{\sum_{j=1}^{m}\Bigl(x_j^{(2)}-\bar{x}_2-\frac{n(\bar{x}_1-\bar{x}_2)}{m+n}\Bigr)^2}{m+n-1}\\
&= \frac{(n-1)s_1^2+(m-1)s_2^2+\frac{mn(\bar{x}_1-\bar{x}_2)^2}{m+n}}{m+n-1}
\end{aligned}$$

The cross terms vanish in the last step because $\sum_{i=1}^{n}(x_i^{(1)}-\bar{x}_1)=0$ and $\sum_{j=1}^{m}(x_j^{(2)}-\bar{x}_2)=0$. The result follows from the last expression.
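The pooled formulas make it possible to merge two samples from their summaries alone. A minimal sketch, with a hypothetical helper `pooled` and hypothetical data, compares the formulas with a direct computation on the concatenated sample.

```python
from statistics import mean, variance


def pooled(n, mean1, var1, m, mean2, var2):
    """Mean and variance of the merged sample from the two summaries."""
    total = n + m
    mean_all = (n * mean1 + m * mean2) / total
    between = m * n * (mean1 - mean2) ** 2 / total  # shift between the two means
    var_all = ((n - 1) * var1 + (m - 1) * var2 + between) / (total - 1)
    return mean_all, var_all


a = [1.0, 4.0, 7.0]        # hypothetical first sample (size n)
b = [2.0, 6.0, 8.0, 12.0]  # hypothetical second sample (size m)
mean_all, var_all = pooled(len(a), mean(a), variance(a),
                           len(b), mean(b), variance(b))

print(mean_all, var_all)  # should agree with mean(a + b) and variance(a + b)
```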
$$E(\bar{x})=E\Bigl(\frac{\sum_{i=1}^{n}x_i}{n}\Bigr)=0$$

$$Var(\bar{x})=\frac{1}{n^2}\sum_{i=1}^{n}Var(x_i)=\frac{1}{n}Var(x_i)$$

Since $E(x_i)=0$,

$$Var(x_i)=E(x_i^2)=\frac{1}{2}\int_{-1}^{1}x^2\,dx=\frac{1}{3}$$

$$Var(\bar{x})=\frac{1}{3n}$$
$$\begin{aligned}
\sum_{i<j}(x_i-x_j)^2 &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\bigl((x_i-\bar{x})+(\bar{x}-x_j)\bigr)^2\\
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\bigl[(x_i-\bar{x})^2+(x_j-\bar{x})^2-2(x_i-\bar{x})(x_j-\bar{x})\bigr]\\
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\bigl[(x_i-\bar{x})^2+(x_j-\bar{x})^2\bigr]\\
&= n(n-1)s^2
\end{aligned}$$

The cross term drops out because $\sum_{i=1}^{n}(x_i-\bar{x})=0$.
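The identity between the sum of squared pairwise differences and $n(n-1)s^2$ can be confirmed exactly on a small example. The data below are hypothetical, and rational arithmetic keeps the comparison free of rounding.

```python
from fractions import Fraction
from itertools import combinations

x = [Fraction(v) for v in (2, 5, 7, 11)]  # hypothetical data
n = len(x)
x_bar = sum(x) / n
s2 = sum((v - x_bar) ** 2 for v in x) / (n - 1)  # sample variance

lhs = sum((a - b) ** 2 for a, b in combinations(x, 2))  # sum over pairs i < j
rhs = n * (n - 1) * s2

print(lhs == rhs)
```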
By the reproductive property of the normal distribution, $\bar{x}_1\sim N(\mu,\frac{\sigma^2}{n})$ and $\bar{x}_2\sim N(\mu,\frac{\sigma^2}{n})$. Let $\bar{\mu}=\bar{x}_1-\bar{x}_2$, so that $\bar{\mu}\sim N(0,\frac{2\sigma^2}{n})$. Let $\Phi$ denote the standard normal distribution function. Solving

$$P(|\bar{\mu}|>\sigma)\leq 0.01\;\Longleftrightarrow\;2\Phi\Bigl(\frac{\sigma}{\sigma\sqrt{\frac{2}{n}}}\Bigr)-1\geq 0.99$$

gives $n\geq 14$.
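The bound on $n$ comes from inverting the standard normal distribution function at 0.995. A short sketch with the standard library's `statistics.NormalDist` reproduces the arithmetic; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

alpha = 0.01
std_normal = NormalDist()                # N(0, 1)
z = std_normal.inv_cdf(1 - alpha / 2)    # 0.995 quantile

# 2 * Phi(sqrt(n / 2)) - 1 >= 0.99  <=>  sqrt(n / 2) >= z  <=>  n >= 2 * z**2
n_min = ceil(2 * z ** 2)


def tail_prob(n):
    """P(|x1_bar - x2_bar| > sigma) for sample size n; sigma cancels out."""
    return 2 * (1 - std_normal.cdf(sqrt(n / 2)))


print(n_min, tail_prob(n_min), tail_prob(n_min - 1))
```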
$$P(x_{(16)}>10)=1-P(x_{(16)}\leq 10)=1-[P(x\leq 10)]^{16}=0.937$$

$$P(x_{(1)}>5)=[1-P(x\leq 5)]^{16}=0.331$$
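The two probabilities only need the population distribution function evaluated at 10 and at 5. The population is not restated in this solution; the sketch below assumes it is $N(8, 2^2)$, an assumption chosen because it is consistent with the quoted values 0.937 and 0.331.

```python
from statistics import NormalDist

# Assumption: the population is N(8, 2^2); this is not restated in the solution.
population = NormalDist(mu=8, sigma=2)
n = 16

# The maximum is <= 10 only if all n observations are <= 10.
p_max_gt_10 = 1 - population.cdf(10) ** n

# The minimum is > 5 only if all n observations are > 5.
p_min_gt_5 = (1 - population.cdf(5)) ** n

print(round(p_max_gt_10, 3), round(p_min_gt_5, 3))
```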
pf: Let $\eta=F(x)\in[0,1]$. Writing $p$ for a density, the $i$-th order statistic $\eta_i$ has density

$$p_{\eta_i}(t)=i\binom{n}{i}\,p_{\eta}(t)\,[P(\eta<t)]^{i-1}\,[1-P(\eta<t)]^{n-i}.$$

Because $F(x)$ is continuous,

$$P(\eta<t)=P(F(x)<t)=P(x<F^{-1}(t))=F(F^{-1}(t))=t,$$

and differentiating with respect to $t$ gives $p_{\eta}(t)=1$ on $[0,1]$. Hence

$$p_{\eta_i}(t)=i\binom{n}{i}\,t^{i-1}(1-t)^{n-i},\qquad 0\leq t\leq 1.$$

This is also the density of the $i$-th order statistic of $n$ i.i.d. random variables distributed as $U[0,1]$.
$$B(m,n)=\int_{0}^{1}x^{m-1}(1-x)^{n-1}\,dx=\frac{\Gamma(m)\Gamma(n)}{\Gamma(m+n)}$$
$$E(\eta_i)=n\binom{n-1}{i-1}\int_0^1 t^{i}(1-t)^{n-i}\,dt=i\,\frac{n!}{i!(n-i)!}\cdot\frac{i!\,(n-i)!}{(n+1)!}=\frac{i}{n+1}$$

$$\begin{aligned}
Var(\eta_i) &= i\,\frac{n!}{i!(n-i)!}\int_0^1 t^{i-1}(1-t)^{n-i}\Bigl(t-\frac{i}{n+1}\Bigr)^2 dt\\
&= i\,\frac{n!}{i!(n-i)!}\Bigl[\frac{(i+1)!\,(n-i)!}{(n+2)!}-\frac{2i}{n+1}\cdot\frac{i!\,(n-i)!}{(n+1)!}+\frac{i^2}{(n+1)^2}\cdot\frac{(i-1)!\,(n-i)!}{n!}\Bigr]\\
&= \frac{i(n-i+1)}{(n+1)^2(n+2)}
\end{aligned}$$
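Both moments reduce to Beta integrals with integer arguments, so they can be verified exactly. In the sketch below the helper names `beta` and `moment` and the values of `n` and `i` are hypothetical; the check compares the integrals with the closed forms $\frac{i}{n+1}$ and $\frac{i(n-i+1)}{(n+1)^2(n+2)}$.

```python
from fractions import Fraction
from math import comb, factorial


def beta(a, b):
    """B(a, b) for positive integers a, b, via the Gamma-function identity."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def moment(n, i, k):
    """k-th raw moment of the i-th order statistic of n uniforms on [0, 1]."""
    # Density: i * C(n, i) * t^(i-1) * (1-t)^(n-i); multiply by t^k and integrate.
    return i * comb(n, i) * beta(i + k, n - i + 1)


n, i = 5, 2  # hypothetical sample size and rank
mean_i = moment(n, i, 1)
var_i = moment(n, i, 2) - mean_i ** 2

print(mean_i, var_i)
```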
Let $A$ be the covariance matrix, with $A(1,1)=Var(\eta_i)$ and $A(2,2)=Var(\eta_j)$. It remains only to compute $A(1,2)=A(2,1)=cov(\eta_i,\eta_j)$. First find the joint density of $\eta_i,\eta_j$. Without loss of generality let $i<j$; then for $0\leq t_1<t_2\leq 1$

$$p(t_1,t_2)=\frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\;t_1^{i-1}(t_2-t_1)^{j-i-1}(1-t_2)^{n-j}.$$

$$\begin{aligned}
cov(\eta_i,\eta_j) &= E(\eta_i\eta_j)-E(\eta_i)E(\eta_j)\\
&= E(\eta_i)-E(\eta_i(1-\eta_j))-E(\eta_i)E(\eta_j)\\
&= \frac{i}{n+1}-\int_0^1\!\!\int_0^{t_2} t_1(1-t_2)\,\frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,t_1^{i-1}(t_2-t_1)^{j-i-1}(1-t_2)^{n-j}\,dt_1\,dt_2-\frac{i}{n+1}\cdot\frac{j}{n+1}\\
&= \frac{i(n+1-j)}{(n+2)(n+1)^2}\\
&= \frac{a_1(1-a_2)}{n+2}
\end{aligned}$$

where $a_1$ and $a_2$ correspond to $\frac{i}{n+1}$ and $\frac{j}{n+1}$.

For the integral above:

$$I=\int_0^1\!\!\int_0^{t_2}\frac{(n+2)!}{i!\,(j-i-1)!\,(n-j+1)!}\;t_1^{i}(t_2-t_1)^{j-i-1}(1-t_2)^{n-j+1}\,dt_1\,dt_2=1,\qquad E(\eta_i(1-\eta_j))=\frac{i(n-j+1)}{(n+2)(n+1)}\,I.$$

To evaluate $I$, match the integrand to a probability density and use the fact that a density integrates to 1. Here the integrand is the joint density of the $(i+1)$-th and $(j+1)$-th order statistics of $n+2$ i.i.d. $U[0,1]$ random variables.
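The covariance formula $\frac{i(n+1-j)}{(n+2)(n+1)^2}$ can be sanity-checked by simulation. The sketch below is a rough Monte Carlo check under hypothetical values of `n`, `i`, `j` and a fixed seed, not a proof; it sorts uniform samples and estimates the covariance of the $i$-th and $j$-th order statistics.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

n, i, j = 5, 2, 4   # hypothetical sample size and ranks, i < j
reps = 100_000

sum_i = sum_j = sum_ij = 0.0
for _ in range(reps):
    u = sorted(random.random() for _ in range(n))
    a, b = u[i - 1], u[j - 1]   # i-th and j-th order statistics
    sum_i += a
    sum_j += b
    sum_ij += a * b

# Plug-in estimate of E(XY) - E(X)E(Y).
cov_hat = sum_ij / reps - (sum_i / reps) * (sum_j / reps)
cov_formula = i * (n + 1 - j) / ((n + 2) * (n + 1) ** 2)

print(cov_hat, cov_formula)
```

The two numbers should agree to within Monte Carlo error, which shrinks at the rate $1/\sqrt{\text{reps}}$.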
