The square root sampling relationship

When sampling, increasing the sample size by a factor of X shrinks the sampling error by a factor of √X.

Example:
Making the sample 100 times larger makes the sampling error 10 times smaller.
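
The rule follows from the standard error of a sample mean. Here is a short derivation, assuming n independent draws from a population with standard deviation σ (the symbols n and σ are introduced here for illustration and do not appear in the original post):

```latex
\[
\operatorname{SE}(\bar{x}_n) = \frac{\sigma}{\sqrt{n}}
\qquad\Longrightarrow\qquad
\operatorname{SE}(\bar{x}_{Xn}) = \frac{\sigma}{\sqrt{Xn}}
  = \frac{1}{\sqrt{X}}\,\operatorname{SE}(\bar{x}_n)
\]
```

With X = 100 the factor is 1/√100 = 1/10, which matches the example above.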


Mean of the full data set
> real_mean <- mean(data[, 1])  # column 1 is assumed to be `amount`

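Sample size: 10
> store_diff <- numeric(10000)  # absolute error of each sample mean
> for (k in 1:10000) {
+   sam <- sample(nrow(data), 10, replace = TRUE)  # row indices, drawn with replacement
+   my_sample <- data$amount[sam]
+   store_diff[k] <- abs(mean(my_sample) - real_mean)
+ }
> mean(store_diff)
[1] 3.586302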

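Sample size: 100
> store_diff <- numeric(10000)  # absolute error of each sample mean
> for (k in 1:10000) {
+   sam <- sample(nrow(data), 100, replace = TRUE)  # row indices, drawn with replacement
+   my_sample <- data$amount[sam]
+   store_diff[k] <- abs(mean(my_sample) - real_mean)
+ }
> mean(store_diff)
[1] 1.128775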

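Sample size: 1000
> store_diff <- numeric(10000)  # absolute error of each sample mean
> for (k in 1:10000) {
+   sam <- sample(nrow(data), 1000, replace = TRUE)  # row indices, drawn with replacement
+   my_sample <- data$amount[sam]
+   store_diff[k] <- abs(mean(my_sample) - real_mean)
+ }
> mean(store_diff)
[1] 0.3571449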

Expected reduction factors:
√10 ≈ 3.16
√100 = 10

Observed ratios of the mean absolute errors:
Sample size 10 -> 100 (10x): 3.586302 / 1.128775 ≈ 3.18
Sample size 10 -> 1000 (100x): 3.586302 / 0.3571449 ≈ 10.04
Sample size 100 -> 1000 (10x): 1.128775 / 0.3571449 ≈ 3.16
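
The three runs above differ only in the sample size, so the experiment can be folded into one function. This is a minimal sketch, not the original script: `mean_abs_error` is a hypothetical helper, and because the post's `data` is not available, `population` is a synthetic stand-in for `data$amount` drawn from an exponential distribution.

```r
# Hypothetical helper: average absolute error of the sample mean
# over `reps` samples of size `n`, drawn with replacement from `x`.
mean_abs_error <- function(x, n, reps = 10000) {
  true_mean <- mean(x)
  errors <- replicate(reps, abs(mean(sample(x, n, replace = TRUE)) - true_mean))
  mean(errors)
}

set.seed(1)                                 # reproducible run
population <- rexp(100000, rate = 1 / 50)   # synthetic stand-in for data$amount

sizes  <- c(10, 100, 1000)
errors <- sapply(sizes, function(n) mean_abs_error(population, n, reps = 2000))

# Each ratio compares a sample size with one 10 times larger,
# so both should land near sqrt(10).
ratios <- errors[-length(errors)] / errors[-1]
print(data.frame(n = sizes, mean_abs_error = errors))
print(ratios)
```

To rerun it on real data, replace `population` with the actual column, for example `data$amount`. The exact values depend on the seed and the data, but the ratios should stay close to √10.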
