[R] Odp: Sampling in R
Petr PIKAL
petr.pikal at precheza.cz
Tue Apr 21 13:08:55 CEST 2009
Hi
r-help-bounces at r-project.org napsal dne 21.04.2009 12:25:01:
>
> Dear R users,
>
> I need to do sampling without replacement (bootstraps). I have two
variables
> (Xvar, Yvar).
> I have a correlation from original data set cor(Xvar, Yvar)=0.6174221. I
am
> doing 50000 sampling,
> and in each sampling calculating correlations, saving, sorting and
getting
> 95% cutt off point (0.1351877).
> I am getting maximum value as 0.3507219 (much smaller than correlation
of my
> original data).
> I repeated the sampling a couple of time and none of them produced a
correlation
> coefficient higher than my original data set. However, if I sort out my
Xvar
> and Yvar and
> obtain correlation it is 0.9657125 which is much higher than correlation
for
> my original data.
> I am doing sampling in another program and getting at least 1% higher
> correlation than mine.
> Now I am getting confused with sampling(random data) in R. My data and
codes
> for the scenario above are below
>
>
>
Xvar<-c(0.1818182,0.5384615,0.5535714,0.4680851,0.4545455,0.4385965,0.5185185,
> 0.4035088,0.4901961,0.3650794,0.462963,0.4,0.56,0.3965517,0.4909091,
> 0.4716981,0.4310345,0.2,0.1509434,0.2647059,0.173913,0.1914894,0.
>
1914894,0.1489362,0.1363636,0.2244898,0.2325581,0.1333333,0.1818182,0.1702128,
> 0.2173913,0.2380952,0.1632653,0.5614035,0.3396226,0.4909091,0.3770492,
> 0.5,0.5185185,0.5,0.4666667,0.4464286,0.362069,0.4285714,0.4561404,
> 0.4736842,0.4545455,0.4166667,0.4181818,0.4590164,0.5166667,0.5423729,
> 0.4833333,0.5454545,0.4393939,0.5172414,0.4098361,0.4745763,0.4754098,
> 0.5166667,0.5,0.4603175,0.42,0.4038462,0.4897959,0.3148148,0.3673469,
> 0.4,0.4583333,0.3877551,0.4375,0.4117647,0.4313725,0.5333333,0.3962264,
> 0.3548387,0.5272727,0.4137931,0.3928571,0.4666667,0.4210526,0.4363636,
>
0.4545455,0.4310345,0.4237288,0.4814815,0.4912281,0.4333333,0.4,0.4285714,
> 0.4516129,0.5090909,0.4464286,0.4642857,0.4166667,0.4098361,0.4909091,
>
0.3809524,0.5272727,0.4814815,0.5254237,0.627451,0.5,0.5471698,0.5454545,
> 0.5925926,0.5769231,0.5818182,0.4444444,0.4915254,0.4727273,0.4107143,
> 0.4285714,0.4310345,0.4237288,0.4285714,0.440678,0.4237288,0.4807692,
> 0.4150943,0.4615385,0.4107143,0.4814815,0.4074074,0.4210526,0.5263158,
> 0.440678,0.4576271,0.5344828,0.5,0.5636364,0.4677419,0.5,0.5192308,
> 0.4642857,0.5090909,0.58,0.4482759,0.5098039,0.4035088,0.4210526,0.
>
5098039,0.4385965,0.5283019,0.5471698,0.625,0.4310345,0.4912281,0.5283019,
> 0.4576271,0.5471698,0.4745763,0.4821429)
>
>
Yvar<-c(0.2553191,0.4107143,0.5660377,0.3888889,0.3606557,0.2898551,0.3818182,
> 0.4,0.4,0.3278689,0.2903226,0.4074074,0.4181818,0.3,0.2238806,0.3728814,
> 0.3709677,0.2307692,0.2830189,0.2244898,0.2142857,0.2131148,0.22,0.
>
2258065,0.2321429,0.2,0.2264151,0.22,0.2115385,0.2459016,0.1166667,0.1785714,
> 0.2068966,0.6,0.4285714,0.3134328,0.4461538,0.3965517,0.4769231,0.
>
6181818,0.4827586,0.3709677,0.3965517,0.4821429,0.4545455,0.359375,0.4576271,
> 0.4516129,0.5272727,0.4603175,0.4,0.4912281,0.5384615,0.5,0.4516129,0.
> 4126984,0.4655172,0.5263158,0.4925373,0.358209,0.4285714,0.4920635,
> 0.4482759,0.3235294,0.4,0.4375,0.440678,0.3898305,0.35,0.4528302,0.58,
> 0.4153846,0.3174603,0.5185185,0.3870968,0.2894737,0.3709677,0.369863,
> 0.3676471,0.3636364,0.3088235,0.328125,0.4032258,0.4084507,0.3188406,
>
0.3636364,0.3823529,0.2816901,0.4722222,0.5,0.3521127,0.4393939,0.3787879,
> 0.453125,0.4324324,0.4057971,0.4545455,0.4492754,0.5,0.4098361,0.
>
4067797,0.3666667,0.3928571,0.4285714,0.5,0.2923077,0.4561404,0.45,0.5538462,
> 0.4626866,0.4057971,0.3676471,0.5322581,0.5428571,0.375,0.4411765,0.
>
4571429,0.4,0.3846154,0.3870968,0.4915254,0.530303,0.4375,0.4918033,0.4179104,
> 0.4032258,0.3606557,0.5178571,0.4848485,0.390625,0.375,0.4375,0.
>
3666667,0.4,0.4477612,0.2571429,0.4032258,0.3382353,0.4814815,0.4090909,0.3548387,
> 0.4821429,0.5,0.557377,0.4333333,0.5454545,0.4590164,0.3943662,0.
> 5076923,0.5,0.3283582,0.3676471,0.559322)
>
> my.cor<-cor(Xvar, Yvar)
> print(my.cor)
>
> nperm<-49999
> Perm.Cor<-NULL
>
> for (iperm in 1:nperm) {
> XvarNew<-sample(Xvar, size=length(Xvar), replace=FALSE)
> YvarNew<-sample(Yvar, size=length(Yvar), replace=FALSE)
> perm.cor<-cor(XvarNew, YvarNew)
> Perm.Cor<-c(Perm.Cor, perm.cor)
> }
AFAICU you do not sample your data you shuffle them. Then you compute cor
with shuffled data (X and Y are shuffled independently) which results in
low correlation (it is like shuffling cards).
Maybe you could use smaller size and sample not original data but a vector
of indices
perm.cor<-rep(NA, 49999)
for (iperm in 1:nperm) {
ind <- sample(1:length(Xvar), size = 100, replace=FALSE)
perm.cor[iperm] <- cor(Xvar[ind], Yvar[ind])
perm.cor
}
max(perm.cor)
hist(perm.cor)
The result seems to be quite reasonable.
Regards
Petr
> print(max(Perm.Cor))
> XvarSorted<-sort(Xvar, decreasing=TRUE)
> YvarSorted<-sort(Yvar, decreasing=TRUE)
> max.cor<-cor(XvarSorted, YvarSorted)
> print(max.cor)
> if(mat.cor>0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=TRUE)
> if(mat.cor<0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=FALSE)
> T95<-Perm.Cor.Sorted[(nperm+1)*0.05] # 95% treshold value
> T99<-Perm.Cor.Sorted[(nperm+1)*0.01] # 99% treshold value
>
>
>
> I want to understand where I am making a mistake. Any comment is deeply
appreciated.
>
> Kind Regards
>
> Seyit Ali
>
>
>
------------------------------------------------------------------------------------------------------------------
> Dr. Seyit Ali KAYIS
> Selcuk University
> Faculty of Agriculture
> Kampus, Konya, TURKEY
>
> s_a_kayis at yahoo.com, s_a_kayis at hotmail.com
> Tell: +90 332 223 2830 Mobile: +90 535 587 1139 Fax: +90 332 241 0108
>
> Greetings from Konya, TURKEY
> http://www.ziraat.selcuk.edu.tr/skayis/
>
----------------------------------------------------------------------------------------------------------------------
>
>
>
>
>
>
>
> _________________________________________________________________
> Earning enough? Find out with SEEK Salary Survey
>
>
%2Eco%2Enz%2F%3Ftracking%3Dsk%3Atl%3Asknzsal%3Amsnnz%3A0%3Ahottag%3Aearn%
> 5Fenough&_t=757263783&_r=Seek_NZ_tagline&_m=EXT
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list