[R] Re: Loop overwrite and data output problems

Fri Feb 26 15:46:34 CET 2010

Hi

I am a bit confused about what you want to achieve. As I cannot run your 
code without your data, I can only guess.

If I understand correctly, you want to randomly select 98 values from all 
your data, one for each of the 98 animals. I presume your id2 is sorted.

One option

# make sorted ids

id2<-sample(1:5, 100, replace=TRUE)
id2<-sort(id2)

# how many rows each id has
len<-rle(id2)$lengths
# how many rows come before the first row of each id

shift.len<-c(0,cumsum(len))[-(length(len)+1)]

# position of one randomly chosen row within each id
samp<-sapply(len, function(n) sample.int(n, 1))

# just test
id2[samp+shift.len]
[1] 1 2 3 4 5
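
The same indices can pull complete rows out of a data frame. Here is a 
minimal sketch of that, assuming a made-up data frame daf with a sorted ID2 
column and an invented value column (neither comes from your data):

```r
set.seed(1)

# toy data (assumption): sorted id column plus one invented measurement
daf <- data.frame(ID2 = sort(sample(1:5, 100, replace = TRUE)))
daf$value <- rnorm(nrow(daf))

# rows per id, and number of rows preceding each id's block
len <- rle(daf$ID2)$lengths
shift.len <- c(0, cumsum(len))[-(length(len) + 1)]

# one random position inside each block
samp <- sapply(len, function(n) sample.int(n, 1))

# absolute row numbers, then the selected rows
one.per.id <- daf[samp + shift.len, ]
```

This only works if the rows of each id are contiguous, which is why the 
ids have to be sorted first.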

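The other option is to randomise a vector of row indices
ss<-sample(nrow(daf))

then reorder the data.frame by those randomised indices and select, let's 
say, the first row for each id (daf stands for your data frame, with the id 
in its first column)

do.call(rbind, lapply(split(daf[ss,], daf[ss,1]), function(x) x[1,]))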
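
A small self-contained sketch of this shuffle-and-take-first idea, on a 
made-up data frame (daf, ID2 and value are invented here for illustration). 
It uses duplicated() instead of split(), which is a variant and not the 
exact call above:

```r
set.seed(2)

# toy data (assumption): 5 ids with unequal numbers of rows
daf <- data.frame(ID2 = rep(1:5, times = c(3, 1, 4, 2, 5)),
                  value = rnorm(15))

# random permutation of the row numbers
ss <- sample(nrow(daf))
shuffled <- daf[ss, ]

# the first row met for each id in the shuffled order is a random row of that id
one.per.id <- shuffled[!duplicated(shuffled$ID2), ]

# put the result back in id order
one.per.id <- one.per.id[order(one.per.id$ID2), ]
```

Unlike the rle approach, this does not need the ids to be sorted.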

But I believe that there are even better options.
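
As for the overwriting, one way to organise the repeated sampling is to 
fit the model once per simulation, after the sample is complete, and to 
store each result in its own slot of a list. The sketch below is only an 
illustration under assumptions: the data are simulated, only one predictor 
is used, and a plain glm() with AIC() stands in for the dredge() step, whose 
first row could be stored in the same way.

```r
set.seed(3)

# simulated stand-in for the real data; column names follow the question,
# the values are made up
n.id <- 20
ALL.R <- data.frame(ID2 = rep(seq_len(n.id), each = 6))
ALL.R$MO_AIR_TEMP <- rnorm(nrow(ALL.R), mean = 10, sd = 2)
ALL.R$BEH_T <- rbinom(nrow(ALL.R), size = 60, prob = 0.6)
ALL.R$BEH_F <- 60 - ALL.R$BEH_T

n.sim <- 10
results <- vector("list", n.sim)   # one slot per simulation

for (s in seq_len(n.sim)) {
  # pick one random row number for every animal
  rows <- vapply(split(seq_len(nrow(ALL.R)), ALL.R$ID2),
                 function(i) i[sample.int(length(i), 1)],
                 integer(1))
  Sample.dat <- ALL.R[rows, ]

  # fit once per simulation, outside any loop over animals
  fit <- glm(cbind(BEH_T, BEH_F) ~ MO_AIR_TEMP,
             family = binomial, data = Sample.dat)

  # store this simulation's summary in its own slot
  results[[s]] <- data.frame(sim = s, AIC = AIC(fit), t(coef(fit)))
}

# one row per simulation
summary.tab <- do.call(rbind, results)
```

Indexing the list by s is what keeps each run from replacing the previous 
one.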

Regards
Petr

r-help-bounces at r-project.org wrote on 26.02.2010 14:31:06:

> 
> Hello R users,
> 
> I have been using R for a while now for basic stats but I'm now trying to
> get my head around looping scripts and in some places I am failing!
> 
> I have a data set with c. 1200 data points on 98 individual animals with
> data on each row representing a daily measure and I am asking the question
> "what variables affect the animal's behaviour?"
> 
> the dataset includes these variables for analyses:
> 
> presence of behaviour, absence of behaviour, site, year, rain, air temp, ID,
> Day
> 
> Listed below as they appear in the data set:
> 
> BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY,  PUP_AGE_EST, MO_AIR_TEMP, ID2,
> DAY
> 
> with BEH_T & BEH_F = the response variable for a binomial GLM
> 
> here is the head of the dataset 
> (NB there are only two years and two sites)
> 
>      BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY
> [1,]    14    10    1 2007             0          12    10.98750   1   1
> [2,]    37    23    1 2007             0          13    11.47333   1   2
> [3,]    56    22    1 2007             0          14    12.16667   1   3
> [4,]    43    23    1 2007             0          16    10.91515   1   5
> [5,]    62    16    1 2007             0          17    12.81026   1   6
> [6,]    30    20    1 2007             0          19     8.67037   1   8
> 
> (Sorry the headings are skewed)
> 
> Because I don't want to do too complex a model to start with (just wanting
> to learn first with a 'simple' model) I have issues with independence of the
> data as there are repeats of individuals - i.e. data taken on the same IDs
> on different days. So in order to account for that I have decided to random
> sample one data point for each ID then run the GLM on that data for x number
> of simulations to see if the explanatory variables are the same/similar
> across all models. (This will reduce my data set to 98 data points, but it
> is the best way I can see of doing this without doing mixed-effects models,
> since not all IDs are seen at both sites in both years).
> 
> I am also using the MuMIn package for running all subsets of your model
> 
> 
> the code I'm using is:
> 
> 
> for (S in 1:2){
>    Sample.dat<-ALL.R[1,]
>    for (I in 1:98)   {
>       tmp<-ALL.R[ALL.R$ID2==I,]
>       max<-dim(tmp)[1]
>       if (I==1) Sample.dat<-tmp[sample(1:max,1),] else {
>          Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),])
>       m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE +
>          Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST +
>          Sample.dat$MO_AIR_TEMP, family="binomial")
>    mod<-dredge(m1.R)}}}
> 
> At this point I have two issues if I do it manually then it seems to work
> i.e. gives me one output (e.g shown at bottom of post) where I then want to
> take the first line, the model with the best AIC using mod[1,] - no problem!
> 
> However, letting the code run and for example using print ((mod[1,])) at the
> end it prints out the first line of 98 outputs - so I'm not too sure what
> I've done wrong here, but it appears to be running a model for each ID -
> something basic no doubt!
> 
> Ideally, what I want to do is take a random sample of the data then run the
> model get one output for that take the top line (i.e. the best AIC) and save
> this, then run this routine say 100 times, saving that top line every time,
> then having a look at the results and take a model average. Anytime I've got
> close to this I have issues with overwriting the previous first line of the
> model selection and I can't seem to identify how to set this loop up
> properly.
> 
> Any advice or guidance would be most appreciated, I have tried to explain my
> issues clearly but if more info is required please just ask,
> 
> Many thanks in advance to those of you that took the time to read this!
> 
> Ross
> 
> Ross Culloch
> Ph.D. Student
> Durham University
> UK
> 
> Here is an example of the model selection table from using MuMIn:
> 
> 
> Model selection table 
>      (Intr)  S.$MO_     S.$PRE   S.$PUP S.$SIT  S.$YEA k  Dev.   AIC  AICc   delta weight
> 30 645.8000 0.03841            -0.02148 0.2882 -0.3212 5 304.0 687.1 687.7   0.000  0.707
> 32 648.8000 0.03811  0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0 690.0   2.249  0.230
> 26 785.1000                    -0.02543 0.4678 -0.3905 4 312.8 693.9 694.3   6.630  0.026
> 31 794.2000          0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5 696.2   8.493  0.010
> 22 582.7000 0.04703                     0.2641 -0.2899 4 314.7 695.8 696.2   8.529  0.010
> 21 582.8000 0.06893            -0.01967        -0.2899 4 314.9 696.0 696.4   8.717  0.009
> 29 573.1000 0.04787 -0.0039980          0.2762 -0.2851 5 314.3 697.4 698.0  10.330  0.004
> 28 600.1000 0.06612  0.0046710 -0.02092        -0.2985 5 314.4 697.4 698.1  10.370  0.004
> 20   0.7526 0.05509            -0.01808 0.2450         4 321.0 702.0 702.5  14.770  0.000
> 10 530.4000 0.07447                            -0.2639 3 324.0 703.1 703.3  15.640  0.000
> 27   0.7493 0.05556 -0.0022820 -0.01753 0.2519         5 320.8 703.9 704.6  16.850  0.000
> 19 530.0000 0.07455 -0.0001489                 -0.2637 4 324.0 705.1 705.5  17.820  0.000
> 16 743.4000                             0.4875 -0.3698 3 328.7 707.8 708.0  20.310  0.000
> 9    0.5512 0.06094                     0.2286         3 328.8 707.9 708.1  20.430  0.000
> 8    0.6828 0.08019            -0.01688                3 328.9 708.0 708.2  20.540  0.000
> 18   0.5584 0.06173 -0.0059840          0.2481         4 327.8 708.9 709.3  21.620  0.000
> 25 739.9000         -0.0016930          0.4944 -0.3681 4 328.6 709.7 710.1  22.410  0.000
> 17   0.6856 0.07953  0.0012680 -0.01720                4 328.9 709.9 710.4  22.670  0.000
> 2    0.4985 0.08406                                    2 335.8 712.8 713.0  25.270  0.000
> 7    0.4996 0.08516 -0.0023780                         3 335.6 714.7 714.9  27.240  0.000
> 14   1.0760                    -0.02288 0.5151         3 340.8 719.9 720.1  32.420  0.000
> 23   1.0760          0.0003492 -0.02296 0.5136         4 340.8 721.9 722.3  34.590  0.000
> 5    0.8587                             0.5304         2 354.0 731.0 731.1  43.440  0.000
> 12   0.8663         -0.0042170          0.5473         3 353.5 732.5 732.8  45.070  0.000
> 24 967.8000          0.0198500 -0.03274        -0.4813 4 358.3 739.4 739.8  52.140  0.000
> 15 942.8000                    -0.02909        -0.4689 3 370.5 749.5 749.8  62.090  0.000
> 13 915.8000          0.0151200                 -0.4556 3 384.7 763.7 764.0  76.290  0.000
> 6  900.3000                                    -0.4478 2 391.8 768.9 769.0  81.320  0.000
> 11   1.3530          0.0176300 -0.02957                3 402.3 781.3 781.6  93.890  0.000
> 4    1.3940                    -0.02630                2 412.3 789.4 789.5 101.800  0.000
> 3    1.1010          0.0134300                         2 424.4 801.4 801.6 113.800  0.000
> 1    1.1550                                            1 430.3 805.4 805.4 117.700  0.000
> 