[R] Choosing a subset of a non-sorted vector

Tue May 22 11:38:43 CEST 2007

You want to select two subplots for each DL value. Try:

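  # toy data: 3 diversity levels (DL), 4 subplots each
  df <- data.frame( DL=gl(3,4), subplot=rep(1:4,3) )

  df$index <- 1:nrow(df)        # row numbers to sample from
  # draw 2 row numbers within each DL level
  ind <- tapply( df$index, df$DL, function(x) sample(x,2) )
  df[ unlist(ind), ]            # keep only the sampled rows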

You could also have used rownames(df) instead of creating df$index.
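
A minimal sketch of that rownames() variant, in case it helps. The set.seed() call and the name "picked" are my additions for illustration only; they are not needed for the approach itself.

```r
# Sketch of the rownames() variant; set.seed() is only here so the draw is repeatable
set.seed(1)
df <- data.frame(DL = gl(3, 4), subplot = rep(1:4, 3))

# rownames(df) are the character labels "1".."12";
# draw two labels within each DL level
ind <- tapply(rownames(df), df$DL, function(x) sample(x, 2))

# character indices are matched against the row names
picked <- df[unlist(ind), ]

# each DL level should contribute two rows
table(picked$DL)
```

Since the labels are character strings, sample() always draws from the labels themselves, even if a DL level had only a single row.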

OR

   # split by DL, sample 2 rows from each piece, then stack the pieces
   tmp <- lapply( split(df, df$DL), function(m) m[sample(1:nrow(m),2),] )
   do.call("rbind", tmp)
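
As for why your attempt returned only DL 1: the values in subplot.sampled all lie between 1 and 4, and diversity.data[subplot.sampled,] treats them as row numbers of the whole data frame, so only the first four rows can ever be picked. If you would rather keep your loop, one possible sketch is to pair each sampled subplot with its DL and match on both columns with merge(). The names "wanted" and "selected" and the set.seed() call are my own assumptions for illustration.

```r
DL <- gl(3, 4)
subplot <- rep(1:4, 3)
diversity.data <- data.frame(DL, subplot)

set.seed(42)  # only so the draw is repeatable
subplot.sampled <- NULL
for (i in 1:3)
  subplot.sampled <- c(subplot.sampled, sort(sample(4, 2, replace = FALSE)))

# pair each sampled subplot with the diversity level it was drawn for
# (two draws per level, in the order the loop produced them)
wanted <- data.frame(DL = gl(3, 2), subplot = subplot.sampled)

# merge() matches on the columns the two data frames share: DL and subplot
selected <- merge(diversity.data, wanted)
selected
```

This does not depend on the rows of diversity.data being in any particular order, only on the (DL, subplot) pairs.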

Regards, Adai

Christoph Scherber wrote:
> Dear all,
> 
> I have a tricky problem here:
> 
> I have a dataframe with biodiversity data in which subplots are a 
> repeated sequence from 1 to 4 (1234,1234,...)
> 
> Now, I want to randomly pick two subplots each from each diversity level 
> (DL).
> 
> The problem is that it works up to that point - but if I try to subset 
> the whole dataframe, I get stuck:
> 
> DL=gl(3,4)
> subplot=rep(1:4,3)
> diversity.data=data.frame(DL,subplot)
> 
> 
> subplot.sampled=NULL
> for(i in 1:3)
> subplot.sampled=c(subplot.sampled,sort(sample(4,2,replace=F)))
> 
> subplot.sampled
> [1] 3 4 1 3 1 3
> subplot[subplot.sampled]
> [1] 3 4 1 3 1 3
> 
> ## here comes the tricky bit:
> 
> diversity.data[subplot.sampled,]
>      DL subplot
> 3    1       3
> 4    1       4
> 1    1       1
> 3.1  1       3
> 1.1  1       1
> 3.2  1       3
> 
> How can I select those rows of diversity.data that match the exact 
> subplots in "subplot.sampled"?
> 
> 
> Thank you very much for your help!
> 
> Best wishes,
> Christoph
> 
> (I am using R 2.4.1 on Windows XP)
> 
> 
> ##
> Christoph Scherber
> DNPW, Agroecology
> University of Goettingen
> Waldweg 26
> D-37073 Goettingen
> 
> +49-(0)551-39-8807
> 