[R] how to apply sample function to each row of a data frame?

Petr Savicky savicky at cs.cas.cz
Sat Nov 20 09:51:52 CET 2010


On Fri, Nov 19, 2010 at 07:22:57PM -0800, wangwallace wrote:
> actually, what I meant is to draw two random numbers from each row
> separately to create a new data frame. for example, an example output could
> be:
> 
> 1 3
> 4 5
> 9 8

This may be done, for example, as follows:

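  X <- matrix(1:9, ncol = 3, byrow = TRUE)  # rows are 1:3, 4:6, 7:9
  colnames(X) <- c("M", "P", "Q")
  X <- data.frame(X)
  # apply() returns one column per row of X, so t() puts each sample in a row
  Y <- t(apply(X, 1, sample, 2))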

Y is a matrix rather than a data frame, since apply() converts a data
frame argument with as.matrix(). If the samples from all rows have the
same column names in the same order, Y gets these column names;
otherwise no column names are used. We may get something like

       M P
  [1,] 1 2
  [2,] 4 5
  [3,] 7 8

but, since all three rows draw the same ordered pair of columns only
with probability 6/6^3 = 1/36, more typically something like

       [,1] [,2]
  [1,]    1    2
  [2,]    5    6
  [3,]    9    8
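If a consistent shape matters more than the names, one option (a sketch, not part of the original reply) is to drop the names inside the sampling function with unname(). Then apply() never attaches column names, and as.data.frame() supplies its default names V1 and V2:

```r
X <- matrix(1:9, ncol = 3, byrow = TRUE)
colnames(X) <- c("M", "P", "Q")
X <- data.frame(X)

# unname() strips the per-row names, so every draw is an unnamed vector and
# apply() returns an unnamed 2 x 3 matrix; t() turns it into 3 x 2.
Y <- t(apply(X, 1, function(x) unname(sample(x, 2))))

# A matrix without column names gets the default names V1, V2.
Ydf <- as.data.frame(Y)
Ydf
```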

> Finally, since the column names of the sampled two numbers across these
> three rows will probably be different, I guess I cannot use rbind to put all
> these three rows together.

rbind() can combine named vectors whose names differ. The result is a
matrix whose column names are taken from the first argument (here r1),
so they do not describe the remaining rows.
For example

  Z <- as.matrix(X)
  r1 <- sample(Z[1, ], 2)
  r2 <- sample(Z[2, ], 2)
  r3 <- sample(Z[3, ], 2)
  rbind(r1, r2, r3)

     P Q
  r1 2 3
  r2 5 4
  r3 9 7
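Because only the names of r1 survive rbind(), the column each value in r2 and r3 came from is lost. If that information matters, a possible approach (a sketch, not from the original reply; idx, vals and cols are illustrative names) is to sample column indices and keep the corresponding names in a parallel matrix:

```r
# Same data as Z <- as.matrix(X) above.
Z <- matrix(1:9, ncol = 3, byrow = TRUE, dimnames = list(NULL, c("M", "P", "Q")))

# sample(n, k) draws k values from 1:n; replicate() gives a 2 x nrow(Z)
# matrix of column indices, and t() makes it one row per row of Z.
idx <- t(replicate(nrow(Z), sample(ncol(Z), 2)))

# vals[i, j] is Z[i, idx[i, j]]: matrix indexing with (row, column) pairs,
# filled back into a matrix column by column.
vals <- matrix(Z[cbind(rep(seq_len(nrow(Z)), 2), as.vector(idx))], nrow = nrow(Z))

# cols[i, j] is the name of the column that vals[i, j] was drawn from.
cols <- matrix(colnames(Z)[idx], nrow = nrow(Z))
vals
cols
```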

> Is there anything else (I don't want to use a list) I
> can use to align three rows with different column names together? Also, can I
> write a function for it?

Such a function can also be written for data frames, provided it
explicitly sets the names of every sampled row to the same vector of
names before calling rbind(), since rbind() matches data frame columns
by name.
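A minimal sketch of such a function, assuming the data frame has at least size columns (and at least 2, because sample() on a single number n draws from 1:n). The name sample_rows and the column names V1, V2, ... are hypothetical choices, not from the original reply:

```r
# Hypothetical helper: draw `size` values from every row of a data frame and
# return a data frame whose columns are named V1, ..., V<size>.
sample_rows <- function(df, size = 2) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    x <- sample(unlist(df[i, ]), size)       # named vector, e.g. c(Q = 3, M = 1)
    names(x) <- paste0("V", seq_len(size))   # same names for every row
    as.data.frame(as.list(x))                # one-row data frame
  })
  do.call(rbind, rows)                       # rows now agree on column names
}

X <- data.frame(M = c(1L, 4L, 7L), P = c(2L, 5L, 8L), Q = c(3L, 6L, 9L))
sample_rows(X)
```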

> May I use some syntax like the following to
> repeat the whole process 1000 times (i.e., 1000 samples)?
> 
> > result<-vector("list",1000)
> > for(i in 1:1000)result[[i]]<-fff(data)#fff(data) is the function name
> > result

This should work. Alternatively, it is possible to use something like

  replicate(5, list(t(apply(X, 1, sample, 2))))

  [[1]]
       [,1] [,2]
  [1,]    1    3
  [2,]    5    6
  [3,]    9    7
  
  [[2]]
       [,1] [,2]
  [1,]    2    3
  [2,]    4    5
  [3,]    7    8
  
  [[3]]
       [,1] [,2]
  [1,]    1    2
  [2,]    5    4
  [3,]    9    7
  
  [[4]]
       M P
  [1,] 1 2
  [2,] 4 5
  [3,] 7 8
  
  [[5]]
       [,1] [,2]
  [1,]    3    2
  [2,]    4    6
  [3,]    7    9

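where t(apply(X, 1, sample, 2)) may be replaced by a function that
always produces a matrix with the same column names. The list() wrapper
keeps the samples as separate matrices instead of letting replicate()
simplify them into an array.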
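A sketch of such a function follows; the name sample_each_row and the column names draw1, draw2, ... are hypothetical. It assumes size is at least 2, because for size = 1 apply() returns a plain vector and t() would give a single row. The last lines show both replicate() and the loop from the question:

```r
X <- matrix(1:9, ncol = 3, byrow = TRUE)
colnames(X) <- c("M", "P", "Q")
X <- data.frame(X)

# Hypothetical helper: one random draw of `size` values per row, returned as a
# matrix whose column names are always draw1, ..., draw<size>.
sample_each_row <- function(df, size = 2) {
  Y <- t(apply(df, 1, function(x) unname(sample(x, size))))
  colnames(Y) <- paste0("draw", seq_len(size))
  Y
}

# 1000 samples; simplify = FALSE returns a list of matrices, which has the
# same effect as the list() wrapper used with replicate() above.
result <- replicate(1000, sample_each_row(X), simplify = FALSE)

# The loop from the question gives the same kind of result.
result2 <- vector("list", 1000)
for (i in 1:1000) result2[[i]] <- sample_each_row(X)
```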



