[R] how to apply sample function to each row of a data frame?
Petr Savicky
savicky at cs.cas.cz
Sat Nov 20 09:51:52 CET 2010
On Fri, Nov 19, 2010 at 07:22:57PM -0800, wangwallace wrote:
> actually, what I meant is to draw two random numbers from each row
> separately to create a new data frame. for example, an example output could
> be:
>
> 1 3
> 4 5
> 9 8
This may be done, for example, as follows:
X <- matrix(1:9, ncol = 3, byrow = TRUE)
colnames(X) <- c("M", "P", "Q")
X <- data.frame(X)
Y <- t(apply(X, 1, sample, 2))
Y is a matrix, since apply() uses as.matrix() on its first argument,
if it is a data frame. If the samples from all rows have the same
column names, Y gets these column names, otherwise no column names
are used. We may get something like
     M P
[1,] 1 2
[2,] 4 5
[3,] 7 8
but more typically something like
     [,1] [,2]
[1,]    1    2
[2,]    5    6
[3,]    9    8
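The two points above can be checked with a short sketch. The seed is an arbitrary choice made only for reproducibility, and unname() is just one possible way to get the same layout on every run; neither is part of the original example.

```r
X <- data.frame(matrix(1:9, ncol = 3, byrow = TRUE,
                       dimnames = list(NULL, c("M", "P", "Q"))))

set.seed(1)  # arbitrary seed, only for reproducibility
Y <- t(apply(X, 1, sample, 2))

is.matrix(Y)  # apply() returns a matrix, not a data frame
dim(Y)        # one row per row of X, two sampled values each

# Every sampled pair comes from the corresponding row of X.
ok <- vapply(seq_len(nrow(X)),
             function(i) all(Y[i, ] %in% unlist(X[i, ])),
             logical(1))
all(ok)

# Dropping the dimnames removes the run-to-run difference in column names.
Y0 <- unname(Y)
```

With the names dropped, the result prints with [,1] and [,2] headers regardless of which columns were drawn.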
> Finally, since the column names of the sampled two numbers across these
> three rows will probably be different, I guess I cannot use rbind to put all
> these three rows together.
Combining rows with different column names is possible for matrices.
The column names of the first row are used for the result.
For example
Z <- as.matrix(X)
r1 <- sample(Z[1, ], 2)
r2 <- sample(Z[2, ], 2)
r3 <- sample(Z[3, ], 2)
rbind(r1, r2, r3)
   P Q
r1 2 3
r2 5 4
r3 9 7
> Is there anything else (I don't want use list) I
> can use to align three rows with different column names together? Also, if I
> can write a function for it.
Such a function can also be written for data frames, provided it sets the
names of the input rows explicitly to the same vector of names before rbind().
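A minimal sketch of such a function follows. The name sample_rows_df and the default labels "V1", "V2" are hypothetical choices for illustration. It samples column indices rather than values, and assumes all columns are of the same type. A list is used only internally; the result is a data frame.

```r
# sample_rows_df() is a hypothetical helper, not from the original post.
sample_rows_df <- function(df, size = 2,
                           new_names = paste0("V", seq_len(size))) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    # Pick 'size' columns at random and keep a one-row data frame.
    r <- df[i, sample(ncol(df), size), drop = FALSE]
    names(r) <- new_names  # same names for every row, so rbind() accepts them
    r
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

X <- data.frame(M = c(1, 4, 7), P = c(2, 5, 8), Q = c(3, 6, 9))
res <- sample_rows_df(X)
res
```

Because the names are overwritten before rbind(), the information about which columns were drawn is lost; if that matters, the sampled indices would have to be stored separately.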
> May I use some syntax like the following to
> repeat the whole process 1000 times (i.e., 1000 samples)?
>
> > result<-vector("list",1000)
> > for(i in 1:1000)result[[i]]<-fff(data)#fff(data) is the function name
> > result
This should work. Alternatively, it is possible to use something like
replicate(5, list(t(apply(X, 1, sample, 2))))
[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    5    6
[3,]    9    7

[[2]]
     [,1] [,2]
[1,]    2    3
[2,]    4    5
[3,]    7    8

[[3]]
     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]    9    7

[[4]]
     M P
[1,] 1 2
[2,] 4 5
[3,] 7 8

[[5]]
     [,1] [,2]
[1,]    3    2
[2,]    4    6
[3,]    7    9
where t(apply(X, 1, sample, 2)) may be replaced by a function that
always produces a matrix with column names.
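One possible form of such a function is sketched below. The name sample_rows and the labels "S1", "S2" are hypothetical, and the sketch assumes size >= 2 and an input with more than one column. Here simplify = FALSE is used in replicate() as an alternative to wrapping the expression in list().

```r
# sample_rows() is a hypothetical name; the "S" labels are arbitrary.
# Assumes size >= 2 and more than one column in x.
sample_rows <- function(x, size = 2) {
  # Drop the names of each sample so the result never depends on
  # which columns happened to be drawn.
  m <- t(apply(as.matrix(x), 1, function(r) unname(sample(r, size))))
  colnames(m) <- paste0("S", seq_len(size))
  m
}

X <- data.frame(M = c(1, 4, 7), P = c(2, 5, 8), Q = c(3, 6, 9))

# A list of 5 matrices, each 3 x 2 with column names S1 and S2.
res <- replicate(5, sample_rows(X), simplify = FALSE)
res[[1]]
```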
PS.