[BioC] mean of individual rows for subsets of columns in an ExprSet
Vincent Carey 525-2265
stvjc at channing.harvard.edu
Wed Aug 20 12:25:25 MEST 2003
> Hi Anna,
>
> I had a similar problem myself and didn't come up with an easy way to solve it, so my approach was
>
> res <- matrix(nrow=nrow(x), ncol=2)
> for (i in 1:nrow(x)){
> res[i,1] <- (x[i,4]+x[i,5]+x[i,6]+x[i,7])/4
> res[i,2] <- sd(x[i,4:7]) # SD of the same four columns (assumes x is a numeric matrix)
> }
i could not tell from the e-mail whether column statistics or
row statistics were intended. for row statistics, commands
of the form apply(x,1,f) can be used. if f returns a scalar
on vector input (as does the function mean) then apply(x,1,f)
is the vector with ith element f(x[i,]). you could then
use apply(x[,4:7],1,mean) to do the first calculation above
(and could easily switch to a median or trimmed mean with this
approach).
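
a minimal sketch of the apply idiom described above, on a made-up matrix. the data and the result names (rm47, rmed, rtrim) are assumptions for illustration only; in practice x would hold the expression values.

```r
# toy 3 x 7 matrix standing in for the expression values (made-up data)
x <- matrix(1:21, nrow = 3, ncol = 7)

# row means over columns 4 to 7: element i is mean(x[i, 4:7])
rm47 <- apply(x[, 4:7], 1, mean)

# the same pattern with other summaries; extra arguments after the
# function name are passed on to it
rmed  <- apply(x[, 4:7], 1, median)
rtrim <- apply(x[, 4:7], 1, mean, trim = 0.25)
```

each result is a plain vector with one element per row of x.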
if you want to be a little more elegant, you can write
a function that returns the vector of statistics of interest
msd <- function(x) c(mean(x),sqrt(var(x)))
now
apply(x,1,msd)
returns a 2xn matrix where n is the number of rows of x.
msdmat <- t(apply(x,1,msd))
is like your "res" above.
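
to make the shapes concrete, here is a small sketch on random made-up data. the element names "mean" and "sd" are an addition for readability and are not part of the msd definition above; the name raw is likewise just illustrative.

```r
# made-up 5 x 4 matrix; the values are arbitrary
set.seed(1)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)

# returns two statistics per input vector (names added for readability)
msd <- function(v) c(mean = mean(v), sd = sqrt(var(v)))

raw    <- apply(x, 1, msd)   # 2 x 5: one column per row of x
msdmat <- t(raw)             # 5 x 2: one row per row of x

dim(raw)
dim(msdmat)
```

the transpose is needed because apply stacks each returned vector as a column.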
lessons: use apply and R functions whenever feasible.
>
>
> you could probably set this up as a function allowing you to select different columns each time or, if you can assign your different columns to groups (maybe assign those you want mean and SD for as 1 and those you don't as 0), you could do something like
>
> groups <- c(0,0,0,1,1,1,1)
> calc.means <-function(x, y){
> by(x, y, mean)
> }
> apply(eset@exprs, 1, calc.means, y=groups)
this can be done in one step using subscripting within
the apply
msdmat <- t(apply(x[,groups==1],1,msd))
# or specify the groups explicitly in the subscripting
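
a short sketch comparing the two ways of subscripting, again on a made-up matrix. the data and the names byflag and byindex are assumptions for illustration.

```r
# made-up 3 x 7 matrix; in the thread x would hold the expression values
x <- matrix(1:21, nrow = 3, ncol = 7)
groups <- c(0, 0, 0, 1, 1, 1, 1)   # 1 marks the columns to summarise

msd <- function(v) c(mean(v), sqrt(var(v)))

# logical subscript: groups == 1 keeps columns 4 to 7
byflag <- t(apply(x[, groups == 1], 1, msd))

# explicit column indices select the same columns
byindex <- t(apply(x[, 4:7], 1, msd))

identical(byflag, byindex)
```

the indicator vector is convenient when the column selection changes between runs, since only groups needs editing.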
NB: please don't use the "@" notation if it can be avoided.
we provide an "accessor" function, exprs(), that should be used
instead, e.g. exprs(eset) rather than eset@exprs.