[R] Using 'sapply' and 'by' in one function

David & Natalia 3.14david at gmail.com
Sun Feb 10 14:19:47 CET 2008


Greetings,

I'm having a problem with something that I think is very simple: I'd
like to use the 'sapply' and 'by' functions together in one function,
for example to get regression coefficients from multiple models by a
grouping variable.  I think that I'm missing something that is probably
obvious to experienced users.

Here's a simple (trivial) example of what I'd like to do:

new <- data.frame(Outcome.1=rnorm(10),Outcome.2=rnorm(10),sex=rep(0:1,5),Pred=rnorm(10))
fxa <- function(x,data)   { lm(x~Pred,data=data)$coef }
sapply(new[,1:2],fxa,new)  # this yields coefficients for the predictor in separate models

fxb <- function(x)   { lm(Outcome.1~Pred,data=x)$coef }
by(new,new$sex,fxb) #yields the coefficient for Outcome.1 for each sex

## I'd like to combine 'sapply' and 'by' to get the regression
## coefficients for Outcome.1 and Outcome.2 by each sex, rather than
## running fxb a second time predicting 'Outcome.2' or subsetting the
## data by sex before I run the function, but the following doesn't work:

by(new,new$sex,FUN=function(x)sapply(x[,1:2],fxa,new))
'Error in model.frame.default(formula = x ~ Pred, data = data,
drop.unused.levels = TRUE) :
  variable lengths differ (found for 'Pred')'

## I understand the error message: 'Pred' is taken from the full data
## frame 'new' and has length 10, while each outcome column that 'by'
## passes along has length 5 (one sex group). I'm not sure how to
## correctly write the 'by' call so that it uses 'sapply' inside it.
## Could someone please point me in the right direction?  Thanks very
## much in advance.
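
The length mismatch described above can be checked directly. This is only a small sketch: 'pieces' is a name introduced here for illustration, 'split' stands in for the subsetting that 'by' does internally, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10),
                  sex = rep(0:1, 5), Pred = rnorm(10))

# 'pieces' (hypothetical name): the per-sex data frames, similar to what
# 'by' hands to FUN one at a time
pieces <- split(new, new$sex)

group_rows <- sapply(pieces, nrow)  # number of rows in each sex group
full_len   <- length(new$Pred)      # length of 'Pred' in the full data frame

print(group_rows)
print(full_len)
```

Each group has fewer rows than 'new', so an outcome column taken from a group cannot be modelled against 'Pred' taken from the full data frame.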

David S Freedman, CDC (Atlanta USA) [definitely not the well-known
statistician, David A Freedman, in Berkeley]
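
One possible direction, offered as a minimal sketch rather than a definitive answer: pass the subset that 'by' supplies to 'fxa' in place of the full data frame, so that the outcome columns and 'Pred' come from the same rows. The names 'd', 'res' and 'first' are introduced here for illustration, the seed is arbitrary, and 'coef()' is used in place of '$coef'.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
new <- data.frame(Outcome.1 = rnorm(10), Outcome.2 = rnorm(10),
                  sex = rep(0:1, 5), Pred = rnorm(10))

# Same idea as fxa in the post: x is an outcome vector, 'Pred' is looked
# up in 'data'
fxa <- function(x, data) { coef(lm(x ~ Pred, data = data)) }

# 'd' is the per-sex subset supplied by 'by'; handing 'd' (not 'new') to
# fxa keeps the outcome and the predictor the same length
res <- by(new, new$sex, function(d) sapply(d[, 1:2], fxa, d))

# Coefficients for the first sex group: one column per outcome,
# rows are the intercept and the slope for 'Pred'
first <- res[[1]]
print(res)
```

Each element of 'res' is a matrix of coefficients for one sex group. Selecting the outcome columns by name, for example d[, c("Outcome.1", "Outcome.2")], would be less fragile than relying on column positions 1:2.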
