[R] performing functions on variables of different length

Duncan Murdoch murdoch at stats.uwo.ca
Mon May 8 16:09:27 CEST 2006


On 5/8/2006 4:52 AM, Bob Green wrote:
> I am hoping for some assistance with a problem that has puzzled me.
> Immediately below are the error messages I obtained when I tried to run
> two functions.
> 
> (A)
> 
> tapply(outcome,na.rm=T,grp,mean)
> Error in tapply(outcome, na.rm = T, grp, mean) :
>          arguments must have same length
> +++++++++++++++++++++++++++++++++++++++++++++++
> (B)
> 
> library(nlme)
>  > anova(lme(outcome ~ grp * time, random = ~ 1 | subject))
> Error in model.frame(formula, rownames, variables, varnames, extras, 
> extranames,  :
>          variable lengths differ
> 
> I assumed this was due to a missing value, as all the variables appeared to
> have the same number of values. However, when I ran the length command,
> this wasn't the case:
> 
> length(outcome)
> [1] 368
>  > length(grp)
> [1] 184
>  > length(subject)
> [1] 92
>  > length(time)
> [1] 184
> 
> 
> Below is the syntax I have been using and the data frame that I generated -
> 
> study1dat <- read.csv("c:\\study1rb.csv",header=T)
> attach (study1dat)
> 
> outcome <- c(t1freq, t2freq,t3freq,t4freq)
> grp <- factor( rep(group, 2) )
> time <- gl(4, 46)
> subject <- gl(46,1,92)
> data.frame(subject, grp, time, outcome)
> 
>      subject grp time outcome
> 1         1   0    1       4
> 2         2   0    1       3
> 3         3   0    1       7
> 4         4   0    1       0
> 5         5   0    1       1

...
> 365      43   1    4       1
> 366      44   1    4       7
> 367      45   1    4       0
> 368      46   1    4       5
> 
> Any assistance that can be offered is appreciated,


The data.frame function recycles shorter variables to match the longest
one (here 368 rows), which is why your data frame printed without
complaint. Other functions, such as tapply and lme, won't necessarily do
this. I'd recommend putting the variables into a data frame (thus forcing
them to be the same length), and then accessing the columns directly from
there, e.g.

mydf <- data.frame(subject, grp, time, outcome)

tapply(mydf$outcome, mydf$grp, mean, na.rm = TRUE)

# OR

with(mydf, tapply(outcome, grp, mean, na.rm = TRUE))

library(nlme)  # provides lme()
anova(lme(outcome ~ grp * time, random = ~ 1 | subject, data = mydf))
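
To make the difference concrete, here is a small sketch with made-up toy
vectors (not your data; the name toy and the values are only for
illustration). It shows tapply refusing arguments of unequal length, and
the same call working once data.frame has recycled the shorter one.

```r
# Toy vectors of unequal length (hypothetical values, not the poster's data)
outcome <- c(4, 3, 7, 0, 1, NA, 2, 5)   # length 8
grp     <- factor(c(0, 0, 1, 1))        # length 4

# tapply() does not recycle, so the mismatched lengths raise an error
res <- try(tapply(outcome, grp, mean, na.rm = TRUE), silent = TRUE)
failed <- inherits(res, "try-error")

# data.frame() recycles grp a whole number of times (here twice)
toy <- data.frame(grp, outcome)

# Every column now has 8 rows, so tapply() accepts them
group_means <- with(toy, tapply(outcome, grp, mean, na.rm = TRUE))
print(group_means)
```

Recycling only repeats the shorter vector in its original order, so it is
worth checking that each row of the data frame pairs an outcome with the
group, time and subject you intended.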

I haven't tested these suggestions, so there may be typos.

Duncan Murdoch



