[R] Use of Factors

Beck, Kenneth (STP) Kenneth.Beck at bsci.com
Thu Mar 20 15:26:32 CET 2008


I am relatively new to R and am trying to do what should be a simple
task. I have a data set with several variables arranged by SubjID and
visit, with multiple observations for each combination. I run a linear
regression on those observations, then generate a set of interpolated
values from the regression at fixed intervals along "x". I now want to
average each of those across all the SubjIDs. When I use either by() or
tapply(), I get a warning indicating that the interpolated values are
factors, even though they display like floating point numbers. The mean
function returns a value that is obviously wrong, though the count of
observations in the subsets is correct. I am including code snippets to
try to demonstrate how this is all created. Sorry for the length of this.

Here is the output when I try to use the mean function:
mean_interp_HR=tapply(cpx_interp$HR[cpx_interp$visit==1 &
cpx_interp$xl==0],cpx_interp$SubjId[cpx_interp$visit==1 &
cpx_interp$xl==0],mean)
Warning in mean.default(X[[1L]], ...) :
  argument is not numeric or logical: returning NA
Warning in mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA
Warning in mean.default(X[[3L]], ...) :
  argument is not numeric or logical: returning NA
Warning in mean.default(X[[4L]], ...) :
  argument is not numeric or logical: returning NA
Warning in mean.default(X[[5L]], ...) :
  argument is not numeric or logical: returning NA

Look at the data I am submitting to tapply and mean:
> cpx_interp$HR[cpx_interp$visit==1 & cpx_interp$xl==0]
[1] 62.5252140470478 67.6151493460742 68.3931063786315 78.6591518601803
59.7674671000443
90 Levels: 62.5252140470478 66.046907240618 69.5686004341883
69.8766646005142 71.9631282463843 ... 85.4270562298357
> cpx_interp$SubjId[cpx_interp$visit==1 & cpx_interp$xl==0]
[1] ADENPV07 ADENPVJN ADENPV0Z ADENPVM9 ADENPVMB
Levels: ADENPV07 ADENPVJN ADENPV0Z ADENPVM9 ADENPVMB

Why is the $HR variable listed with "90 Levels", as if it were a factor?
Why is it not treated as floating point so that I can get a simple mean?
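
For reference, here is a minimal sketch of how a factor that merely
looks numeric behaves, and of one common way to get the numbers back.
The values and the subject labels "A" and "B" are made up for
illustration; only the general behaviour of factor(), as.numeric() and
tapply() is assumed.

```r
# Hypothetical stand-in for the HR column: numbers stored as a factor
hr <- factor(c("62.5", "67.6", "68.4", "78.7", "59.8"))

is.numeric(hr)    # FALSE, which is why mean() warns and returns NA
as.numeric(hr)    # 2 3 4 5 1: the integer level codes, not the readings

# Convert via the level labels to recover the readings themselves
hr_num <- as.numeric(as.character(hr))

# Hypothetical grouping variable, standing in for SubjId
subj <- factor(c("A", "A", "B", "B", "B"))
group_means <- tapply(hr_num, subj, mean)
group_means
```

Going through as.character() matters here: calling as.numeric()
directly on the factor returns the level codes, which would produce
a mean that is obviously wrong.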

Here is how the HR values are generated:

# create the array
interp_out=array(,c(18,length(cols2)))
# create the values to interpolate to
interp_out[,3]=c(0,25,50,75,100,125,150,175,200,
                 0,25,50,75,100,125,150,175,200);
# fill the visits
interp_out[,2]=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2)
# fill the SubjID
interp_out[,1]=SubjID;
# now fill in interpolated values for each visit
interp_out[1:9,4]=hrv1;interp_out[10:18,4]=hrv2;
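
A small sketch of what can happen when text and numbers share one
array, in case it is relevant to the steps above. The values are made
up, and the conversion to a data frame is an assumption, since the step
that builds cpx_interp from interp_out is not shown here.

```r
# Hypothetical 3 x 2 array; "ADENPV07" is one of the SubjId levels shown above
m <- array(NA, c(3, 2))
m[, 2] <- c(62.5, 67.6, 68.4)
type_before <- typeof(m)    # "double"

m[, 1] <- "ADENPV07"        # one character column coerces the whole array
type_after <- typeof(m)     # "character": the numbers are now strings

# Assumption: the array is later turned into a data frame with strings
# converted to factors, so every column becomes a factor
df_from_array <- as.data.frame(m, stringsAsFactors = TRUE)
is.factor(df_from_array[[2]])   # TRUE

# A data frame built column by column keeps a separate type per column
d <- data.frame(SubjId = "ADENPV07", HR = c(62.5, 67.6, 68.4))
is.numeric(d$HR)                # TRUE
```

An array or matrix holds a single type, so building the table as a
data frame from the start is one way to keep the HR column numeric.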

# hrv1 & hrv2 come from the following function; the "lm" parameter is
# the output from the standard lm() function:
interpolateToXL = function(lm,maxxl){
int_values=matrix(nrow=9,ncol=1)
int_values[1,]=coef(lm)[1];
if (maxxl>25)
  int_values[2,]=coef(lm)[1]+coef(lm)[2] * 25
if (maxxl>50)
  int_values[3,]=coef(lm)[1]+coef(lm)[2] * 50
if (maxxl>75)
  int_values[4,]=coef(lm)[1]+coef(lm)[2] * 75
if (maxxl>100)
  int_values[5,]=coef(lm)[1]+coef(lm)[2] * 100
if (maxxl>125)
  int_values[6,]=coef(lm)[1]+coef(lm)[2] * 125
if (maxxl>150)
  int_values[7,]=coef(lm)[1]+coef(lm)[2] * 150
if (maxxl>175)
  int_values[8,]=coef(lm)[1]+coef(lm)[2] * 175
if (maxxl>200)
  int_values[9,]=coef(lm)[1]+coef(lm)[2] * 200
return (int_values)
}
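
The same calculation can be sketched in vectorised form. This is an
illustrative variant, not a replacement tested against the real data:
the function name interpolate_to_grid, the grid argument, and the
example data frame are all hypothetical. It keeps the rule used above,
where x = 0 is always filled and every other grid point is filled only
when maxxl is greater than it.

```r
# Hypothetical vectorised variant of interpolateToXL
interpolate_to_grid <- function(fit, maxxl, grid = seq(0, 200, by = 25)) {
  b <- coef(fit)                          # b[1] = intercept, b[2] = slope
  values <- unname(b[1] + b[2] * grid)    # fitted line at each grid point
  # Leave NA wherever the original if() tests would not have fired
  values[grid > 0 & !(maxxl > grid)] <- NA
  matrix(values, ncol = 1)
}

# Usage with made-up data that lie exactly on HR = 60 + 0.2 * xl
example_data <- data.frame(xl = c(0, 30, 60, 90), HR = c(60, 66, 72, 78))
fit <- lm(HR ~ xl, data = example_data)
interp <- interpolate_to_grid(fit, maxxl = 90)
interp
```

With maxxl = 90 the first four rows (x = 0, 25, 50, 75) are filled and
the remaining five stay NA, matching the chain of if() statements.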


Ken Beck PhD
Research Scientist
Boston Scientific CRM (Guidant)
10-212
kenneth.beck at bsci.com