[R] Linear model predictions, differences in class
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Jun 19 18:53:33 CEST 2007
tapply gives an array: you want to use as.vector() on its result.
On Tue, 19 Jun 2007, John Phillips wrote:
> Hi,
>
> I am using R to fit statistical models to data were the observations are
> means of the original data. R is used to calculate the mean before fitting
> the model. My problem is: When R calculates the means using tapply, the
> class of the means differs from the class of the original data, which gives
> me trouble when I want to use the original data to calculate model
> predictions. Here is a simple example that demonstrates the problem:
>
>> data.in<-read.table('example.dat',header=TRUE)
>>
>> #Here are the data:
>> data.in
> location x y
> 1 A 17.2 28.46
> 2 A 91.7 143.33
> 3 A 93.6 148.05
> 4 B 95.8 150.28
> 5 B 54.9 89.49
> 6 B 51.1 82.51
> 7 C 53.9 88.46
> 8 C 40.3 63.62
> 9 C 38.5 64.46
> >
>> attach(data.in)
>>
>> #Calculate means by variable "location":
>> data.mn<-data.frame(xm = tapply(x,location,mean), ym =
> tapply(y,location,mean))
>> detach(data.in)
>>
>> #Here are the means:
>> data.mn
> xm ym
> A 67.50000 106.6133
> B 67.26667 107.4267
> C 44.23333 72.1800
>>
>> #Fit the model:
>> mod1<-lm(ym ~ xm, data.mn)
>>
>> mod1
>
> Call:
> lm(formula = ym ~ xm, data = data.mn)
>
> Coefficients:
> (Intercept) xm
> 5.633 1.505
>
>> #R will make "predictions" using the data.mn data frame:
>> predict(mod1,newdata = data.mn)
> A B C
> 107.19260 106.84153 72.18587
>>
>> #But, even if new variables are created in the original data
>> #with names that match those names used in the regression:
> > data.in$xm<-data.in$x
>> data.in$ym<-data.in$y
>> data.in
> location x y xm ym
> 1 A 17.2 28.46 17.2 28.46
> 2 A 91.7 143.33 91.7 143.33
> 3 A 93.6 148.05 93.6 148.05
> 4 B 95.8 150.28 95.8 150.28
> 5 B 54.9 89.49 54.9 89.49
> 6 B 51.1 82.51 51.1 82.51
> 7 C 53.9 88.46 53.9 88.46
> 8 C 40.3 63.62 40.3 63.62
> 9 C 38.5 64.46 38.5 64.46
>>
>> #R will not use data.in to make predictions:
>> predict(mod1,newdata = data.in)
> Error: variable 'xm' was fitted with class "other" but class "numeric" was
> supplied
>>
>> data.in$xm
> [1] 17.2 91.7 93.6 95.8 54.9 51.1 53.9 40.3 38.5
>> data.mn$xm
> A B C
> 67.50000 67.26667 44.23333
>>
>
> Is there a way to make these variables have the same class? Or, is there
> something other than "tapply" that will work better for this?
>
> Thanks!
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help
mailing list