[R] Observed responses in 'augPred' data frame - Wrong order ?
Marco Barbàra
jabbadhutt at libero.it
Wed Oct 8 19:30:07 CEST 2008
Dea-R community.
I'd like to draw your attention to an issue I have recently
encountered while doing my current data analysis.
I've got an unexpected (to me) result from the command:
> augPred(lmList(my.object)),
'my.object' being a grouped data frame of class:
> class(my.object)
[1] "nfnGroupedData" "nfGroupedData" "groupedData" "data.frame"
The problem is more easily explained by showing these two to Trellis
plot:
> plot(my.object,some_options)
> plot(augPred(lmList(my.object)),some_options)
[link to output: http://www.palug.net/Members/jabba ]
Clearly the problem is that the predictions are correct, but the
responses in the ``original'' part of the augPred data frame are in the
wrong order, i.e. the response order does not match the grouping
factor order (not sure this is in correct English, sorry).
Debugging the function 'augPred.lmList' I've managed to understand that
the problem resides in the following function calls:
> debug(nlme:::augPred.lmList)
> fm1.lis <- lmList(my.object)
> augPred(fm1.lis)
debugging in: augPred.lmList(fm1.lis)
debug: {
.....
primary <- getCovariate(data)
.....
groups <- getGroups(object)
.....
orig <- data.frame(primary,groups, getResponse(object))
.....
}
This approach works well on other datasets, such us the Orthodont data,
but not for mine because, i guess, my rows are not first ordered by
subject (subject is my grouping factor). In fact:
> getGroups(fm1.lis)[1:5]
[1] CBT.1 CBT.2 CBT.3 CBT.4 CBT.5
34 Levels: CBT.14 < CBT.2 < CBT.11 < CBT.13 < CBT.12 < CBT.5 < ... < V.6
> getResponse(fm1.lis)[1:5]
CBT.1 CBT.1 CBT.1 CBT.1 CBT.2
34 32 29 27 28
My question:
is there a simple way to get the proper result (modify the dataset or
modify the function)? RTFM responses are welcomed.
My (humble?) opinion:
Neither Pinheiro-Bates (2000) nor online documentation, report the
need for a particular row order in the costruction of grouped data
objects, and i think it should be better do not rely on it. Or, at
least, to document it somewhere.
I want to thank the R-core team and the entire R community for this
amazingly efficient and versatile software, and i am very proud that the
(not only) statistical state-of-the-art software have a ``gpl
distribution'' (which have many desirable properties ;-) ).
Cheers.
+---------------------------+
|Relevant system information|
+---------------------------+
> R.version
_
platform i486-pc-linux-gnu
arch i486
os linux-gnu
system i486, linux-gnu
status
major 2
minor 7.2
year 2008
month 08
day 25
svn rev 46428
language R
version.string R version 2.7.2 (2008-08-25)
> sessionInfo()
R version 2.7.2 (2008-08-25)
i486-pc-linux-gnu
locale:
LC_CTYPE=it_IT;LC_NUMERIC=C;LC_TIME=it_IT;LC_COLLATE=it_IT;LC_MONETARY=C;LC_MESSAGES=it_IT;LC_PAPER=it_IT;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
base
other attached packages:
[1] MASS_7.2-44 lattice_0.17-15 nlme_3.1-89
loaded via a namespace (and not attached):
[1] grid_2.7.2 tools_2.7.2
--
Marco Barbàra
- Undergraduate student in Statistics
at the University of Palermo (Italy)
- Only free-as-in-freedom software user
- Emacs lover
More information about the R-help
mailing list