summary() of lm() problem (PR#135)
fears@roycastle.liv.ac.uk
Tue, 9 Mar 1999 14:20:59 +0100
Debuggers,
I wrote to r-help about this and was appropriately told off by Peter
Dalgaard. I append that mail in case you have not seen it.
Following Peter's advice I have attempted to simplify the problem.
First note that the following does *not* fail (by which I mean crash, as
in generate a memory access violation):
> tmp<-matrix(c(1,0,0,1,1,1),2,3)
> dimnames(tmp)<-list(NULL,c('yvar','x1','x2'))
> lm(tmp[,'yvar']~tmp[,'x1']+tmp[,'x2'])
> summary(.Last.value)
I tried to cut down my original data set to just the first ten rows to
make it manageable to transmit. Of course then when I ran lm() there
were NA estimates. Thus I wasn't totally surprised that summary() would
have trouble. But, unlike the above, it crashes fatally.
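For anyone wanting to check for NA estimates before calling summary(), here is a minimal sketch using the small matrix from above. It assumes the fitted object exposes a `rank` component and that aliased coefficients are reported as NA, which is how current versions of R behave; I have not checked this against 0.63.2.

```r
# Small rank-deficient example: x2 is constant, so it duplicates the intercept.
tmp <- matrix(c(1, 0, 0, 1, 1, 1), 2, 3)
dimnames(tmp) <- list(NULL, c("yvar", "x1", "x2"))

fit <- lm(tmp[, "yvar"] ~ tmp[, "x1"] + tmp[, "x2"])

# Rank of the fitted design matrix versus the number of coefficients asked for.
fit$rank
length(coef(fit))

# TRUE marks the coefficients that could not be estimated.
is.na(coef(fit))
```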
Thinking to reproduce this very simply, I used (sorry quick and dirty, I
know there's a way to use paste to give the model formula):
> tmp<-matrix(c(1,0,0,1,rep(1,56)),2,30)
> dimnames(tmp)<-list(NULL,paste('x',1:30,sep=''))
> lm(tmp[, "x1"] ~ tmp[, "x2"] + tmp[, "x3"] + tmp[, "x4"] +
tmp[, "x5"] + tmp[, "x6"] + tmp[, "x7"] + tmp[, "x8"] +
tmp[, "x9"] + tmp[, "x10"] + tmp[, "x11"] + tmp[, "x12"] +
tmp[, "x13"] + tmp[, "x14"] + tmp[, "x15"] + tmp[, "x16"] +
tmp[, "x17"] + tmp[, "x18"] + tmp[, "x19"] + tmp[, "x20"] +
tmp[, "x21"] + tmp[, "x22"] + tmp[, "x23"] + tmp[, "x24"] +
tmp[, "x25"] + tmp[, "x26"] + tmp[, "x27"] + tmp[, "x28"] +
tmp[, "x29"] + tmp[, "x30"])
> summary(.Last.value)
But this has no problem. So it doesn't seem to be singularity of X or
the length of the model formula at fault (my problem data has 27
variables).
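The paste route mentioned above might look something like the following. This is only a sketch: it assumes as.formula() is available and that lm() accepts a formula held in a variable, as in current R, and the names `rhs`, `fml` and `fit` are mine.

```r
# Same 2 x 30 matrix as in the hand-typed call.
tmp <- matrix(c(1, 0, 0, 1, rep(1, 56)), 2, 30)
dimnames(tmp) <- list(NULL, paste("x", 1:30, sep = ""))

# Build 'tmp[, "x2"] + tmp[, "x3"] + ... + tmp[, "x30"]' as one string.
rhs <- paste("tmp[, \"x", 2:30, "\"]", sep = "", collapse = " + ")

# Turn the string into a formula with tmp[, "x1"] as the response.
fml <- as.formula(paste("tmp[, \"x1\"] ~", rhs))

fit <- lm(fml)
summary(fit)
```

Building the formula this way also avoids typing a term twice by accident.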
What follows is what *does* give a fault. The data set is truncated to
just its first 10 rows. I kept the name sasch2 for the truncated version
so that I could cut and paste exactly the same lm() call:
> tmp<-sasch2[1:10,]
> holdsasch2<-sasch2
> sasch2<-tmp
> dump('sasch2','bugs.dump')
> lm(sasch2[, "ddiff"] ~ sasch2[, "td30"] + sasch2[, "td60"] +
sasch2[, "td90"] + sasch2[,
+ "td120"] + +sasch2[, "td180"] + sasch2[, "td240"] + sasch2[,
"td300"] + sasch2[, "td360"] +
+ sasch2[, "td420"] + sasch2[, "td480"] + sasch2[, "db1"] + sasch2[,
"db1.5"] + sasch2[, "db2"] +
+ sasch2[, "db2.5"] + +sasch2[, "db3.5"] + sasch2[, "db4"] +
sasch2[, "db4.5"] + sasch2[, "db5"] +
+ sasch2[, "db5.5"] + +sasch2[, "db6"] + sasch2[, "db6.5"] +
sasch2[, "db7"] + sasch2[, "db7.5"] +
+ sasch2[, "db8"] + +sasch2[, "db8.5"] + sasch2[, "db9"] + sasch2[,
"db9.5"])
> summary(.Last.value)
Dr Watson tells me the access violation is 0xc0000005 at address
0x2020200 (whatever that means; I last used machine code on a PDP 8).
The data as dumped now follows (though it is anonymised, please continue
to treat the data as confidential). After that is my original report to
r-help.
I'd be very happy to provide any other info you might need to know. PD
suggested running on Unix but my version there is hopelessly out of date
since I started using NT. I hope I've given you enough here to try it
for yourself on the latest Unix version without too much bother.
"sasch2" <-
structure(c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 3, 2, 0.5, 0, 2, 4,
6, 1.5, 6, 5, 225, 175, 180, 255, 140, 140, 236, 315, 90, 190,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 2, 5, 7, 3, 3, 5, 4, 2.5, 4, 3, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 611231, 611231, 611231, 611041, 611041,
611553, 605829, 604881, 604881, 612966, 4, 4, 4, 4, 4, 4, 3,
3, 3, 2, 25, 25, 25, 29, 29, 29, 31, 28, 28, 18, 289, 289, 289,
279, 279, 289, 296, 268, 268, 281, 1, 1, 1, 2, 2, 2, 2, 1, 1,
1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
655, 655, 655, 780, 780, 180, 347, 295, 295, 240, 1, 1, 1, 1,
1, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2,
0, 0, 0, 0, 0, 5, 5, 5, 4, 4, 4, 4, 1, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 2, 2, 2, 3, 3, 1, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 5, 5, 3, 4, 3, 3,
2, 234, 234, 234, 24, 24, 12, 17, 127, 127, 27, 1, 1, 1, 1, 1,
3, 3, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 2, 2, 0, 300, 300, 300, 100,
100, 200, 500, 200, 200, 400, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 809782,
809782, 809782, 809790, 809790, 809797, 809813, 809832, 809832,
809840, 3600, 3600, 3600, 3740, 3740, 3280, 3090, 3100, 3100,
4110, 4, 4, 4, 9, 9, 9, 9, 10, 10, 9, 9, 9, 9, 10, 10, 10, 9,
10, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7.38, 7.38, 7.38, 7.34,
7.34, 7.27, 7.33, 7.3, 7.3, 7.29, NA, NA, NA, -3.4, -3.4, -5.8,
-6.4, -2, -2, -7.2, 2, 2, 2, 3, 3, 5, 4, 2.5, 2.5, 3, 225, 225,
225, 255, 255, 140, 236, 315, 315, 190, 5, 5, 5, 3, 3, 9, 10,
4, 4, 8, 400, 400, 400, 395, 395, NA, NA, 405, 405, 210, 7, 7,
7, 5, 5, NA, NA, 10, 10, 10, 580, 580, 580, NA, NA, NA, NA, NA,
NA, NA, 7.5, 7.5, 7.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
315, 315, 190, NA, NA, NA, NA, NA, NA, NA, 4, 4, 8, 440, 440,
440, 390, 390, NA, NA, NA, NA, NA, NA, NA, NA, 3, 3, NA, NA,
NA, NA, NA), .Dim = c(10, 83), .Dimnames = list(NULL, c("pxid",
"ddiff", "tdiff", "td30", "td60", "td90", "td120", "td180", "td240",
"td300", "td360", "td420", "td480", "td2000", "dbase", "db1",
"db1.5", "db2", "db2.5", "db3", "db3.5", "db4", "db4.5", "db5",
"db5.5", "db6", "db6.5", "db7", "db7.5", "db8", "db8.5", "db9",
"db9.5", "lwh.num", "partogram", "age", "gestation", "cervix",
"effaced", "membranes", "rd.interval", "action.line", "synto",
"synto.time", "rom", "blood", "ctg", "fbs", "fbs.no", "iupc",
"amnioinf", "ves", "anaesth", "delivery.mode", "perineum", "blood.loss",
"transfus", "drugs", "retained.pl", "baby.no", "weight", "apgar1",
"apgar5", "scbu", "ph.ven", "be.ven", "dilat1", "time2", "dilat2",
"time3", "dilat3", "time4", "dilat4", "time5", "dilat5", "time6",
"dilat6", "time7", "dilat7", "srom.time", "srom.dilat", "epi.time",
"epi.dilat")))
************************************************************************
Under NT 4.0, using Version 0.63.2 Beta (Jan 12, 1999):
Not sure if this is a bug or a feature (forcing me to program less
clumsily) so I'll report it here rather than to bugs.
With a medium-sized data set (1700 observations, 70 explanatory variables)
and plenty of memory, specifically
> gc()
          free   total
Ncells  886738 1000000
Vcells 7912909 8388608
I get a fatal error when attempting summary() on the fit of an lm() on a
large-ish set of dummy variables (stored in a matrix):
Call:
lm(formula = sasch2[, "ddiff"] ~ sasch2[, "td30"] + sasch2[, "td60"]
+ sasch2[, "td90"] + sasch2[, "td120"] + +sasch2[, "td180"] +
sasch2[, "td240"] + sasch2[, "td300"] + sasch2[, "td360"] +
sasch2[, "td420"] + sasch2[, "td480"] + sasch2[, "db1"] + sasch2[,
"db1.5"] + sasch2[, "db2"] + sasch2[, "db2.5"] + +sasch2[, "db3.5"]
+ sasch2[, "db4"] + sasch2[, "db4.5"] + sasch2[, "db5"] + sasch2[,
"db5.5"] + +sasch2[, "db6"] + sasch2[, "db6.5"] + sasch2[, "db7"] +
sasch2[, "db7.5"] + sasch2[, "db8"] + +sasch2[, "db8.5"] + sasch2[,
"db9"] + sasch2[, "db9.5"])
I get estimates OK, but summary() collapses. However, if I do the same
thing less clumsily, by writing all the relevant variables to a new data
frame, and then calling
Call:
lm(formula = ddiff ~ ., data = dtmp)
I get not only the estimates but can also summary() with no problem.
Any ideas why? Seems to be memory-linked, because I can lm() and
summary() the matrix versions using only the sasch2[,'td*'] or db*
variable sets.
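For reference, the data frame version described above can be sketched roughly as below. The matrix here is a synthetic stand-in, not the real data: the values are random, only a few of the column names from the report are used, and `keep` is a name I made up for the illustration.

```r
# Synthetic stand-in for sasch2 (random values, a handful of the real column names).
set.seed(1)
sasch2 <- matrix(rnorm(200), 50, 4,
                 dimnames = list(NULL, c("ddiff", "td30", "td60", "db1")))

# Copy the response and the explanatory columns into a new data frame.
keep <- c("ddiff", "td30", "td60", "db1")
dtmp <- as.data.frame(sasch2[, keep])

# "." expands to every column of dtmp other than the response.
fit <- lm(ddiff ~ ., data = dtmp)
summary(fit)
```

With the real data, `keep` would hold "ddiff" plus the td* and db* column names used in the matrix call.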
Simon Fear
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html