[R] nontabular logistic regression
Jeffrey Stratford
stratja at auburn.edu
Fri Oct 13 19:57:22 CEST 2006
Gavin,
That worked! I went through and I found a few missing cases where I had
"." instead of "NA" - I'm still in SAS mode.
Many thanks!
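For anyone hitting the same thing, here is a minimal sketch of handling
SAS-style "." placeholders at read time. The inline data is made up for
illustration (it is not the real use_logistic2.csv); na.strings is the
standard read.csv argument for declaring missing-value codes.

```r
# Hypothetical CSV text standing in for the real file;
# "." marks a missing value, as SAS would write it.
csv_text <- "box,use,purbank
1,1,0.003813435
2,1,.
3,0,0.6080962"

# Without na.strings, the stray "." stops the column being read as numeric
# (it comes in as character, or as a factor in older versions of R).
raw <- read.csv(text = csv_text, header = TRUE)
class(raw$purbank)

# Declaring "." as a missing-value code keeps the column numeric.
fixed <- read.csv(text = csv_text, header = TRUE,
                  na.strings = c("NA", "."))
class(fixed$purbank)
is.na(fixed$purbank)
```

With the column numeric, glm() fits a single slope for purbank rather than
one coefficient per distinct value.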
****************************************
Jeffrey A. Stratford, Ph.D.
Postdoctoral Associate
331 Funchess Hall
Department of Biological Sciences
Auburn University
Auburn, AL 36849
334-329-9198
FAX 334-844-9234
http://www.auburn.edu/~stratja
****************************************
>>> Gavin Simpson <gavin.simpson at ucl.ac.uk> 10/13/06 11:23 AM >>>
On Fri, 2006-10-13 at 09:28 -0500, Jeffrey Stratford wrote:
> Hi. I'm attempting to fit a logistic/binomial model so I can determine
> the influence of landscape on the probability that a box gets used by
> a bird. I've looked at a few sources (MASS text, Dalgaard, Fox, and
> Google) and the examples are almost always based on tabular predictor
> variables. My data, however, are not. I'm not sure whether that is the
> source of the problem, because the one example that includes a
> continuous predictor looks to be coded exactly the same way. Looking at
> the output, I get an estimate for each case when I should get a single
> estimate for purbank. Any suggestions?
>
> Many thanks,
>
> Jeff
Hi Jeff,
the snippet of data you provided (copied into a text file and read in
with read.table) worked fine for me:
box.use <- read.table("~/tmp/tmp.txt", header = TRUE)
box.use
str(box.use)
'data.frame': 8 obs. of 10 variables:
$ box : int 1 2 3 4 5 6 7 8
$ use : int 1 1 1 1 0 1 1 0
$ purbank : num 0.00381 0.04429 0.04459 0.06072 0.60810 ...
$ purban2 : num 0.0268 0.1611 0.0604 0.2081 0.6980 ...
$ purban1 : num 0.069 0.172 0.000 0.069 0.690 ...
$ pgrassk : num 0.3282 0.1534 0.1628 0.0194 0.0317 ...
$ pgrass2 : num 0.685 0.383 0.557 0.000 0.128 ...
$ pgrass1 : num 0.759 0.655 0.759 0.000 0.241 ...
$ grassdist : num 0 0 0 323 30 ...
$ grasspatchk: num 3.730 1.023 0.961 0.228 0.263 ...
I don't like attach(), and you don't need it here, so I deviate a little
from your code. Modify box.use$use directly and use the data argument of
glm(). Also, the snippet you sent has no missing values, so I can't tell
whether it is the response or the predictor that has them, or whether
your na.omit() is needed at all. I don't use it below.
box.use$use <- factor(box.use$use, levels=0:1)
levels(box.use$use) <- c("unused", "used")
box.use
str(box.use)
glm1 <- glm(use ~ purbank, data = box.use, family = binomial())
summary(glm1)
Call:
glm(formula = use ~ purbank, family = binomial(), data = box.use)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.61450  -0.03098   0.31935   0.45888   1.39194

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.223      2.225   1.448    0.147
purbank       -6.129      4.773  -1.284    0.199

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 8.9974  on 7  degrees of freedom
Residual deviance: 6.5741  on 6  degrees of freedom
AIC: 10.574

Number of Fisher Scoring iterations: 5
I suspect something got messed up when you read in the data, and R
treated purbank as a factor or character vector (the coefficient names
in your output, such as purbank0.001173365, look like factor levels).
Always check your data after reading it in; str() is your friend here,
as printed representations are not always what they seem.
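A minimal sketch of that kind of check, using a made-up three-value
vector rather than the real data. It assumes the column was read as a
factor because of a stray non-numeric entry; purbank_num is just an
illustrative name.

```r
# Hypothetical column: numeric-looking values plus one stray "."
purbank <- factor(c("0.003813435", ".", "0.6080962"))

# str() reveals the type that the printed values hide
str(purbank)

# as.numeric() on a factor returns the internal level codes,
# not the values the labels spell out
as.numeric(purbank)

# Converting via character recovers the numbers; the "." becomes NA
# (R would warn about the coercion, silenced here)
purbank_num <- suppressWarnings(as.numeric(as.character(purbank)))
purbank_num
```

Fixing the entries in the file, or declaring them as missing when reading
it, is cleaner than converting afterwards, but the conversion is handy for
confirming the diagnosis.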
HTH
G
>
> THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the
> rest are landscape variables).
>
> box use purbank purban2 purban1 pgrassk pgrass2 pgrass1 grassdist grasspatchk
> 1 1 0.003813435 0.02684564 0.06896552 0.3282487 0.6845638 0.7586207 0 3.73
> 2 1 0.04429451 0.1610738 0.1724138 0.1534174 0.3825503 0.6551724 0 1.023261
> 3 1 0.04458785 0.06040268 0 0.1628043 0.557047 0.7586207 0 0.9605769
> 4 1 0.06072162 0.2080537 0.06896552 0.01936052 0 0 323.1099 0.2284615
> 5 0 0.6080962 0.6979866 0.6896552 0.03168084 0.1275168 0.2413793 30 0.2627027
> 6 1 0.6060428 0.6107383 0.3448276 0.04077442 0.2885906 0.4482759 30 0.2978571
> 7 1 0.3807568 0.4362416 0.6896552 0.06864183 0.03355705 0 94.86833 0.468
> 8 0 0.3649164 0.3154362 0.4137931 0.06277501 0.1275168 0 120 0.4585714
>
> THE CODE:
>
> box.use<- read.csv("c:\\eabl\\2004\\use_logistic2.csv", header=TRUE)
> attach(box.use)
> box.use <- na.omit(box.use)
> use <- factor(use, levels=0:1)
> levels(use) <- c("unused", "used")
> glm1 <- glm(use ~ purbank, binomial)
>
> THE OUTPUT:
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -4.544e-16 1.414e+00 -3.21e-16 1.000
> purbank0 2.157e+01 2.923e+04 0.001 0.999
> purbank0.001173365 2.157e+01 2.067e+04 0.001 0.999
> purbank0.001466706 2.157e+01 2.923e+04 0.001 0.999
> purbank0.001760047 6.429e-16 2.000e+00 3.21e-16 1.000
> purbank0.002346729 2.157e+01 2.923e+04 0.001 0.999
> purbank0.003813435 2.157e+01 2.923e+04 0.001 0.999
> purbank0.004106776 2.157e+01 2.067e+04 0.001 0.999
> purbank0.004693458 2.157e+01 2.067e+04 0.001 0.999
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC & ENSIS, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/cv/
London, UK. WC1E 6BT. [w] http://www.ucl.ac.uk/~ucfagls/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%