[R] nontabular logistic regression

Fri Oct 13 19:57:22 CEST 2006

Gavin,

That worked!  I went through and I found a few missing cases where I had
"." instead of "NA" - I'm still in SAS mode. 

Many thanks!
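
For anyone hitting the same problem: a minimal sketch, assuming a small
made-up excerpt rather than the real file, of how a SAS-style "." can be
declared as a missing value when the data are read. The three rows are
hypothetical; na.strings is a standard argument of read.table and read.csv.

```r
# Hypothetical three-row excerpt; the "." stands in for a SAS-style missing value.
txt <- "box use purbank
1 1 0.003813435
2 1 .
3 0 0.6080962"

# Without na.strings, the "." forces purbank to be read as text
# (a factor when stringsAsFactors = TRUE, which was the default in 2006).
bad <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
class(bad$purbank)

# Declaring "." as a missing-value code keeps the column numeric.
good <- read.table(text = txt, header = TRUE, na.strings = c("NA", "."))
class(good$purbank)
sum(is.na(good$purbank))
```

The same na.strings argument can be passed to read.csv, so the original
file would not need to be edited by hand.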

****************************************
Jeffrey A. Stratford, Ph.D.
Postdoctoral Associate
331 Funchess Hall
Department of Biological Sciences
Auburn University
Auburn, AL 36849
334-329-9198
FAX 334-844-9234
http://www.auburn.edu/~stratja
****************************************
>>> Gavin Simpson <gavin.simpson at ucl.ac.uk> 10/13/06 11:23 AM >>>
On Fri, 2006-10-13 at 09:28 -0500, Jeffrey Stratford wrote:
> Hi.  I'm attempting to fit a logistic/binomial model so I can determine
> the influence of landscape on the probability that a box gets used by a
> bird.  I've looked at a few sources (MASS text, Dalgaard, Fox and
> Google) and the examples are almost always based on tabular predictor
> variables.  My data, however, are not.  I'm not sure if that is the
> source of the problems or not because the one example that includes a
> continuous predictor looks to be coded exactly the same way.  Looking at
> the output, I get estimates for each case when I should get a single
> estimate for purbank.  Any suggestions?
> 
> Many thanks,
> 
> Jeff

Hi Jeff,

using the snippet of data you provided (copy/paste into a text file and
read in with read.table) worked fine:

box.use <- read.table("~/tmp/tmp.txt", header = TRUE)
box.use
str(box.use)
'data.frame':   8 obs. of  10 variables:
 $ box        : int  1 2 3 4 5 6 7 8
 $ use        : int  1 1 1 1 0 1 1 0
 $ purbank    : num  0.00381 0.04429 0.04459 0.06072 0.60810 ...
 $ purban2    : num  0.0268 0.1611 0.0604 0.2081 0.6980 ...
 $ purban1    : num  0.069 0.172 0.000 0.069 0.690 ...
 $ pgrassk    : num  0.3282 0.1534 0.1628 0.0194 0.0317 ...
 $ pgrass2    : num  0.685 0.383 0.557 0.000 0.128 ...
 $ pgrass1    : num  0.759 0.655 0.759 0.000 0.241 ...
 $ grassdist  : num    0   0   0 323  30 ...
 $ grasspatchk: num  3.730 1.023 0.961 0.228 0.263 ...

Now I don't like attach, and you just don't need it, so I deviate a
little here. Replace box.use$use directly and make use of the data
argument in glm. Also, the snippet you posted has no missing values, so
I can't tell whether it is the response or a predictor that has them, or
whether your na.omit is needed at all - I don't use it below.

box.use$use <- factor(box.use$use, levels=0:1)
levels(box.use$use) <- c("unused", "used")
box.use
str(box.use)
glm1 <- glm(use ~ purbank, data = box.use, family = binomial())

summary(glm1)

Call:
glm(formula = use ~ purbank, family = binomial(), data = box.use)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.61450  -0.03098   0.31935   0.45888   1.39194

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.223      2.225   1.448    0.147
purbank       -6.129      4.773  -1.284    0.199

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 8.9974  on 7  degrees of freedom
Residual deviance: 6.5741  on 6  degrees of freedom
AIC: 10.574

Number of Fisher Scoring iterations: 5
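
As a rough illustration of what the fitted coefficients mean, the sketch
below refits the model on the eight posted rows (only use and purbank are
retyped) and turns the linear predictor into probabilities of box use.
The purbank values in newdat are hypothetical, chosen only for
illustration.

```r
# The eight posted rows, restricted to the two columns the model uses.
box.use <- data.frame(
  use     = factor(c(1, 1, 1, 1, 0, 1, 1, 0), levels = 0:1,
                   labels = c("unused", "used")),
  purbank = c(0.003813435, 0.04429451, 0.04458785, 0.06072162,
              0.6080962, 0.6060428, 0.3807568, 0.3649164)
)

glm1 <- glm(use ~ purbank, data = box.use, family = binomial())
b <- coef(glm1)

# Hypothetical purbank values at which to evaluate the fitted curve.
newdat <- data.frame(purbank = c(0, 0.25, 0.5))

# Predicted probability that a box is used, on the response scale.
p <- predict(glm1, newdata = newdat, type = "response")

# The same quantity by hand: inverse logit of intercept + slope * purbank.
p_by_hand <- plogis(b[1] + b[2] * newdat$purbank)

cbind(newdat, p = p, p_by_hand = p_by_hand)
```

With only eight boxes the standard errors are large, so these
probabilities are illustrative rather than something to interpret.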

I suspect something got messed up in your reading of the data and R
thought purbank was a factor or character. Always check your data after
reading in, and str() is your friend here as printed representations
are not always what they seem.
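
One way to run that check and, if a numeric column has come in as a
factor, to repair it in place. A sketch only: the three values are a
hypothetical excerpt with a stray "." in it, not the real column.

```r
# Hypothetical column that was read as a factor because of a stray ".".
purbank <- factor(c("0.003813435", ".", "0.6080962"))

# The printed values look numeric, but the class says otherwise.
class(purbank)

# as.numeric() on a factor returns the internal level codes, not the values.
as.numeric(purbank)

# Going through character recovers the numbers; "." becomes NA
# (as.numeric would warn about the coercion, hence suppressWarnings).
purbank_num <- suppressWarnings(as.numeric(as.character(purbank)))
purbank_num
```

For a whole data frame, sapply(box.use, class) gives the same overview
that str() does, one class per column.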

HTH

G

> 
> 
> THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest
> are landscape variables).
> 
> box use purbank     purban2    purban1    pgrassk    pgrass2    pgrass1   grassdist grasspatchk
> 1   1   0.003813435 0.02684564 0.06896552 0.3282487  0.6845638  0.7586207 0         3.73
> 2   1   0.04429451  0.1610738  0.1724138  0.1534174  0.3825503  0.6551724 0         1.023261
> 3   1   0.04458785  0.06040268 0          0.1628043  0.557047   0.7586207 0         0.9605769
> 4   1   0.06072162  0.2080537  0.06896552 0.01936052 0          0         323.1099  0.2284615
> 5   0   0.6080962   0.6979866  0.6896552  0.03168084 0.1275168  0.2413793 30        0.2627027
> 6   1   0.6060428   0.6107383  0.3448276  0.04077442 0.2885906  0.4482759 30        0.2978571
> 7   1   0.3807568   0.4362416  0.6896552  0.06864183 0.03355705 0         94.86833  0.468
> 8   0   0.3649164   0.3154362  0.4137931  0.06277501 0.1275168  0         120       0.4585714
> 
> THE CODE:
> 
> box.use<- read.csv("c:\\eabl\\2004\\use_logistic2.csv", header=TRUE)
> attach(box.use)
> box.use <- na.omit(box.use)
> use <- factor(use, levels=0:1)
> levels(use) <- c("unused", "used")
> glm1 <- glm(use ~ purbank, binomial)
> 
> THE OUTPUT:
> 
> Coefficients:
>                      Estimate Std. Error   z value Pr(>|z|)
> (Intercept)        -4.544e-16  1.414e+00 -3.21e-16    1.000
> purbank0            2.157e+01  2.923e+04     0.001    0.999
> purbank0.001173365  2.157e+01  2.067e+04     0.001    0.999
> purbank0.001466706  2.157e+01  2.923e+04     0.001    0.999
> purbank0.001760047  6.429e-16  2.000e+00  3.21e-16    1.000
> purbank0.002346729  2.157e+01  2.923e+04     0.001    0.999
> purbank0.003813435  2.157e+01  2.923e+04     0.001    0.999
> purbank0.004106776  2.157e+01  2.067e+04     0.001    0.999
> purbank0.004693458  2.157e+01  2.067e+04     0.001    0.999
> 
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/cv/
 London, UK. WC1E 6BT.         [w] http://www.ucl.ac.uk/~ucfagls/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%