[R] AIC in MuMIn

Gavin Simpson gavin.simpson at ucl.ac.uk
Fri Aug 20 00:52:55 CEST 2010


On Fri, 2010-08-20 at 06:31 +0800, elaine kuo wrote:
<snip />
>         
>         
>         Oh, actually, I suppose you could automate this, so it will
>         return all
>         models with single variable:
>         
>         dd <- dredge(lm1)
>         parms <- !is.na(dd[, -c(1, (ncol(dd) - c(0:7)))])
>         want <- which(rowSums(parms) == 1)
>         dd[want, ]
>         
>         Having said all this, I don't think this is a good way to do
>         model
>         selection.
>         
>         =>
> 
> 
> Thanks for the time-efficient formula.
> Actually it is over my current ability to automate it, but it did save
> a lot of time already.

You don't need to automate it, I did it for you. The four lines above
show you only the lines from dd (your dredge result) that contain only
one covariate. That was what you asked for, wasn't it?

> Moreover, please kindly explain the command beginning with parms.
> data (Cement) was checked but could not ensure why the row and the
> column were set to be 1 and (ncol(dd) - c(0:7)).

OK; for this, you need to first get that dredge returns a data frame
with some additional attributes. (It actually inherits from class
"data.frame" so whilst it isn't exactly a data frame it works like one.)
So we can subset it as we would any other R data frame.

If you look at the object returned from dredge, you see the first column
refers to the intercept, the next N columns are for your covariates in
the full model you fitted, with the remaining columns representing the
'rank' statistic (AIC, BIC etc) plus some other measures.

> head(dd)
Global model: lm(formula = y ~ ., data = Cement)
---
Model selection table 
    (Int)      X    X1     X2    X3      X4 k   R.sq Adj.R.sq   RSS   AIC  AICc
11  52.58        1.468 0.6623               4 0.9787   0.9744 57.90 64.31 69.31
24  71.65        1.452 0.4161       -0.2365 5 0.9823   0.9764 47.97 63.87 72.44
23  48.19        1.696 0.6569  0.25         5 0.9823   0.9764 48.11 63.90 72.48
13 103.10        1.440              -0.6140 4 0.9725   0.9670 74.76 67.63 72.63
25 111.70        1.052        -0.41 -0.6428 5 0.9813   0.9750 50.84 64.62 73.19
17  52.53 0.1551 1.467 0.6410               5 0.9798   0.9731 54.86 65.61 74.18
   delta weight
11 0.000  0.544
24 3.125  0.114
23 3.163  0.112
13 3.322  0.103
25 3.879  0.078
17 4.869  0.048

So we select out only the bits we want:

dd[, -c(1, (ncol(dd) - c(0:7)))]

This bit is selecting columns by dropping the first and the last eight
columns. ncol(dd) gives the number of columns, ncol(dd) - 0 is the last
column, ncol(dd) - 1 is the next to the last column and so on. There are
eight of these columns we don't want. If you just run

> c(1, (ncol(dd) - c(0:7)))
[1]  1 14 13 12 11 10  9  8  7

you'll see that we have the column integers we want to drop, and then we
negate it to get the subsetting to drop rather than retain these
columns:

-c(1, (ncol(dd) - c(0:7)))

Having dropped the superfluous columns we aren't interested in, we are
left with an 'array' with one column per covariate (predictor variable).
If a given covariate is not in the model, it gets an NA in its column
for that model. The is.na() function returns an object of the same
dimension as dd[, -c(1, (ncol(dd) - c(0:7)))] but which indicates if
variable is NA or not:

> is.na(dd[, -c(1, (ncol(dd) - c(0:7)))])
       X    X1    X2    X3    X4
11  TRUE FALSE FALSE  TRUE  TRUE
24  TRUE FALSE FALSE  TRUE FALSE
23  TRUE FALSE FALSE FALSE  TRUE
13  TRUE FALSE  TRUE  TRUE FALSE
25  TRUE FALSE  TRUE FALSE FALSE
17 FALSE FALSE FALSE  TRUE  TRUE
19 FALSE FALSE  TRUE  TRUE FALSE
....

The ! negates this turning TRUE into FALSE and vice versa. In computing
and R, TRUE == 1, FALSE == 0, so when we do rowSums on this, we get
information regarding how many covariates are in each model:

> rowSums(parms)
11 24 23 13 25 17 19 26 31 27 28 29 30 16 22 32 14 20  6  4  7 15 10  9  8  3 
 2  3  3  2  3  3  3  3  4  4  4  4  4  2  3  5  2  3  1  1  2  2  2  2  2  1 
12 18 21  5  2  1 
 2  3  3  1  1  0

You asked for only those models containing a single covariate, so they
are the models with rowSum == 1. The which() function tells us which
models (rows of dd) these are and we save this indicator in 'want'

> which(rowSums(parms) == 1)
 6  4  3  5  2 
19 20 26 30 31

The final step is to select out rows 19, 20, 26, 30 and 31 (want) from
dd as these are the models you said you were interested in.

> dd[want, ]
Global model: lm(formula = y ~ ., data = Cement)
---
Model selection table 
   (Int)     X    X1     X2     X3      X4 k   R.sq Adj.R.sq    RSS    AIC
6 117.60                           -0.7382 3 0.6745   0.6450  883.9  97.74
4  57.42             0.7891                3 0.6663   0.6359  906.3  98.07
3  81.48       1.869                       3 0.5339   0.4916 1266.0 102.40
5 110.20                    -1.256         3 0.2859   0.2210 1939.0 108.00
2  82.31 1.874                             3 0.2353   0.1657 2077.0 108.80
   AICc delta weight
6 100.4 31.10  0.511
4 100.7 31.42  0.434
3 105.1 35.77  0.050
5 110.6 41.31  0.003
2 111.5 42.20  0.002

HTH

G

> Elaine

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list