[R-sig-ME] Questions - the stepwise selection issue.

Mon Nov 8 02:22:28 CET 2010

Hello All,

Instead of step-wise selection, I would suggest using
multimodel inference (Burnham & Anderson).  The technique avoids
having to choose one "right" model and, in my opinion, is a more
accurate method than traditional step-wise procedures.
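
A minimal sketch of the idea in base R, under stated assumptions: the
built-in mtcars data and three ordinary lm() fits stand in for a real
candidate set (they are not the models from this thread, which are
Poisson mixed models), and plain AIC is used where Burnham & Anderson
would recommend the small-sample AICc for data sets of this size.  Each
candidate gets an Akaike weight, and a coefficient shared by all
candidates is averaged using those weights.

```r
# Hypothetical candidate set on a built-in data set; purely illustrative.
candidates <- list(
  wt       = lm(mpg ~ wt, data = mtcars),
  wt_hp    = lm(mpg ~ wt + hp, data = mtcars),
  wt_hp_qs = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# AIC per model, difference from the best model, and Akaike weights.
aic    <- sapply(candidates, AIC)
delta  <- aic - min(aic)
weight <- exp(-delta / 2) / sum(exp(-delta / 2))

# Model-averaged estimate of the 'wt' coefficient
# ('wt' appears in every candidate, so no term is missing from any fit).
wt_coef <- sapply(candidates, function(m) coef(m)[["wt"]])
wt_avg  <- sum(weight * wt_coef)

print(round(cbind(AIC = aic, delta = delta, weight = weight), 3))
print(wt_avg)
```

The weights sum to one and can be read as relative support for each
candidate within this particular set; they say nothing about models
that were left out of it.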

Cheers!

Andrew

-- 
Andrew Kosydar, PhD
drewdogy at uw.edu
(206) 669-0505

On Sun, Nov 7, 2010 at 7:17 PM, John Maindonald
<john.maindonald at anu.edu.au> wrote:
> The stepwise model reduction issue is an interesting one.
>
> My view is that:
> 1) One should always begin by looking at the t-statistics for
> the coeffs in the full model (assuming that this is a situation
> where they are more or less believable!).  If there is a clear
> division into those that are significant and those that are
> clearly not significant (p-value > 0.1, maybe), then drop
> those that are not significant.  Check what difference this
> makes to the residual SE, and to the coefficients (any large
> changes may matter if there is an interest in interpreting
> coefficients).  There are other issues to consider; are some
> variables of such scientific consequence that they should
> be retained regardless?
>
> 2) Backward stepwise selection or (better) exhaustive
> subset selection if the problem is not too large are a resort
> of desperation if some variant of (1) fails to give a useful
> result.  The approach (1) is most likely to fail to give a useful
> result in cases where there is quite a large number of
> explanatory variables, exactly the situation where variable
> selection bias becomes a serious issue.  The function
> bestsetNoise() in the DAAG package is designed to make
> it easy to experiment with variable selection effects for
> data that are purely noise.  For getting realistic SEs, some
> alternatives are:
>  a) repeat the whole analysis, selection and all, with repeated
>  bootstrap samples;
>  b) get SEs by repeated bootstrap sampling from 'test' data.
>  c) simulate, with selection and all, from the fitted model.
> For both a) and c) one has to deal with getting somewhat
> different selections each time round (that is itself instructive).
>
> 3) If the approach (2) has been used, and there is interest
> in the interpretation of coefficients, check the coefficients
> against those from the full model.  Any large changes will
> ring a warning bell for attempts to interpret them.
>
> 4) One possibility, following stepwise or other selection,
> is that one or more p-values may be so small that they are
> very unlikely to be an artefact of the selection process.
> In general, a simulation will be required, in order to be
> sure.
>
> Somewhat casual approaches to the use of backward (or
> other) stepwise selection may be a holdover from hand
> calculator days, or times when computers grunted somewhat
> to handle even modest sized calculations.  If the inclusion and
> exclusion criteria are suitably chosen (but who really knows
> what is suitable?), I suspect that in some contexts they more
> or less work, giving believable answers without undue
> selection bias.  But how, without checks such as I have noted,
> can one be sure?  This may be one of the murky dark alleys
> of statistical practice, where magic incantations and hope too
> often prevail over hard evidence.
>
> It would be useful to find a review paper that covers this
> ground systematically and incisively, without undue reliance
> on specific examples.
>
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
>
> On 08/11/2010, at 5:42 AM, Ben Bolker wrote:
>
>> On 10-11-07 11:14 AM, Shawn McCracken wrote:
>>> Shawn McCracken <smccracken at ...> writes:
>>>
>>>>
>>>> Dear Mixed-models group,
>>>>
>>>> I am working with a dataset containing fixed and nested random effects. I
>>>> have one fixed effect that I am most interested in and the others were
>>>> collected to see if they also might have an influence. I apologize for the
>>>> novel but hopefully discussion of this will help others in the future who
>>>> are as intimidated as I was/am.
>>>>
>>>> The data consist of total counts of anuran individuals from a particular
>>>> species of epiphytic phytotelm plant found in tree canopies at two sites.
>>>> The site difference (if any) is my main interest. At each site all trees
>>>> with suitable #'s of this epiphyte species for sampling were located
>>>> within a predetermined size area. 16 trees were randomly selected from
>>>> those available at each site and 5 epiphytes were then randomly sampled
>>>> for all anurans within them. So, 2 sites -> 16 trees (at each site) -> 5
>>>> epiphytes (in each tree), which equals 80 samples from each site for a
>>>> total of 160........................
>>>>
>>>>
>>>
>>> Update: The install problem with glmmADMB has been fixed on my Mac. Thanks to
>>> Dave. Details below. I could still use some feedback on what I have done so
>>> far.
>>>
>>
>>  Update: I have been working on the glmmADMB package a bit.  The
>> current version on R-forge installs OK on my MacOS X.6 machine. It
>> contains 32-bit binaries which it automatically puts in the correct
>> location, so that you shouldn't have to mess around with doing this
>> stuff manually.  Dave F. has sent me compiled 64-bit OS X binaries, but
>> I haven't gotten around to incorporating them yet (the 32-bit binaries
>> do work on my system, although presumably the 64-bit ones would be
>> faster in general).
>>  So
>>  install.packages("glmmADMB",repos="http://r-forge.r-project.org")
>> should work on MacOS.
>>  It would be helpful to get reports of trouble from list members who
>> try it.
>>
>>  To follow up on some of your other questions with my own opinions:
>> * as I recommend on <http://glmm.wikidot.com/faq> (I have just added a
>> few words to make my personal opinions clearer), I would recommend
>> glmm.admb or glmer with individual-level random effects over the various
>> quasi- options.
>> * glmm.admb currently only works with a single random effect, so you
>> can't do nested random effects that way.  You could build a more
>> complete model in AD Model Builder, or revert to glmer.
>> * Your model specification
>>
>> m1po<-lmer(count~treat+treedbh+treehgt+numepi+elevepi+hgtepi+leafepi+
>> (1|tree/epi),family=poisson,data=ecpad2)
>>
>>  looks reasonable.  If you say
>> ecpad2$indiv <- 1:nrow(ecpad2)
>> and add +(1|indiv) to your model specification you will have an
>> individual-level random effect.
>>
>> * Is 'treat' your site variable?  In any case, if you are trying to do
>> a statistical comparison between only two sites you have a major
>> pseudo-replication problem (Hurlbert 1984).
>>
>>  * The p-values that you get from summary(lmer) are based on Wald Z
>> statistics; they assume large data sets and are possibly unreliable for
>> moderate-sized data sets ...
>>
>> * Opinions differ on the value of backward stepwise model reduction. It
>> is standard practice in many ecological contexts and is suggested for
>> moderate model complexity by many respected practicing
>> (eco)statisticians (Bates, Wood, Zuur ...) but is vehemently decried by
>> others (Harrell).  I would probably base inference on your full model
>> rather than doing backward elimination.
>>
>>
>>> The solution that worked for me:
>>>
>>> I used the binaries Dave sent in admbfiles.zip over in the post in the
>>> admb-users group: http://groups.google.com/group/admb-users/t/df5779586e45b9b
>>>
>>> First, I copied them to my desktop and unzipped.
>>> Opened Terminal and typed the following to direct it to run nbmm in the expanded
>>> folder and confirm it would run:
>>>
>>> ShawnMBP:$ /Users/Shawn/Desktop/admbfiles/nbmm     #of course you will need to
>>> change this to navigate to where it is on your computer#
>>> Error trying to open data input file /users/shawn/desktop/admbfiles/nbmm.dat
>>> Error trying to read in model data
>>> This is usual caused by a missing DAT file
>>>
>>> Dave said the error message comes from nbmm looking for the data file to use,
>>> but it is running.
>>>
>>> I then located where glmmADMB had originally placed these same named files when
>>> I did the install of glmmADMB. I can’t remember how I found where they were and
>>> spotlight won’t show them either. I think I did a search just for “R” and found
>>> an R.framework folder in the Library folder. I looked through there and found
>>> them in a folder called admb here:
>>>
>>> MBP_SFM>Library>Frameworks>R.framework>Versions>2.11>Resources>library>glmmADMB>
>>> admb
>>>
>>> I then replaced the nbmm and bvprobit files that were there with the ones
>>> provided by Dave.
>>> Started up R, loaded glmmADMB, viewed the epil2 dataset, and then ran the
>>> example model and it ran fine!
>>>
>>> My system: Macbook Pro, Mac OSX 10.6.4, R 64-bit
>>>
>>> Hope this helps.
>>>
>>> Shawn
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>