[R-sig-eco] Factors in partial RDA part 2

Andrew Halford andrew.halford at gmail.com
Thu Feb 9 05:58:16 CET 2017


Thanks Jari,

My education continues!

cheers

Andy

On 8 February 2017 at 17:32, Jari Oksanen <jari.oksanen at oulu.fi> wrote:

> I said earlier that you can use either dummy variables or factors with
> little difference. However, there are some cases where the differences
> become important: that is the case when you start playing with individual
> dummy variables like they were real variables. Don't do that! Do not use
> forward.sel with dummy variables: it makes no sense. Do not look at the VIF
> values of single factor levels: that rarely makes sense, and they can never
> suggest removing single levels of factors. It is the whole factor, in or
> out. Never partly in, partly out. If you go to play like that, you really
> should switch to the standard R way of defining your factor as a factor. The
> highish VIF values for some levels are probably triggered by correlations
> with some continuous variables in your model.
>
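
A minimal base-R sketch of why the dummy columns belong together, assuming a simulated seven-level factor (the level names below are made up, not the real 'geom' levels). With R's default treatment contrasts, a factor with seven levels is coded as six dummy columns plus a baseline level absorbed by the intercept, and the "assign" attribute of the model matrix records that all six columns belong to one term.

```r
# Simulated stand-in for the 'geom' factor; level names are hypothetical.
geom <- factor(rep(paste0("geo_", letters[1:7]), each = 3))

# Expand the factor the way a model formula does (treatment contrasts).
X <- model.matrix(~ geom)

print(nlevels(geom))       # number of factor levels
print(colnames(X))         # intercept plus one column per non-baseline level
print(attr(X, "assign"))   # 0 = intercept, 1 = every column of the 'geom' term
```

Dropping one of these columns does not remove a level from the data; it only merges that level with the baseline, which changes what the remaining coefficients mean.
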
> There should be a generalized VIF in vegan that gives one statistic for
> the whole factor. Contributions are welcome at
> http://github.com/vegandevs/vegan.
>
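
A sketch of what such a whole-factor statistic could look like, following the generalized VIF of Fox and Monette (1992), which is also what car::vif reports for ordinary linear models. The helper name gvif and the simulated data are assumptions for illustration; this is not a vegan function and it works on the predictors only, not on a fitted rda object.

```r
# Hypothetical helper: generalized VIF per model term (Fox & Monette 1992).
# GVIF = det(R11) * det(R22) / det(R), where R is the correlation matrix of
# the model-matrix columns, R11 the block for the term and R22 the rest.
gvif <- function(formula, data) {
  X <- model.matrix(formula, data)
  assign <- attr(X, "assign")
  keep <- assign > 0                      # drop the intercept column
  X <- X[, keep, drop = FALSE]
  assign <- assign[keep]
  R <- cor(X)
  labs <- attr(terms(formula, data = data), "term.labels")
  out <- t(sapply(seq_along(labs), function(i) {
    idx <- which(assign == i)             # all columns of term i
    g <- det(R[idx, idx, drop = FALSE]) *
      det(R[-idx, -idx, drop = FALSE]) / det(R)
    c(GVIF = g, Df = length(idx), scaled = g^(1 / (2 * length(idx))))
  }))
  rownames(out) <- labs
  out
}

# Simulated data: x1 is deliberately correlated with the factor levels.
set.seed(1)
geom <- factor(rep(paste0("geo_", letters[1:7]), each = 10))
dat <- data.frame(
  x1 = rnorm(length(geom)) + as.integer(geom),
  x2 = rnorm(length(geom)),
  geom = geom
)

res <- gvif(~ x1 + x2 + geom, dat)
print(round(res, 3))
```

For a term with one degree of freedom the GVIF equals the usual VIF. The "scaled" column, GVIF^(1/(2*Df)), is the form Fox and Monette suggest for comparing terms with different degrees of freedom.
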
> cheers, Jari Oksanen
> ________________________________________
> From: R-sig-ecology <r-sig-ecology-bounces at r-project.org> on behalf of
> Andrew Halford <andrew.halford at gmail.com>
> Sent: 08 February 2017 10:14
> To: r-sig-ecology at r-project.org
> Subject: [R-sig-eco] Factors in partial RDA part 2
>
> Hi Listers,
>
> Further to my last post, I am seeking more insight into how to interpret
> the effects of factors in an RDA analysis.
>
> I think of a factor as a single variable whose influence on observed fish
> distribution patterns I would like to quantify, along with a bunch of other
> numerical variables.
>
> To do the analyses this Factor (called 'geom') is turned into a number of
> dummy variables (seven actually).
>
> The conceptual problem I am having is that when I do a call to vif.cca to
> check on collinearity for example, the output suggests I should remove some
> of the dummy variables making up the levels of my Factor. Doing this would
> leave me with only 3 of the dummy variables out of the original 8 to put
> into the RDA. I then don't see that I am actually testing the Factor 'geom'
> anymore but rather just individual variables representing a couple of the
> different levels of the original Factor. How do I proceed with this?
>
> The same conundrum for me is seen when I run the forward.sel command to
> look at the most efficient number of explanatory variables to have in the
> final model. The process selects only some of the dummy variables to
> include in the model. Again I struggle to see how I am testing or including
> the full effects of the Factor 'geom' if only a few of the dummy variables
> are actually included in the model.
>
> # here is the model run with all the potential explanatory variables
> # ('geom' is the FACTOR with 7 levels)
>
> > fish.env <- rda(fish.h ~ coral_cover + macroalgae + turf_algae_sqrt +
>                   ccc_4thrt + rubble_sqrt + reef_slope_sqrt + rugosity +
>                   exposure + min_d_sqrt + d_range + chl_a_log +
>                   popn_density_4throot + fp + protection + geom,
>                   data = env.factor)
>
> # collinearity assessment - the results favour dropping 3 of the 'geo'
> # dummy variables leaving only 2 for the model.
> > vif.cca(fish.env)
>
> coral_cover             2.688656
> macroalgae              3.099972
> turf_algae_sqrt         2.849219
> ccc_4thrt               1.771637
> rubble_sqrt             2.411291
> reef_slope_sqrt         2.418953
> rugosity                2.967752
> exposure                3.433961
> min_d_sqrt              2.587696
> d_range                 2.643991
> chl_a_log               3.626107
> popn_density_4throot    4.571781
> fp                      4.059624
> protection              3.195210
> geomgeo_bl              4.329852
> geomgeo_cbrc           12.657270
> geomgeo_isefr           9.570015
> geomgeo_isprc          12.052385
> geomgeo_lefr            7.347812
> geomgeo_oefr           17.090731
>
> # here I have kept all the 'geom' dummy variables to submit to forward
> # selection and it only selects 3 of them, hence it doesn't feel that I am
> # actually including a Factor 'geom' in the model but rather just a few
> # individual dummy variables?
>
>
> > forward.sel(fish.h,env.dummy3,adjR2thresh=R2a.all_fish_env)
>
> Testing variable 1
> Testing variable 2
> Testing variable 3
> Testing variable 4
> Testing variable 5
> Testing variable 6
> Testing variable 7
> Testing variable 8
> Procedure stopped (alpha criteria): pvalue for variable 8 is 0.092000 (superior to 0.050000)
>         variables order         R2     R2Cum   AdjR2Cum        F  pval
> 1        exposure     8 0.07086240 0.0708624 0.04925455 3.279475 0.001
> 2              fp    13 0.04756799 0.1184304 0.07645089 2.266248 0.002
> 3       geo_isefr    16 0.04571706 0.1641474 0.10298750 2.242500 0.003
> 4       chl_a_log    11 0.03812686 0.2022743 0.12250174 1.911778 0.009
> 5          geo_bl    17 0.03423972 0.2365140 0.13863121 1.749016 0.008
> 6       geo_isprc    21 0.03384005 0.2703541 0.15514682 1.762391 0.012
> 7 reef_slope_sqrt     6 0.03466752 0.3050216 0.17353919 1.845666 0.007
>
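
For contrast with the dummy-by-dummy selection above, here is a univariate base-R sketch of forward selection over whole terms. It uses add1() on an ordinary lm as a stand-in for the multivariate case; the data are simulated and only the variable names 'exposure' and 'geom' are borrowed from the thread. When the candidate is given as a factor, it enters or stays out as one term with six degrees of freedom.

```r
# Simulated data; names borrowed from the thread, values are made up.
set.seed(7)
geom <- factor(rep(paste0("geo_", letters[1:7]), each = 8))
exposure <- rnorm(length(geom))
y <- 0.8 * exposure + as.integer(geom) / 2 + rnorm(length(geom))
dat <- data.frame(y, exposure, geom)

# Start from the intercept-only model and test each candidate TERM.
null_fit <- lm(y ~ 1, data = dat)
cand <- add1(null_fit, scope = ~ exposure + geom, test = "F")
print(cand)   # one row per term; 'geom' appears once, never level by level
```

As far as I know, vegan's ordistep and ordiR2step work on formula terms in a similar way for constrained ordination, so a factor supplied as a factor would be added or left out as a whole.
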
>
> Any advice appreciated
>
> cheers
>
> Andy



-- 
Andrew Halford Ph.D
Research Scientist (Kimberley Marine Parks)
Dept. Parks and Wildlife
Western Australia

Ph: +61 8 9219 9795
Mobile: +61 (0) 468 419 473
