[R] Design matrix for species mixture
Jim Lemon
drjimlemon at gmail.com
Fri May 12 11:18:35 CEST 2017
Hi Margot,
Very messy, like nature. One way is to do tests with dummy variables
that compare:
beans+anything vs anything without beans
maize+anything vs anything without maize
pumpkin+anything vs anything without pumpkin
Then if you find that the "beans" comparison has the strongest effect,
perhaps because beans are nitrogen fixers, you could do successive
comparisons of the bean +anything mixtures. It really depends upon
what your a priori hypothesis is. You'll need a bit of statistical
power to get reliable, or any, results.
Jim
On Fri, May 12, 2017 at 6:20 PM, Margot Neyret <margotneyret at gmail.com> wrote:
> Hi Jim,
>
> Sorry if my question was not clear. I will try to explain again…
>
> I have one response variable Y, let’s say vegetation cover. Then I have my
> explanatory variable, let’s call it Crop. In my field I can have either
> Maize (m), Bean (b), Pumpkin (p) or mixtures : m+b, m+p, b+p, m+b+p. I also
> have a second explanatory variable X (e.g. soil moisture content).
>
> So for now my variable Crop has 7 levels [m, p, b, mp, mb, pb, mpb]. If I
> want to compare Y between these crops, I write :
> summary(lm(Y~Crop))
> TukeyHSD(aov(Y~Crop))
>
> And with X :
> summary(lm(Y~Crop*X))
> … etc.
>
>
> Then I want to compare the effect of each individual crop. I can do as you
> suggested Jim : 3 variables M, B, P with values 0 or 1 if the crop is
> present, 0 otherwise and add interactions.
> summary(lm(Y~M+B+P+M*P+…))
>
> But here come my questions. This model seems to have 2 drawbacks.
> 1. How can I do pairwise comparisons here as I would do with Turkey test ?
> How can test the hypothesis, for instance, “Bean provides higher cover than
> Maize whatever the mixture is” ?
> 2. When it comes to interactions with other variables it gets quite
> complicated (also note that in real life I have 5 crops, not 3 and other
> explanatory variables) :
> lm(Y~M*X+ B*X + P*X + M*B + M*P + P*B)
>
> So, isn’t there a way to make it more concise ?
>
> I hope it makes sens.
>
> Thanks,
> Margot
>
>
> On Friday, May 12, 2017 at 2:53 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Margot,
> I'm not sure I understand your model, but if I make up some data in
> which the response variable is vegetation cover and the three species
> are:
>
> A - eats one type of plant
> B - eats another type of plant
> C - preys on herbivorous insects
>
> df<-read.table(text="field,propveg,A,B,C
> 1,1,0,0,1
> 2,0.3,1,1,1
> 3,0.6,0,1,1
> 4,0.2,1,1,1
> 5,0.7,1,0,1
> 6,0.8,0,0,0
> 7,0.3,1,0,0
> 8,0.4,0,1,0
> 9,0.1,1,1,0
> 10,0.5,0,1,0
> 11,0.5,1,0,1
> 12,0.1,1,1,0
> 13,0.6,0,1,1
> 14,0,1,1,0",
> sep=",",header=TRUE)
> print(summary(lm(propveg~A+B+C+A:B+A:C+B:C,df)))
>
> Is that something like what you want?
>
> Jim
>
> On Fri, May 12, 2017 at 12:40 AM, Margot Neyret <margotneyret at gmail.com>
> wrote:
>
> Hello,
>
> I have fields with species mixtures (for instance, species a, b, c, a+b,
> a+c, b+c), and I look at the effect of each species on a response Y. More
> specifically, I would like to compare the effect of individual species,
> either alone or in mixture.
>
> Y = rnorm(18,0,1)
> mixture= rep(c('a','b', 'c', 'a+b', 'a+c', 'b+c'), each = 3)
>
>
> Thus I create variables A, B and C with :
> - A = 1 when the mixture contains a (ie mixture = a or a+b or a+c); and 0
> otherwise.
> - Idem for variables C and B.
>
> A = ifelse(mixture %in% c('a', 'a+b', 'a+c'), 1, 0)
> B = ifelse(mixture %in% c('b', 'a+b', 'b+c'), 1, 0)
> C = ifelse(mixture %in% c('c', 'a+c', 'b+c'), 1, 0)
>
>
> My plan was to build a design matrix from these 3 variables, that would then
> allow me to compare the effects of each species.
>
> mm = model.matrix(~A+B+C+0)
> summary(lm(Y~mm))
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.8301 0.6221 -1.334 0.203
> mmA 1.1636 0.4819 2.415 0.030 *
> mmB 0.8452 0.4819 1.754 0.101
> mmC -0.1005 0.4819 -0.208 0.838
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.8347 on 14 degrees of freedom
> Multiple R-squared: 0.4181, Adjusted R-squared: 0.2934
> F-statistic: 3.353 on 3 and 14 DF, p-value: 0.04964
>
> My questions :
> 1. Does this approach make any sense ? I have a feeling I am doing something
> strange but I cannot put my finger on it.
> 1. My ddl are wrong, I should not have an intercept here, or at least my
> intercept should be one of my species. Should I just remove one species form
> the design matrix ?
> 2. Is there any way to do post-hoc tests on my species now, as I would have
> done with Tukey test or lsmeans ?
>
> My objective afterwards is to add other explanatory variables and
> interactions in the model.
>
> Thanks in advance !
>
> M. N.
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list