[BioC] Model Design

James W. MacDonald jmacdon at uw.edu
Tue Sep 2 16:19:04 CEST 2014


Hi Sander,

Please don't take conversations off-list (e.g., use 'Reply all' when
responding).


On Mon, Sep 1, 2014 at 10:59 AM, Sander van der Zeeuw <
s.a.j.van_der_zeeuw at lumc.nl> wrote:

>  Hi Jim,
>
> thanks for the information, i have been struggling for a few days now
> trying to implement your tips. If i want to compare the 8 samples divided
> into 4 subjects with no treatment or control or what so ever. I have only
> one model which produces a reasonable result. I just do not know what
> exactly is happening. This is the contrast and design model i have been
> using:
>

This is a problem to start with. I have no idea what 'i want to compare the
8 samples divided into 4 subjects with no treatment or control or what so
ever' means. You will need to be more clear on what your goals are for me
(or anybody else) to be able to help you.


contrast <- makeContrasts(subject1-subject2+subject3-subject4, levels =
> design);
>           Contrasts
> Levels     subject1 - subject2 + subject3 - subject4
>   subject1                                         1
>   subject2                                        -1
>   subject3                                         1
>   subject4                                        -1
>
>
This contrast is testing an interaction between subjects 1 and 2 versus
subjects 4 and 3. Without knowing what the subjects are, I cannot say if
this is a reasonable thing to be doing. But please note two things:

1.) Any comparison in ANOVA is just simple algebra, so if you have taken
algebra, you should be able to figure out what the comparison is.
2.) Given 1.), we can decompose your contrast and see what it is testing:

Your contrast is
subject1 - subject2 + subject3 - subject4

we can rewrite that as

(subject1 - subject2) - (subject4 - subject3)

so at a certain level, what you are doing is computing the difference
between subjects 1 and 2, and then comparing that to the difference between
4 and 3. If the value in the first parenthesis is pretty close to the
second, then you won't have much evidence for a difference. Alternatively,
if the values in the two parentheses are different (and larger than
expected given the intra-group variability), you will likely reject the
null hypothesis and say there appears to be a difference.

But what does this mean? In your case, I have no idea. In addition, I have
no idea if you are really (or should be) looking for an interaction here.
Which gets us back to my first point; what exactly (and I mean EXACTLY) are
you trying to do?

If you simply want to do all possible comparisons, then you just set it up
in the most obvious way:

contrast <- makeContrasts(subject1-subject2, subject1-subject3,
subject1-subject4, subject2-subject3, subject2-subject4, subject3-subject4,
levels = design)

Best,

Jim




>
> design <- model.matrix(~ 0 + subject, data = group );
> > design
>
>       subject1 subject2 subject3 subject4
> S7309        1        0        0        0
> S7310        1        0        0        0
> S7311        0        1        0        0
> S7312        0        1        0        0
> S7313        0        0        1        0
> S7314        0        0        1        0
> S7315        0        0        0        1
> S7316        0        0        0        1
> attr(,"assign")
> [1] 1 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$subject
> [1] "contr.treatment"
>
> Do you think that this will be the correct setup now? Because i doubt this
> strongly. If you have any interesting tips please give them to me.
>
> Best,
>
> regards,
>
> Sander
>
>
>
> On 08/21/2014 04:12 PM, James W. MacDonald wrote:
>
> Hi Sander,
>
> On Aug 21, 2014 8:59 AM, "Sander [guest]" <guest at bioconductor.org> wrote:
> >
> > Dear edgeR maintainers,
> >
> > I have some troubles setting up the correct design for my DE experiment.
> I now how my experiment design should look like and that is like this:
> > > design
> >       subject1 subject2 subject3 subject4
> > S7309        1        0        0        0
> > S7310        1        0        0        0
> > S7311        0        1        0        0
> > S7312        0        1        0        0
> > S7313        0        0        1        0
> > S7314        0        0        1        0
> > S7315        0        0        0        1
> > S7316        0        0        0        1
> > attr(,"assign")
> > [1] 1 1 1 1
> > attr(,"contrasts")
> > attr(,"contrasts")$subject
> > [1] "contr.treatment"
> >
> > After that i make my contrasts:
> > > makeContrasts(subject1,subject2,subject3,subject4, levels = design)
>
> This is where you have made a mistake. Toy should be subtracting one
> subject from another, depending on the contrasts you care about.
>
> The resulting contrasts matrix should have both positive and negative
> values, not just zeros and ones.
>
> See the edgeR users guide or limma users guide for examples.
>
> Best,
>
> Jim
>
> >           Contrasts
> > Levels     subject1 subject2 subject3 subject4
> >   subject1        1        0        0        0
> >   subject2        0        1        0        0
> >   subject3        0        0        1        0
> >   subject4        0        0        0        1
> >
> > This all looks right since i need to compare the subjects with each
> other. But when i run my analysis function which looks like this:
> >
> > library( edgeR );
> > library( ggplot2 );
> > library( reshape );
> > library( FactoMineR );
> >
> > analyse <- function( counts, design, contrast, name, style ) {
> >   counts <- counts[ rowSums( counts, na.rm = TRUE ) > 0, ];
> >   y <- DGEList( counts = counts, genes = rownames( counts ) );
> >   y <- calcNormFactors( y );
> >   y <- estimateGLMCommonDisp( y, design );
> >   y <- estimateGLMTrendedDisp( y, design, df = 5 );
> >   y <- estimateGLMTagwiseDisp( y, design );
> >
> >   fit <- glmFit( y, design );
> >   lrt <- glmLRT( fit, contrast = contrast );
> >   de <- decideTestsDGE( lrt, p = 0.05, adjust = "BH" );
> >   cpmY <- cpm( y );
> >
> >   daf <- designAsFactor( design );
> >   orderedDesign <- design[ order( daf, names( daf ) ), ];
> >   tab <- data.frame(
> >     row.names = rownames( cpmY ),
> >     genes = rownames( cpmY ),
> >     de = de,
> >     cpmY[ ,order( daf, names( daf ) ) ]
> >   );
> >
> >   aRepTab <- topTags( lrt, n = nrow( counts ) )$table;
> >   aRepTab$rank <- 1:nrow( counts );
> > #  repTab <- tab[ match( aRepTab$genes, rownames( tab ) ), ];
> >
> >   repTab <- merge( aRepTab, tab, by = "genes", sort = FALSE );
> >   repTab <- repTab[ order( repTab$rank ), ];
> > #  data.frame(
> > #    row.names = rownames( aRepTab ),
> > #    aRepTab,
> > #    tab[ match( aRepTab$genes, tab$genes ), ]
> > #  )
> >
> >   list(
> >     name = name,
> >     y = y,
> >     fit = fit,
> >     lrt = lrt,
> >     de = de,
> >     tab = tab,
> >     style = style,
> >     repTab = repTab,
> >     orderedDesign = orderedDesign
> >   );
> > }
> >
> > I got the following error:
> >
> > Error in mglmLevenberg(y, design = design, dispersion = dispersion,
> offset = offset,  :
> >   BLAS/LAPACK routine 'DGEMM ' gave error code -13
> > 5 mglmLevenberg(y, design = design, dispersion = dispersion, offset =
> offset,
> >     weights = weights, coef.start = start, maxit = 250)
> > 4 glmFit.default(glmfit$counts, design = design0, offset = glmfit$offset,
> >     weights = glmfit$weights, dispersion = glmfit$dispersion,
> >     prior.count = 0)
> > 3 glmFit(glmfit$counts, design = design0, offset = glmfit$offset,
> >     weights = glmfit$weights, dispersion = glmfit$dispersion,
> >     prior.count = 0)
> > 2 glmLRT(fit, contrast = contrast) at diffexpr.R#15
> > 1 analyse(counts, design, contrast, countsId, style)
> >
> > I tried to use different models but i cannot succeed to avoid the error
> for this comparison. (other comparisons do succeed) Any hints will be very
> appreciated.
> >
> > Thanks in advance!
> >
> > Best regards,
> >
> > Sander
> >
> >  -- output of sessionInfo():
> >
> > > sessionInfo()
> > R version 3.1.0 (2014-04-10)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  LC_MONETARY=en_US.UTF-8
> >  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
>               LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] splines   stats     graphics  grDevices utils     datasets  methods
>  base
> >
> > other attached packages:
> > [1] FactoMineR_1.26 reshape_0.8.5   ggplot2_1.0.0   reshape2_1.4
> edgeR_3.6.7     limma_3.20.8
> >
> > loaded via a namespace (and not attached):
> >  [1] car_2.0-20           cluster_1.15.2       colorspace_1.2-4
>  digest_0.6.4         grid_3.1.0           gtable_0.1.2
> >  [7] htmltools_0.2.4      lattice_0.20-29      leaps_2.9
> MASS_7.3-33          munsell_0.4.2        nnet_7.3-8
> > [13] plyr_1.8.1           proto_0.3-10         Rcpp_0.11.2
> rmarkdown_0.2.49     scales_0.2.4         scatterplot3d_0.3-35
> > [19] stringr_0.6.2        tools_3.1.0          yaml_2.1.13
> >
> > --
> > Sent via the guest posting facility at bioconductor.org.
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>


-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list