[Bioc-sig-seq] edgeR plots for paired samples

Davis McCarthy dmccarthy at wehi.EDU.AU
Fri Nov 5 00:33:12 CET 2010


Dear Mayte

I have some ideas about what might be going awry here. Thank you for providing more commands and output, but from what you have provided some things are still ambiguous, so I will have to make a few guesses as to what is happening.

In brief, though, producing smear and mean-variance plots for paired or other complex experimental designs is problematic (theoretically and computationally) and the plotSmear() and plotMeanVar() functions in edgeR are not currently set up to handle paired data.

Some more comments below.


On Nov 5, 2010, at 5:30 AM, Mayte Suarez-Farinas wrote:

> Dear Gordon:
> 
> you were right, but I was not wrong.
> (I did the analysis for common and tagwise dispersion estimates.
> the tagwise ends in tgw.
> Anyway here is the code
> 
> 
> PScounts <-readDGE(files.pheno)

I cannot know what is in files.pheno, but if it is not a targets file such as recommended in the User's Guide, then PScounts will be a DGEList without an experimental group defined (i.e. PScounts$samples$group is NULL). 

It would be useful here to see the output from PScounts$samples so we have some idea about the design of the experiment and what is and is not defined.

> d.PS <- calcNormFactors(PScounts)
> dr.PS <- d.PS[rowSums(d.PS$counts) > 9, ]
> dr.PS.tgw <- estimateCRDisp(dr.PS, design, tagwise = TRUE, prior.n = 10)
> glmfit.PS.tgw <- glmFit(dr.PS.tgw, design, dispersion = dr.PS.tgw 
> $CR.tagwise.dispersion)
> lrt.PS.tgw <- glmLRT(dr.PS.tgw, glmfit.PS.tgw)
> 
> 
> plotSmear(d.PS, panel.first=grid(), smooth.scatter=FALSE)
> Error: length(pair) == 2 is not TRUE

Following from the output above, we have d.PS$samples$group is NULL. plotSmear() tries to produce an MA-plot using two groups defined by d.PS$samples$group and since this is NULL, it produces this (admittedly noninformative) error. I will add a check to make this clearer.

plotSmear() is designed to deal with an experiment with multiple groups in a one-way layout, so is not really appropriate for paired data.


> 
>> str(d.PS)
> Formal class 'DGEList' [package "edgeR"] with 1 slots
>   ..@ .Data:List of 2
>   .. ..$ :'data.frame':	6 obs. of  5 variables:
>   .. .. ..$ files       : Factor w/ 6 levels "LS252.counts.txt",..: 1  
> 2 3 4 5 6
>   .. .. ..$ Tissue      : Factor w/ 2 levels "NL","LS": 2 2 2 1 1 1
>   .. .. ..$ Patient     : Factor w/ 3 levels "25","28","29": 1 2 3 1  
> 2 3
>   .. .. ..$ lib.size    : num [1:6] 23067191 20684675 19881245  
> 19665929 22938039 ...
>   .. .. ..$ norm.factors: num [1:6] 0.95 0.991 1.033 0.954 1.071 ...
>   .. ..$ : num [1:34487, 1:6] 0 0 2 0 100 0 0 4 0 35 ...
>   .. .. ..- attr(*, "dimnames")=List of 2
>   .. .. .. ..$ : chr [1:34487] "NM_000015" "NM_000016" "NM_000017"  
> "NM_000018" ...
>   .. .. .. ..$ : chr [1:6] "LS-25" "LS-28" "LS-29" "NL-25" ...
> 
> 
> plotMeanVar(dr.PS.tgw, meanvar=meanvar, show.tagwise.vars=TRUE,  
> NBline=TRUE, dispersion.method="coxreid")

Unfortunately, I can't see the meanvar object defined anywhere, so it is difficult to say what causes this error. All I can tell from the error message is that  meanvar$means is different in length to dr.PS.tgw$CR.tagwise.dispersion. This could arise if you created the meanvar object from a DGEList before filtering, then did your analysis on a filtered object, e.g.:
meanvar <- plotMeanVar(d.unfiltered)
d.filtered <- d.unfiltered[ rowSums(d.unfiltered) > 9, ]
plotMeanVar(d.filtered, meanvar, ... )

Beyond that, it is difficult to say.

However, it is problematic to use plotMeanVar for a paired design. The function computes pooled sample variances across groups - if these are undefined, or cannot be defined appropriately for the experimental design, then it is not meaningful to produce such a mean-variance plot.

In looking at paired data myself (6 samples from three individuals, one cancer and one normal sample from each patient) I have found it interesting to define a group variable (i.e d$samples$group) as "cancer" or "normal" for each sample and produce a mean-variance plot using those defined groups. However, this ignores the paired nature of the data, and any such plot should be interpreted with caution. I have found such plots useful, but I hesitate to recommend that approach generally. The same caveats would apply to taking this approach to producing an MA-plot (smear plot) for paired data.

In summary, all of these visualizations that work very nicely for one-way experimental layouts are much more challenging to produce for general experimental designs, although we are thinking about ways in which to do it appropriately.

Hope that helps with your analysis.

Best regards
Davis



> Error in xy.coords(x, y, xlabel, ylabel, log) :
>   'x' and 'y' lengths differ
> In addition: Warning messages:
> 1: In meanvar$means^2 * tagwise.dispersion :
>   longer object length is not a multiple of shorter object length
> 2: In meanvar$means + meanvar$means^2 * tagwise.dispersion :
>   longer object length is not a multiple of shorter object length
>> str(dr.PS.tgw)
> Formal class 'DGEList' [package "edgeR"] with 1 slots
>   ..@ .Data:List of 5
>   .. ..$ :'data.frame':	6 obs. of  5 variables:
>   .. .. ..$ files       : Factor w/ 6 levels "LS252.counts.txt",..: 1  
> 2 3 4 5 6
>   .. .. ..$ Tissue      : Factor w/ 2 levels "NL","LS": 2 2 2 1 1 1
>   .. .. ..$ Patient     : Factor w/ 3 levels "25","28","29": 1 2 3 1  
> 2 3
>   .. .. ..$ lib.size    : num [1:6] 23067191 20684675 19881245  
> 19665929 22938039 ...
>   .. .. ..$ norm.factors: num [1:6] 0.95 0.991 1.033 0.954 1.071 ...
>   .. ..$ : num [1:12373, 1:6] 2 100 4 35 11 ...
>   .. .. ..- attr(*, "dimnames")=List of 2
>   .. .. .. ..$ : chr [1:12373] "NM_000017" "NM_000019" "NM_000022"  
> "NM_000024" ...
>   .. .. .. ..$ : chr [1:6] "LS-25" "LS-28" "LS-29" "NL-25" ...
>   .. ..$ : num [1:6, 1:4] 1 1 1 1 1 1 0 1 0 0 ...
>   .. .. ..- attr(*, "dimnames")=List of 2
>   .. .. .. ..$ : chr [1:6] "1" "2" "3" "4" ...
>   .. .. .. ..$ : chr [1:4] "(Intercept)" "Patient28" "Patient29"  
> "TissueLS"
>   .. .. ..- attr(*, "assign")= int [1:4] 0 1 1 2
>   .. .. ..- attr(*, "contrasts")=List of 2
>   .. .. .. ..$ Patient: chr "contr.treatment"
>   .. .. .. ..$ Tissue : chr "contr.treatment"
>   .. ..$ : num 0.0374
>   .. ..$ : num [1:12373] 0.0369 0.0303 0.036 0.0301 0.0365 ...
>> 
> 
> Mayte Suarez-Farinas
> Research Associate, The Rockefeller University
> Biostatistician, The Rockefeller University Hospital
> 1230 York Ave, Box 178,
> New York, NY, 10065
> +1(212) 327-8213
> 
> 
> 
> 
> 
> On Nov 3, 2010, at 7:29 PM, Gordon K Smyth wrote:
> 
>> Dear Mayte,
>> 
>>> Date: Wed, 3 Nov 2010 10:56:42 -0400
>>> From: Mayte Suarez-Farinas <farinam at mail.rockefeller.edu>
>>> To: bioc-sig-sequencing at r-project.org
>>> Subject: [Bioc-sig-seq] edgeR plots for paired samples
>>> Message-ID: <C31BCD6A-A0E8-4189-926B-1760F63D64FC at rockefeller.edu>
>>> Content-Type: text/plain
>>> 
>>> Dear Bioc
>>> 
>>> I am using edgeR for paired samples and I am having difficulties with
>>> both plotSmear and plotMeanVar functions. I dont know if those  
>>> functions
>>> simply do not work on those cases
>>> 
>>> PScounts <-readDGE(files.pheno)
>>> d.PS <- calcNormFactors(PScounts)
>>> dr.PS <- d.PS[rowSums(d.PS$counts) > 9, ]
>>> dr.PS.c <- estimateCRDisp(dr.PS, design)
>>> glmfit.PS.c <- glmFit(dr.PS.c, design, dispersion = dr.PS.c
>>> $CR.common.dispersion)
>>> lrt.PS.c <- glmLRT(dr.PS.c, glmfit.PS.c)
>>> 
>>> plotMeanVar(dr.PS.tgw, meanvar=meanvar, show.tagwise.vars=TRUE,
>>> NBline=TRUE, dispersion.method="coxreid")
>>> Error in xy.coords(x, y, xlabel, ylabel, log) :
>>> 'x' and 'y' lengths differ
>>> In addition: Warning messages:
>>> 1: In meanvar$means^2 * tagwise.dispersion :
>>> longer object length is not a multiple of shorter object length
>>> 2: In meanvar$means + meanvar$means^2 * tagwise.dispersion :
>>> longer object length is not a multiple of shorter object length
>> 
>> The object dr.PS.tgw that you pass to plotMeanVar() doesn't appear  
>> in any
>> of the previous code that you give.  Same with meanvar.  Is it  
>> possible
>> that you've simply used the wrong object name?  It would be normal  
>> to pass
>> to plotMeanVar your data object such as d.PS or dr.PS.
>> 
>> Best wishes
>> Gordon
>> 
>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>> 
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>> 
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> 
>>> other attached packages:
>>> [1] limma_3.6.0          PGSEA_1.20.0         hgu133plus2.db_2.4.5
>>> org.Hs.eg.db_2.4.6   RSQLite_0.9-2
>>> [6] DBI_0.2-5            AnnotationDbi_1.12.0 Biobase_2.10.0
>>> MASS_7.3-8           biomaRt_2.6.0
>>> [11] edgeR_1.8.0
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] affy_1.28.0           affyio_1.18.0
>>> annotate_1.28.0       Biostrings_2.18.0     gcrma_2.22.0
>>> [6] genefilter_1.32.0     IRanges_1.8.0
>>> preprocessCore_1.12.0 RCurl_1.4-3           simpleaffy_2.26.0
>>> [11] splines_2.12.0        survival_2.35-8       tools_2.12.0
>>> XML_3.2-0             xtable_1.5-6
>>>> 
>>> Mayte Suarez-Farinas
>>> Research Associate, The Rockefeller University
>>> Biostatistician, The Rockefeller University Hospital
>>> 1230 York Ave, Box 178,
>>> New York, NY, 10065
>>> +1(212) 327-8213
>> 
>> 
>> ______________________________________________________________________
>> The information in this email is confidential and inte...{{dropped:10}}
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

------------------------------------------------------------------------
Davis J McCarthy
Research Technician
Bioinformatics Division
Walter and Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Vic 3052, Australia
dmccarthy at wehi.edu.au
http://www.wehi.edu.au



______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}



More information about the Bioc-sig-sequencing mailing list