[BioC] Outlier detection in DESeq
Wolfgang Huber
whuber at embl.de
Wed Oct 20 21:57:04 CEST 2010
Use "Array Quality Metrics":
Hi Laurie,
To follow up on Simon's suggestion, after variance stabilising
transformation of the counts (this transformation is logarithm-like for
high counts and square-root-like for low counts), it should be possible
and instructive to call the 'arrayQualityMetrics' function from the
package of the same name on the data matrix. To do this, it is probably
easiest to put the transformed data (and the sample metadata) into an
ExpressionSet.
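A minimal sketch of this workflow follows, under several assumptions. The counts are simulated, and instead of DESeq's own transformation (e.g. `getVarianceStabilizedData`, which works from a fitted variance function) it uses the closed-form arcsinh transformation for negative binomial counts with a single fixed dispersion `alpha` (an assumed value). That formula shows the behaviour described above: roughly 2*sqrt(x) for small counts and roughly log(x)/sqrt(alpha) plus a constant for large ones. The helper `nbVst`, the metadata column `condition` and the output directory `aqm_report` are hypothetical names; the report is only generated when Biobase and arrayQualityMetrics are installed.

```r
# Closed-form VST for negative binomial counts whose variance is mu + alpha * mu^2.
# A single, known dispersion alpha is an assumption of this sketch; DESeq instead
# derives its transformation from a variance-mean relationship fitted to the data.
nbVst <- function(x, alpha) 2 / sqrt(alpha) * asinh(sqrt(alpha * x))

# Hypothetical example data: 1000 genes x 6 samples in two conditions.
set.seed(1)
alpha <- 0.1
mu <- rexp(1000, rate = 1 / 200)
counts <- matrix(rnbinom(1000 * 6, size = 1 / alpha, mu = rep(mu, 6)), ncol = 6,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
samples <- data.frame(condition = rep(c("A", "B"), each = 3),
                      row.names = colnames(counts))

# Size-factor normalisation is skipped because all simulated libraries have the
# same expected depth; with real data, divide each column by its size factor first.
vst <- nbVst(counts, alpha)

# Build the ExpressionSet and write the quality report, if the packages exist.
if (requireNamespace("Biobase", quietly = TRUE) &&
    requireNamespace("arrayQualityMetrics", quietly = TRUE)) {
  eset <- Biobase::ExpressionSet(assayData = vst,
                                 phenoData = Biobase::AnnotatedDataFrame(samples))
  arrayQualityMetrics::arrayQualityMetrics(expressionset = eset,
                                           outdir = "aqm_report",
                                           force = TRUE,
                                           intgroup = "condition")
}
```

The data are already on a log-like scale after the transformation, so the sketch leaves `arrayQualityMetrics`' own log transformation (`do.logtransform`) at its default of `FALSE`.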
At some point, somebody will hopefully write a more specialised
quality-metrics functionality for this application.
Best wishes
Wolfgang.
On Oct/20/10 11:37 AM, Simon Anders wrote:
> Hi Laurie
>
> On 10/20/2010 07:25 AM, Rui Luo wrote:
>> I have a question regarding DESeq differential expression analysis.
>> In DESeq, is there any way to detect whether the library from one sample
>> is totally screwed up?
>> Or, for a single gene, that the expression is abnormal in one sample (in
>> this situation, do we just discard this value or modify it)?
>
> if you have enough replicates, you can detect an outlier sample from the
> fact that it is markedly different from the rest.
>
> Possible ways to do so:
>
> - Make a heatmap of the samples after performing a variance stabilizing
> transformation on the count data. This is described in the DESeq
> vignette. The heatmap shows how "different" each sample is from every
> other sample, and if one sample is very different from its replicates,
> you may want to consider excluding it from the analysis.
>
> - For each sample, make an MA plot comparing it to the "fictive
> reference" that I described in my reply to your other question, as follows:
>
> library(DESeq)
>
> # get an example count data set -- or use your data:
> cds <- makeExampleCountDataSet()
>
> # estimate the size factors:
> cds <- estimateSizeFactors( cds )
>
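> # calculate the gene-wise geometric means; a gene with a zero count in
> # any sample would get a geometric mean of 0, so such genes are left out
> nonzero <- rowSums( counts(cds) == 0 ) == 0
> geomeans <- exp( rowMeans( log( counts(cds)[nonzero,] ) ) )
>
> # choose the sample we want to check
> j <- 1
>
> # plot the ratio to the reference against the geometric mean; on the
> # log-scaled y axis this is the log fold change
> plot( geomeans, counts(cds)[nonzero,j] / geomeans, pch='.', log="xy" )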
>
> # Mark the size factor (0 log fold change):
> abline( h = sizeFactors(cds)[j] )
>
> An outlier sample should stand out because its plot looks different from
> those of the other samples. You could also take the geometric mean over
> the replicate samples only, rather than over all samples, or you could
> simply plot two samples against each other.
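Both checks in the list above can be sketched in base R as follows, on simulated counts (hypothetical data with no true differential expression, in which sample `s6` is deliberately made noisy to play the outlier). The size factors use the median-of-ratios estimator on which DESeq's `estimateSizeFactors` is based; the arcsinh transformation with an assumed dispersion `alpha` again stands in for DESeq's variance stabilizing transformation. The reference for each sample is the geometric mean of its replicates only, one of the variants suggested above. `logRatioToReplicates`, `meanDist` and `spread` are illustrative names, not DESeq functions.

```r
# Hypothetical data: 2000 genes, condition A = s1-s3, condition B = s4-s6.
set.seed(42)
nGenes <- 2000
alpha <- 0.1  # assumed common dispersion
mu <- rexp(nGenes, rate = 1 / 300)
counts <- matrix(rnbinom(nGenes * 6, size = 1 / alpha, mu = rep(mu, 6)), ncol = 6,
                 dimnames = list(NULL, paste0("s", 1:6)))
# Make s6 an outlier: extra gene-wise log-normal noise on its expected counts.
counts[, 6] <- rnbinom(nGenes, size = 1 / alpha, mu = mu * exp(rnorm(nGenes)))
condition <- rep(c("A", "B"), each = 3)

# Median-of-ratios size factors; genes with a zero in any sample are dropped.
keep <- rowSums(counts == 0) == 0
logGeo <- rowMeans(log(counts[keep, ]))
sf <- apply(counts[keep, ], 2, function(k) exp(median(log(k) - logGeo)))
norm <- sweep(counts[keep, ], 2, sf, "/")

# Check 1: heatmap of Euclidean distances between variance-stabilised samples,
# plus each sample's mean distance to all the others.
vst <- 2 / sqrt(alpha) * asinh(sqrt(alpha * norm))
d <- as.matrix(dist(t(vst)))
heatmap(d, symm = TRUE)
meanDist <- rowSums(d) / (ncol(d) - 1)

# Check 2: log2 ratio of sample j to the geometric mean of its replicates only.
logRatioToReplicates <- function(norm, j, condition) {
  reps <- setdiff(which(condition == condition[j]), j)
  ref <- exp(rowMeans(log(norm[, reps, drop = FALSE])))
  data.frame(ref = ref, logRatio = log2(norm[, j] / ref))
}
r6 <- logRatioToReplicates(norm, 6, condition)
plot(r6$ref, r6$logRatio, pch = ".", log = "x",
     xlab = "geometric mean of replicates", ylab = "log2 ratio")
abline(h = 0)

# Robust spread (MAD) of the log ratios for every sample.
spread <- sapply(seq_len(ncol(norm)),
                 function(j) mad(logRatioToReplicates(norm, j, condition)$logRatio))

# Simplest variant: two samples plotted directly against each other.
plot(norm[, 1], norm[, 2], pch = ".", log = "xy")
```

In this simulation, `s6` ends up with both the largest mean distance and the widest spread of log ratios. With real data there is no fixed threshold; the point is to compare these numbers across replicates of the same condition.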
>
>
> Remember that there are also what we call "variance outliers", i.e.,
> individual genes that vary much more across replicates than the variance
> fit would suggest. The vignette describes how to recognize them.
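To illustrate the idea of a variance outlier, here is a rough sketch; it is not the procedure from the DESeq vignette. It assumes a single known dispersion `alpha`, so the expected variance of a gene with mean m is m + alpha * m^2 (size-factor effects are ignored), and it compares each gene's observed variance across replicates with that expectation through a crude chi-squared approximation. The function name `varianceOutliers`, the cut-off `1e-3` and the injected gene are hypothetical.

```r
# Flag genes whose variance across replicates greatly exceeds the variance
# predicted by an assumed negative binomial model with known dispersion alpha.
varianceOutliers <- function(counts, alpha, cutoff = 1e-3) {
  n <- ncol(counts)
  m <- rowMeans(counts)
  v <- apply(counts, 1, var)           # observed per-gene variance
  expected <- m + alpha * m^2          # model-based variance
  stat <- (n - 1) * v / expected       # approx. chi-squared with n - 1 df
  p <- pchisq(stat, df = n - 1, lower.tail = FALSE)
  rownames(counts)[!is.na(p) & p < cutoff]  # all-zero genes give NaN; skip them
}

# Hypothetical replicates of one condition, plus one injected erratic gene.
set.seed(7)
alpha <- 0.1
mu <- rexp(500, rate = 1 / 200)
reps <- matrix(rnbinom(500 * 3, size = 1 / alpha, mu = rep(mu, 3)), ncol = 3,
               dimnames = list(paste0("gene", 1:500), paste0("rep", 1:3)))
reps <- rbind(reps, injected = c(10, 500, 2000))

flagged <- varianceOutliers(reps, alpha)
print(flagged)
```

With only a few replicates the chi-squared approximation is coarse, so a list like `flagged` is a starting point for inspection rather than a test result.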
>
>
> Simon
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor