[BioC] Log transformation and left censoring
Wolfgang Huber
whuber at embl.de
Sun Feb 3 12:24:33 CET 2013
Hi Paul,

Given your description, one possibility to explore might be a variance stabilising transformation.
E.g. DESeq provides one that smoothly interpolates between the square-root function for low counts and the log-transformation for higher counts, see Section 6 (and 7) of the vignette.
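
As a rough illustration of that interpolation, here is a minimal Python sketch. It is not DESeq's implementation, which works from the dispersion-mean dependence estimated from the data. It assumes a negative binomial variance of mu + alpha*mu^2 with one fixed dispersion alpha, for which the variance stabilising transformation has the closed form (2/sqrt(alpha)) * asinh(sqrt(alpha * count)). The function names and the value alpha = 0.1 are assumptions made for illustration only.

```python
import math


def nb_vst(count, alpha=0.1):
    """Closed-form VST for variance mu + alpha*mu^2.

    alpha is an assumed, fixed dispersion (hypothetical value).
    """
    return (2.0 / math.sqrt(alpha)) * math.asinh(math.sqrt(alpha * count))


def low_count_limit(count):
    """Square-root behaviour that nb_vst approaches for small counts."""
    return 2.0 * math.sqrt(count)


def high_count_limit(count, alpha=0.1):
    """Logarithmic behaviour that nb_vst approaches for large counts."""
    return math.log(4.0 * alpha * count) / math.sqrt(alpha)


# Compare the transformation with its two limiting forms.
for count in (0.01, 1, 100, 1_000_000):
    print(count, nb_vst(count), low_count_limit(count), high_count_limit(count))
```

For very small counts the first two printed values agree closely, and for very large counts the first and third do, which is the square-root-to-log behaviour described above.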
Best wishes
Wolfgang
On Jan 31, 2013, at 8:57 AM, Paul Harrison <Paul.Harrison at monash.edu> wrote:
> Hello,
>
> We have been using voom and limma for some time now, and while we're
> fairly happy with it, it seems to produce significance levels that are
> on the conservative side. We also use edgeR to produce more optimistic
> results, but don't entirely trust the significance levels that it
> reports. I am looking for something in-between these extremes, and
> want to run an idea past this list as a sanity check. I would
> especially value Gordon and Charity's comments if they have time.
>
> The voom log transformation is essentially:
>
> log2( (count+0.5) / library.size )
>
> It then does some clever things with weights. What I'm considering instead is:
>
> log2( count / library.size + moderation.amount / mean.library.size )
>
> where moderation.amount is much larger than 0.5, say 5. A couple of things here:
>
> - Instead of down-weighting low counts, I'm trying to get rid of the
> extra variation from low counts by artificially left censoring the
> data.
>
> - I'm using the mean of the library sizes because I want the left
> censor to be in the same place for each sample even if the library
> sizes are different, so that if a gene is entirely switched off in one
> condition it won't look variable just because there is a different
> left censor in each sample.
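
The two transformations quoted above can be compared with a minimal Python sketch. It only evaluates the two formulas as written in the post. It leaves out voom's weights and any limma step, and the library sizes and function names are hypothetical values chosen for illustration.

```python
import math


def voom_style_log(count, library_size):
    """log2((count + 0.5) / library.size), as written in the post."""
    return math.log2((count + 0.5) / library_size)


def moderated_log(count, library_size, mean_library_size, moderation_amount=5.0):
    """log2(count / library.size + moderation.amount / mean.library.size)."""
    return math.log2(count / library_size + moderation_amount / mean_library_size)


# Two hypothetical samples with very different sequencing depth.
library_sizes = [1_000_000, 4_000_000]
mean_library_size = sum(library_sizes) / len(library_sizes)

# A gene that is switched off (zero counts) in both samples.
zero_voom = [voom_style_log(0, n) for n in library_sizes]
zero_moderated = [moderated_log(0, n, mean_library_size) for n in library_sizes]

print("voom-style:", zero_voom)       # floor depends on each library size
print("moderated: ", zero_moderated)  # same floor in every sample
```

With the first formula the value for a zero count shifts with each sample's library size, so a gene that is off everywhere still appears to vary. With the second, the floor is moderation.amount / mean.library.size in every sample.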
>
> I'm also using this transformation to create heatmaps.
>
> This seems to work on the data set I am analysing: I get
> more significant results, and they seem reasonable by eye. It seems to
> me that even if this approach isn't ideal it should at least be safe:
> at worst it will cause limma to reduce the df.prior and produce less
> significant results. Anything I've missed?
>
> --
> Paul Harrison
>
> Victorian Bioinformatics Consortium / Monash University
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor