[BioC] phyloseq/DESeq gives negative transformed values

Michael Love michaelisaiahlove at gmail.com
Tue May 6 02:16:53 CEST 2014


hi Sophie,

On Mon, May 5, 2014 at 5:24 PM, Sophie Josephine Weiss
<Sophie.Weiss at colorado.edu> wrote:
> Makes sense, thanks for your help.  In the DESeq manual, it looks like all
> we need to do for, e.g., clustering or PCoA is estimateSizeFactors().  Is
> this correct?
>
> Or would it also be OK to use the values from estimateDispersions with the
> negatives set to zero or shifted by a constant?  I would do the first option,
> but it looks like McMurdie et al. do both for their clustering simulation,
> so I thought I would ask.

I don't follow your question. If you want to do clustering or PCA
using DESeq2, I assume you are applying one of the two transformations
we have implemented (the rlog or the VST), as described in the vignette.
If size factors have not already been estimated, both transformations
will estimate them internally; the same goes for dispersions.

These transformations return objects with matrices in the assay slot
which are appropriate for calculating distances or PCA.  The vignette
demonstrates both a distance calculation and a PCA.

Please look over the vignette again, as this is the recommended usage.
These transformed values have already been corrected for size factors.

You should not use any DESeq2 functions on the matrix of transformed
values. The transformation is the last step within DESeq2; after it, we
assume the user is doing something "downstream" with these values.

Mike
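For readers without the vignette at hand, here is a minimal base-R sketch of the two downstream uses Mike mentions: sample-to-sample distances and PCA on a transformed matrix. It assumes a matrix `vst_mat` (a hypothetical name, filled here with random numbers) standing in for what `assay()` would return from an rlog or VST object; no DESeq2 functions are called, in line with the advice above.

```r
# Hypothetical stand-in for the assay matrix of a transformed object:
# taxa/genes in rows, samples in columns, on a log2-like scale
# (negative values are fine for distances and PCA).
set.seed(1)
vst_mat <- matrix(rnorm(40, mean = 3, sd = 2), nrow = 10,
                  dimnames = list(paste0("taxon", 1:10),
                                  paste0("sample", 1:4)))

# Sample-to-sample Euclidean distances: dist() compares rows, so transpose.
sample_dists <- dist(t(vst_mat))
print(round(as.matrix(sample_dists), 2))

# PCA of the samples: prcomp() also expects observations (samples) in rows.
pca <- prcomp(t(vst_mat))
pc_scores <- pca$x[, 1:2]
print(round(pc_scores, 2))
```

Both steps transpose the matrix because base R treats rows as observations, while the transformed matrix keeps samples in columns.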

>
> Thanks again!
> Sophie
>
>
> On Thu, Apr 24, 2014 at 6:54 AM, Wolfgang Huber <whuber at embl.de> wrote:
>>
>> Hi Sophie
>>
>> as this issue comes up periodically, let me point out that
>>
>>    log(cx) = log(c) + log(x)
>>
>> That means that if you think of ‘x’ as your data matrix and ‘c’ as a single
>> positive number, you can always add a constant to (or subtract one from)
>> your transformed data, for instance to make it more agreeable by giving it
>> all positive signs, and all that amounts to is an overall scaling
>> (multiplication) of the data on the untransformed scale.
>> An analogous idea applies to the rlog or vst transformations of DESeq2.
>>
>> A reasonable distance metric between samples or genes should probably not
>> depend on such an overall constant c.
>>
>> Best wishes
>>         Wolfgang
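Wolfgang's point can be checked numerically. The base-R sketch below uses made-up positive data, with a plain log2 standing in for the log2-like rlog/VST (an assumption for illustration). It shifts the transformed matrix so its minimum is zero, confirms that this equals scaling the raw values by the constant c = 2^shift, and confirms that Euclidean distances between samples are unchanged.

```r
# Hypothetical data: 3 samples (rows) x 4 features (columns), all positive.
set.seed(42)
x <- matrix(rexp(12, rate = 0.5), nrow = 3)
log_x <- log2(x)          # entries of x below 1 become negative here

# Shift so the smallest transformed value is 0 (all signs non-negative).
shift <- -min(log_x)
log_shifted <- log_x + shift

# On the untransformed scale, the shift is multiplication by c = 2^shift.
c_const <- 2^shift
same_as_scaling <- isTRUE(all.equal(log_shifted, log2(c_const * x)))

# Euclidean distances between samples (rows) are unchanged by the shift.
same_distances <- isTRUE(all.equal(as.vector(dist(log_x)),
                                   as.vector(dist(log_shifted))))
print(c(same_as_scaling = same_as_scaling, same_distances = same_distances))
```

Euclidean distance is one example of a metric that does not depend on the constant; a metric computed on the raw magnitudes of the values (such as Bray-Curtis) generally would.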
>>
>>
>> On 23 Apr 2014, at 23:44, Sophie Josephine Weiss
>> <Sophie.Weiss at colorado.edu> wrote:
>>
>> > Thanks Michael,
>> > The entire dataset (attached code and .biom) gives negatives.  There was
>> > an "out of vertex space" error, as described here
>> > <http://seqanswers.com/forums/showthread.php?p=18620>,
>> > so I tried setting maxk=300 as suggested.
>> > The commands are below.
>> > Thanks again!
>> > Sophie
>> >
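>> > # 2014-era installer; current Bioconductor releases use BiocManager::install()
>> > source("http://bioconductor.org/biocLite.R")
>> > biocLite("phyloseq")
>> > biocLite("DESeq")
>> >
>> > library("phyloseq")
>> > library("DESeq")
>> > library("biom")   # provides make_biom() and write_biom()
>> >
>> > file = "~/Downloads/study_449_closed_reference_otu_table.biom"
>> > x = import_biom(file)
>> > # deseq_varstab() is defined in the script adapted from the
>> > # McMurdie and Holmes (2014) supplement
>> > source("~/Downloads/deseq_varstab.R")
>> > DESeq_data = deseq_varstab(x, method = "blind", sharingMode = "maximum",
>> >                            fitType = "local", locfit_extra_args = list(maxk = 300))
>> > # extract the transformed OTU table as a plain matrix before writing it out
>> > write_biom(make_biom(as(otu_table(DESeq_data), "matrix")),
>> >            "~/Desktop/449_Costello_DESeq.biom.tsv")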
>> >
>> >
>> > On Sat, Apr 19, 2014 at 11:29 AM, Michael Love
>> > <michaelisaiahlove at gmail.com> wrote:
>> >
>> >> hi Sophie,
>> >>
>> >> You are getting negative values from the transformation for the
>> >> reasons I mentioned earlier: the transformation is log2-like.
>> >>
>> >> If you want to do something downstream of our software that requires
>> >> non-negative values, below is some example code showing how to
>> >> threshold negative values in a matrix in R.
>> >>
>> >> The question of what is the best distance to use for taxa counts, or
>> >> whether ANOVA on variance stabilized data is a good idea for taxa
>> >> counts, depends on the properties of the data, and this is an area of
>> >> active research. As I don't have experience analyzing this kind of
>> >> data, I don't want to make any guesses.
>> >>
>> >>> m <- matrix(-2:5, ncol=2)
>> >>> m
>> >>     [,1] [,2]
>> >> [1,]   -2    2
>> >> [2,]   -1    3
>> >> [3,]    0    4
>> >> [4,]    1    5
>> >>> m[m < 0] <- 0
>> >>> m
>> >>     [,1] [,2]
>> >> [1,]    0    2
>> >> [2,]    0    3
>> >> [3,]    0    4
>> >> [4,]    1    5
>> >>
>> >> On Fri, Apr 18, 2014 at 3:32 PM, Sophie Josephine Weiss
>> >> <Sophie.Weiss at colorado.edu> wrote:
>> >>> Hi Mike,
>> >>> Could you please check whether I am running this correctly?  I have
>> >>> double-checked all the parameters, but for some reason I am getting
>> >>> negatives using the R script on the attached .biom dataset.  There are
>> >>> no replicates in this microbial dataset.
>> >>> Thanks for your advice,
>> >>> Sophie
>> >>>
>> >>>
>> >>> On Wed, Apr 16, 2014 at 4:02 PM, Sophie Josephine Weiss
>> >>> <Sophie.Weiss at colorado.edu> wrote:
>> >>>>
>> >>>> Thanks Mike, that is what I thought.  What if we wanted to perform a
>> >>>> Kruskal-Wallis test, or is it possible to perform ANOVA on the
>> >>>> variance-stabilized matrix?
>> >>>>
>> >>>>
>> >>>> On Wed, Apr 16, 2014 at 2:29 PM, Michael Love
>> >>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>
>> >>>>> hi Sophie,
>> >>>>>
>> >>>>> We recommend using the standard DESeq() function for differential
>> >>>>> expression.
>> >>>>>
>> >>>>> This is mentioned in the first line of the vignette section on
>> >>>>> transformations:
>> >>>>>
>> >>>>> "In order to test for differential expression, we operate on raw
>> >>>>> counts and use discrete distributions as described in the previous
>> >>>>> section"
>> >>>>>
>> >>>>> Also, McMurdie and Holmes use the DESeq() function, as shown in
>> >>>>> their supplemental material:
>> >>>>>
>> >>>>> http://joey711.github.io/waste-not-supplemental/simulation-differential-abundance/simulation-differential-abundance-server.html
>> >>>>>
>> >>>>> On Wed, Apr 16, 2014 at 3:22 PM, Sophie Josephine Weiss
>> >>>>> <Sophie.Weiss at colorado.edu> wrote:
>> >>>>>> Please help with this?  Thanks again.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Mon, Apr 14, 2014 at 6:02 PM, Sophie Josephine Weiss
>> >>>>>> <Sophie.Weiss at colorado.edu> wrote:
>> >>>>>>>
>> >>>>>>> Thanks again, Mike.  Would it be OK to run chi-squared and other
>> >>>>>>> significance tests on the DESeq-transformed datasets using
>> >>>>>>> independent code, or is it necessary to do the differential
>> >>>>>>> expression tests strictly within DESeq2?
>> >>>>>>>
>> >>>>>>> Sophie
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Mon, Apr 14, 2014 at 5:41 PM, Michael Love
>> >>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> hi Sophie,
>> >>>>>>>>
>> >>>>>>>> The VST code is the same in DESeq and DESeq2. The estimation of
>> >>>>>>>> dispersion is slightly different (details are in the vignette
>> >>>>>>>> "Changes from DESeq to DESeq2"), but the fitted line (which is
>> >>>>>>>> used by the VST) should be very similar.
>> >>>>>>>>
>> >>>>>>>> Mike
>> >>>>>>>>
>> >>>>>>>> On Mon, Apr 14, 2014 at 6:27 PM, Sophie Josephine Weiss
>> >>>>>>>> <Sophie.Weiss at colorado.edu> wrote:
>> >>>>>>>>> Hi Mike,
>> >>>>>>>>> The McMurdie and Holmes paper uses DESeq for matrix normalization.
>> >>>>>>>>> Do you think that is OK, or would it be better to use DESeq2?
>> >>>>>>>>> Thanks again,
>> >>>>>>>>> Sophie
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Mon, Apr 14, 2014 at 3:40 PM, Michael Love
>> >>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> hi Sophie,
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Mon, Apr 14, 2014 at 1:15 PM, Sophie Josephine Weiss
>> >>>>>>>>>> <Sophie.Weiss at colorado.edu> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Hi Mike,
>> >>>>>>>>>>> Thanks for the references.  By "threshold at 0" do you mean
>> >>>>>>>>>>> setting any negative values to 0?
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> yes.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Do you think this is the best approach?
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> I haven't explored this area, and would defer to the McMurdie and
>> >>>>>>>>>> Holmes paper for the best combinations of distance and
>> >>>>>>>>>> transformation.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks again,
>> >>>>>>>>>>> Sophie
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Mon, Apr 14, 2014 at 11:01 AM, Michael Love
>> >>>>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I tried poking around here
>> >>>>>>>>>>>> http://joey711.github.io/phyloseq/distance
>> >>>>>>>>>>>> but couldn't see whether the authors did anything for distances
>> >>>>>>>>>>>> requiring non-negative data. It appears from
>> >>>>>>>>>>>> http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003531
>> >>>>>>>>>>>> that the VST was tested with Bray-Curtis distance. I think that
>> >>>>>>>>>>>> distance is designed for counts, but you could always threshold
>> >>>>>>>>>>>> at 0 to insist that the log2-like quantity act more like a count.
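A minimal base-R sketch of the thresholding suggestion above, on a small made-up matrix. Negative log2-like values are set to 0 with `pmax()` (equivalent to `m[m < 0] <- 0`), and Bray-Curtis dissimilarity, sum(|x - y|) / sum(x + y), is then computed between two samples. The helper `bray_curtis()` is a hypothetical name written out for illustration; whether thresholded VST values are a sensible input for Bray-Curtis is, as noted in this thread, an open question.

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two
# non-negative vectors, sum(|x - y|) / sum(x + y).
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

# Hypothetical transformed matrix: taxa in rows, samples in columns,
# containing some negative log2-like values.
vst_mat <- matrix(c(-1.5, 2.0, 3.5,  0.5,
                    -0.5, 1.0, 4.0, -2.0),
                  ncol = 2,
                  dimnames = list(paste0("taxon", 1:4), c("A", "B")))

# Threshold at 0 so the values behave more like counts;
# pmax() keeps the matrix dimensions and dimnames.
thresholded <- pmax(vst_mat, 0)

bc <- bray_curtis(thresholded[, "A"], thresholded[, "B"])
print(bc)   # 2 / 11
```

For a full sample-by-sample matrix, the vegan package's `vegdist(..., method = "bray")` computes the same dissimilarity between rows, so it would be applied to `t(thresholded)`.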
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Mon, Apr 14, 2014 at 12:23 PM, Sophie Josephine Weiss
>> >>>>>>>>>>>> <Sophie.Weiss at colorado.edu> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Hi Mike,
>> >>>>>>>>>>>>> Thanks for explaining more.  I am used to working with rarefied
>> >>>>>>>>>>>>> microbial datasets; that is why.  Instead of rarefying, I would
>> >>>>>>>>>>>>> like to use the DESeq method.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> How would you then suggest calculating Bray-Curtis distance, or
>> >>>>>>>>>>>>> summarized taxa diagrams, with these new transformed matrices
>> >>>>>>>>>>>>> that contain negative values?
>> >>>>>>>>>>>>> Thanks again,
>> >>>>>>>>>>>>> Sophie
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Mon, Apr 14, 2014 at 7:17 AM, Michael Love
>> >>>>>>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> hi Sophie,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Can you explain why you don't want negative values in the
>> >>>>>>>>>>>>>> transformed data?  Adding one to the raw counts is not
>> >>>>>>>>>>>>>> sufficient. I should have said in my previous email, "the
>> >>>>>>>>>>>>>> expected counts on the common scale". If the size factor for
>> >>>>>>>>>>>>>> a sample is 2, then an expected count of 1 leads to an
>> >>>>>>>>>>>>>> expected count of 1/2 on the common scale (after accounting
>> >>>>>>>>>>>>>> for size factors).
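The size-factor arithmetic above, as a tiny base-R sketch with made-up numbers. Two simplifications are assumptions, not what DESeq2 does internally: the observed counts stand in for the expected counts, and a plain log2 stands in for the log2-like transformation. It shows why adding 1 to every raw count does not rule out negative transformed values.

```r
# Hypothetical counts for one taxon in three samples, after adding 1
# to every raw count (so the smallest value is 1), plus size factors.
counts_plus1 <- c(1, 4, 8)
size_factors <- c(2, 1, 0.5)

# Dividing by the size factors puts the counts on the common scale.
# For simplicity, the observed counts stand in for expected counts here.
common_scale <- counts_plus1 / size_factors    # 0.5, 4, 16

# A plain log2 (standing in for the log2-like VST/rlog) is negative
# for any common-scale value between 0 and 1.
transformed <- log2(common_scale)              # -1, 2, 4
print(transformed)
```

The first sample reproduces the example in the text: a count of 1 with a size factor of 2 becomes 1/2 on the common scale, which a log2-type transform maps below zero.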
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 11:50 PM, Sophie Josephine Weiss
>> >>>>>>>>>>>>>> <Sophie.Weiss at colorado.edu> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Hi Mike,
>> >>>>>>>>>>>>>>> Thanks for your reply!  OK, that makes sense, but I added 1 to
>> >>>>>>>>>>>>>>> all my matrix values, so the lowest value in the matrix is 1.
>> >>>>>>>>>>>>>>> Why are there still negatives?
>> >>>>>>>>>>>>>>> Thanks again,
>> >>>>>>>>>>>>>>> Sophie
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 9:01 PM, Michael Love
>> >>>>>>>>>>>>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> hi Sophie,
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> The transformations in DESeq and DESeq2 are log2-like
>> >>>>>>>>>>>>>>>> transformations. If the expected count is between 0 and 1,
>> >>>>>>>>>>>>>>>> the values can be negative; this does not indicate a problem.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Mike
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Sun, Apr 13, 2014 at 5:17 PM, Sophie Josephine Weiss
>> >>>>>>>>>>>>>>>> <Sophie.Weiss at colorado.edu> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Hello,
>> >>>>>>>>>>>>>>>>> I have microbiome data with no replicates, from different
>> >>>>>>>>>>>>>>>>> conditions.  I am trying to transform the data using the
>> >>>>>>>>>>>>>>>>> DESeq method, as described in McMurdie and Holmes 2014.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> The attached files are the function definition I am using,
>> >>>>>>>>>>>>>>>>> as per the supplemental info in McMurdie and Holmes 2014,
>> >>>>>>>>>>>>>>>>> and the .biom file I am using.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thank you for your help,
>> >>>>>>>>>>>>>>>>> Sophie
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> _______________________________________________
>> >>>>>>>>>>>>>>>>> Bioconductor mailing list
>> >>>>>>>>>>>>>>>>> Bioconductor at r-project.org
>> >>>>>>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>>>>>>>>>>>>>>>> Search the archives:
>> >>>>>>>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor