[BioC] phyloseq/DESeq gives negative transformed values

Sat Apr 19 19:29:57 CEST 2014

hi Sophie,

You are getting negative values from the transformation for the
reason I mentioned earlier: the transformation is log2-like.
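
As a rough illustration of why this happens, here is a small sketch. It
uses plain log2 as a stand-in for the log2-like transformation (it is not
the actual VST), and the counts and size factors are made up for the
example.

```r
# Hypothetical expected counts for one taxon in three samples
counts <- c(1, 4, 10)
# Hypothetical size factors for the same three samples
size_factors <- c(2, 1, 0.5)
# Counts on the common scale, after dividing out the size factors
common_scale <- counts / size_factors
# Plain log2 as a stand-in for the log2-like transformation:
# anything between 0 and 1 on the common scale becomes negative
log2_values <- log2(common_scale)
print(log2_values)
```

The first sample has a count of 1 and a size factor of 2, so it sits at
1/2 on the common scale and comes out negative.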

If you want to do something downstream of our software which requires
non-negative values, below is some example code of how to threshold
negative values for a matrix in R.

The question of what is the best distance to use for taxa counts, or
whether ANOVA on variance stabilized data is a good idea for taxa
counts, depends on the properties of the data, and this is an area of
active research. As I don't have experience analyzing this kind of
data, I don't want to make any guesses.

> m <- matrix(-2:5, ncol=2)
> m
     [,1] [,2]
[1,]   -2    2
[2,]   -1    3
[3,]    0    4
[4,]    1    5
> m[m < 0] <- 0
> m
     [,1] [,2]
[1,]    0    2
[2,]    0    3
[3,]    0    4
[4,]    1    5
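
A variant of the same idea, in case it is useful: pmax() returns a
thresholded copy and leaves the original matrix untouched. The name
vsd_mat is a made-up placeholder for a matrix of transformed values, not
an object produced by DESeq itself.

```r
# Hypothetical matrix of transformed values (rows = taxa, columns = samples)
vsd_mat <- matrix(c(-1.5, 0.2, 3.1, -0.3, 2.4, 0.0), nrow = 3)
# Elementwise maximum with 0; the result keeps the matrix dimensions
vsd_nonneg <- pmax(vsd_mat, 0)
print(vsd_nonneg)
# Number of entries that were raised to 0 by the threshold
n_thresholded <- sum(vsd_mat < 0)
print(n_thresholded)
```

Keeping the original matrix around makes it easy to check how many values
the threshold actually changed.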

On Fri, Apr 18, 2014 at 3:32 PM, Sophie Josephine Weiss
<Sophie.Weiss at colorado.edu> wrote:
> Hi Mike,
> Could you please check whether I am running this correctly?  I have double
> checked all the parameters, but for some reason, I am getting negatives
> using the R script on the attached .biom dataset.  There are no replicates
> in this microbial dataset.
> Thanks for your advice,
> Sophie
>
>
> On Wed, Apr 16, 2014 at 4:02 PM, Sophie Josephine Weiss
> <Sophie.Weiss at colorado.edu> wrote:
>>
>> Thanks Mike, that is what I thought.  What if we wanted to perform
>> Kruskal-Wallis, or is it possible to perform ANOVA on the
>> variance-stabilized matrix?
>>
>>
>> On Wed, Apr 16, 2014 at 2:29 PM, Michael Love
>> <michaelisaiahlove at gmail.com> wrote:
>>>
>>> hi Sophie,
>>>
>>> We recommend using the standard DESeq() function for differential
>>> expression.
>>>
>>> This is mentioned in the first line of the vignette section on
>>> transformations:
>>>
>>> "In order to test for differential expression, we operate on raw
>>> counts and use discrete distributions as
>>> described in the previous section"
>>>
>>> Also, in the McMurdie and Holmes paper, they use the DESeq() function,
>>> as shown in their supplemental material:
>>>
>>>
>>> http://joey711.github.io/waste-not-supplemental/simulation-differential-abundance/simulation-differential-abundance-server.html
>>>
>>> On Wed, Apr 16, 2014 at 3:22 PM, Sophie Josephine Weiss
>>> <Sophie.Weiss at colorado.edu> wrote:
>>> > Please help with this?  Thanks again.
>>> >
>>> >
>>> > On Mon, Apr 14, 2014 at 6:02 PM, Sophie Josephine Weiss
>>> > <Sophie.Weiss at colorado.edu> wrote:
>>> >>
>>> >> Thanks again Mike - would it be ok to do chi-2 and other significance
>>> >> tests on the DESeq transformed datasets using independent code, or is
>>> >> it necessary to do the differential expression tests strictly within
>>> >> DESeq2?
>>> >>
>>> >> Sophie
>>> >>
>>> >>
>>> >> On Mon, Apr 14, 2014 at 5:41 PM, Michael Love
>>> >> <michaelisaiahlove at gmail.com> wrote:
>>> >>>
>>> >>> hi Sophie,
>>> >>>
>>> >>> The VST code is the same in DESeq and DESeq2. The estimation of
>>> >>> dispersion is slightly different (details are in the vignette
>>> >>> "Changes from DESeq to DESeq2"), but the fitted line (which is used
>>> >>> by the VST) should be very similar.
>>> >>>
>>> >>> Mike
>>> >>>
>>> >>> On Mon, Apr 14, 2014 at 6:27 PM, Sophie Josephine Weiss
>>> >>> <Sophie.Weiss at colorado.edu> wrote:
>>> >>> > Hi Mike,
>>> >>> > The McMurdie and Holmes paper uses DESeq for matrix normalization -
>>> >>> > do you think that is ok, or would it be better to use DESeq2?
>>> >>> > Thanks again,
>>> >>> > Sophie
>>> >>> >
>>> >>> >
>>> >>> > On Mon, Apr 14, 2014 at 3:40 PM, Michael Love
>>> >>> > <michaelisaiahlove at gmail.com>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> hi Sophie,
>>> >>> >>
>>> >>> >>
>>> >>> >> On Mon, Apr 14, 2014 at 1:15 PM, Sophie Josephine Weiss
>>> >>> >> <Sophie.Weiss at colorado.edu> wrote:
>>> >>> >> >
>>> >>> >> > Hi Mike,
>>> >>> >> > Thanks for the references.  By "threshold at 0" do you mean set
>>> >>> >> > any negative values equal to 0?
>>> >>> >>
>>> >>> >>
>>> >>> >> yes.
>>> >>> >>
>>> >>> >>
>>> >>> >> >
>>> >>> >> > Do you think this is the best approach?
>>> >>> >>
>>> >>> >>
>>> >>> >> I haven't explored this area, and would defer to the McMurdie and
>>> >>> >> Holmes paper for the best combinations of distance and
>>> >>> >> transformation.
>>> >>> >>
>>> >>> >>
>>> >>> >> >
>>> >>> >> > Thanks again,
>>> >>> >> > Sophie
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > On Mon, Apr 14, 2014 at 11:01 AM, Michael Love
>>> >>> >> > <michaelisaiahlove at gmail.com> wrote:
>>> >>> >> >>
>>> >>> >> >> I tried poking around here
>>> >>> >> >> http://joey711.github.io/phyloseq/distance
>>> >>> >> >> but couldn't see if the authors did anything for distances
>>> >>> >> >> requiring non-negative data. It appears
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003531
>>> >>> >> >> that VST was tested with Bray-Curtis distance. I think the
>>> >>> >> >> distance is designed for counts, but you could always threshold
>>> >>> >> >> at 0 to insist that the log2-like quantity act more like a count.
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> On Mon, Apr 14, 2014 at 12:23 PM, Sophie Josephine Weiss
>>> >>> >> >> <Sophie.Weiss at colorado.edu> wrote:
>>> >>> >> >>>
>>> >>> >> >>> Hi Mike,
>>> >>> >> >>> Thanks for explaining more.  I am used to working with rarefied
>>> >>> >> >>> microbial datasets, that is why.  Instead of rarefying I would
>>> >>> >> >>> like to use the DESeq method.
>>> >>> >> >>>
>>> >>> >> >>> How would you then suggest going about calculating bray-curtis
>>> >>> >> >>> distance, or summarized taxa diagrams with these new transformed
>>> >>> >> >>> matrices with negative values?
>>> >>> >> >>> Thanks again,
>>> >>> >> >>> Sophie
>>> >>> >> >>>
>>> >>> >> >>>
>>> >>> >> >>> On Mon, Apr 14, 2014 at 7:17 AM, Michael Love
>>> >>> >> >>> <michaelisaiahlove at gmail.com> wrote:
>>> >>> >> >>>>
>>> >>> >> >>>> hi Sophie,
>>> >>> >> >>>>
>>> >>> >> >>>> Can you explain why you don't want negative values in the
>>> >>> >> >>>> transformed values?  Adding one to the raw counts is not
>>> >>> >> >>>> sufficient. I should have said in my previous email, "the
>>> >>> >> >>>> expected counts on the common scale". If the size factor for a
>>> >>> >> >>>> sample is 2, then an expected count of 1 leads to an expected
>>> >>> >> >>>> count of 1/2 on the common scale (after accounting for size
>>> >>> >> >>>> factors).
>>> >>> >> >>>>
>>> >>> >> >>>>
>>> >>> >> >>>> On Sun, Apr 13, 2014 at 11:50 PM, Sophie Josephine Weiss
>>> >>> >> >>>> <Sophie.Weiss at colorado.edu> wrote:
>>> >>> >> >>>>>
>>> >>> >> >>>>> Hi Mike,
>>> >>> >> >>>>> Thanks for your reply!  Ok, makes sense, but I added 1 to all my
>>> >>> >> >>>>> matrix values, so the lowest value in the matrix is 1 - there
>>> >>> >> >>>>> are still negatives?
>>> >>> >> >>>>> Thanks again,
>>> >>> >> >>>>> Sophie
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> >>> >> >>>>> On Sun, Apr 13, 2014 at 9:01 PM, Michael Love
>>> >>> >> >>>>> <michaelisaiahlove at gmail.com> wrote:
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> hi Sophie,
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> The transformations in DESeq and DESeq2 are log2-like
>>> >>> >> >>>>>> transformations. If the expected count is between 0 and 1, the
>>> >>> >> >>>>>> values can be negative; this does not indicate a problem.
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> Mike
>>> >>> >> >>>>>>
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> On Sun, Apr 13, 2014 at 5:17 PM, Sophie Josephine Weiss
>>> >>> >> >>>>>> <Sophie.Weiss at colorado.edu> wrote:
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> Hello,
>>> >>> >> >>>>>>> I have microbiome data with no replicates, from different
>>> >>> >> >>>>>>> conditions.  I am trying to transform the data using the DESeq
>>> >>> >> >>>>>>> method, as described in McMurdie and Holmes 2014.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> The attached file is the definition I am using, as per the
>>> >>> >> >>>>>>> supplemental info in McMurdie and Holmes 2014, and the .biom
>>> >>> >> >>>>>>> file I am using.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> Thank you for your help,
>>> >>> >> >>>>>>> Sophie
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> _______________________________________________
>>> >>> >> >>>>>>> Bioconductor mailing list
>>> >>> >> >>>>>>> Bioconductor at r-project.org
>>> >>> >> >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> >>> >> >>>>>>> Search the archives:
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >>> >> >>>>>>
>>> >>> >> >>>>>>
>>> >>> >> >>>>>
>>> >>> >> >>>>
>>> >>> >> >>>
>>> >>> >> >>
>>> >>> >> >
>>> >>> >
>>> >>> >
>>> >>
>>> >>
>>> >
>>
>>
>