Hi,
    Ryan, thanks for your thoughts!

I noticed the calcNormFactors step in the manual, and I've used it when
applying EdgeR to detect genes differentially expressed between tumor and
normal tissue, adjusting for any differences between the patients (as in
the oral carcinoma case study).

That analysis gives a single p-value per gene, identifying the genes most
up- or down-regulated by the disease across patients.

Now I'm trying to look for differences on a patient-by-patient basis.  For
example, we have a handful of genes that we think can help explain some of
the variability between patients (they all have the same disease, but they
are heterogeneous, and we have some gene-specific hypotheses).  We'd like
to query whether expression changes coincide with some other phenotypes,
looking patient-by-patient.

In other words, I want to answer: Is gene x differentially expressed in
patient y?  What about z?  Is there a gene that's up in y but down in z?

I realize that there are no replicates, since we have one tumor and one
normal sample per patient, so we'll need to be careful about drawing any
major conclusions. Still, I'd like something; I could probably feel OK
about the most dramatically different genes within a single patient.
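
A minimal sketch of the "up in y but down in z" screen asked about above, assuming each patient's tumor and normal counts are held as Python dicts of gene -> count. The names `patients`, `log2_fold_change`, `opposite_direction`, the 0.5 pseudocount and the 1.0 log2 cutoff are all hypothetical. It scales only by total counts (no calcNormFactors-style normalization factors) and, with one tumor and one normal per patient, it is a descriptive ranking rather than a statistical test.

```python
from math import log2


def log2_fold_change(tumor, normal, gene, pseudocount=0.5):
    """log2(tumor / normal) for one gene after scaling each sample by its total count.

    The (hypothetical) pseudocount avoids division by zero for genes with no reads.
    """
    scaled_t = (tumor[gene] + pseudocount) / sum(tumor.values())
    scaled_n = (normal[gene] + pseudocount) / sum(normal.values())
    return log2(scaled_t / scaled_n)


def opposite_direction(patients, y, z, min_lfc=1.0):
    """Genes at least min_lfc (log2) up in patient y and at least min_lfc down in patient z."""
    tumor_y, normal_y = patients[y]
    tumor_z, normal_z = patients[z]
    shared = set(tumor_y) & set(normal_y) & set(tumor_z) & set(normal_z)
    return [g for g in sorted(shared)
            if log2_fold_change(tumor_y, normal_y, g) >= min_lfc
            and log2_fold_change(tumor_z, normal_z, g) <= -min_lfc]


# Made-up (tumor, normal) counts for two patients.
patients = {
    "y": ({"geneA": 80, "geneB": 20}, {"geneA": 20, "geneB": 80}),
    "z": ({"geneA": 20, "geneB": 80}, {"geneA": 80, "geneB": 20}),
}
print(opposite_direction(patients, "y", "z"))  # ['geneA']
```

Swapping the patient arguments asks the mirror-image question (up in z, down in y).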

To this end, my reading of the manual pointed me towards simply using
the binomTest function, which takes a vector of counts for each of two
samples and produces a vector of p-values, one per gene. In that
scenario, I see big (order-of-magnitude) differences in p-values
depending on whether I keep the unmapped reads in the count vectors or
filter them out.
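
One likely reason for the sensitivity: as I understand edgeR's binomTest, the library sizes n1 and n2 default to the sums of the two count vectors, and the null proportion is n1/(n1+n2). Summary rows left in the vectors therefore inflate the library sizes, shift the null for every gene, and get tested themselves. The sketch below is a plain-Python re-implementation of that idea (an exact two-sided binomial test per row, conditioning on the row's total, with the rejection region made of outcomes no more likely than the observed one), not edgeR's own code; the counts are made up and the helper names are hypothetical.

```python
from math import exp, lgamma, log

# htseq-count summary rows (not genes), as listed in the quoted output in this thread.
SUMMARY_ROWS = {"no_feature", "ambiguous", "too_low_aQual",
                "not_aligned", "alignment_not_unique"}


def log_binom_pmf(k, n, p):
    """log P(K = k) for K ~ Binomial(n, p); log space avoids overflow for large n."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_two_sided(y1, y2, p):
    """Exact two-sided p-value for y1 of the y1 + y2 reads falling in sample 1.

    Sums the probabilities of all outcomes no more likely than the observed one;
    the 1e-7 slack in log space treats floating-point near-ties as ties.
    """
    n = y1 + y2
    if n == 0:
        return 1.0
    log_obs = log_binom_pmf(y1, n, p)
    total = 0.0
    for k in range(n + 1):
        lp = log_binom_pmf(k, n, p)
        if lp <= log_obs + 1e-7:
            total += exp(lp)
    return min(1.0, total)


def binom_test_per_row(counts1, counts2):
    """Test every row present in both samples.

    Library sizes are the sums of whatever rows are passed in (assumed to mirror
    binomTest's defaults), so leftover summary rows change p for every gene.
    """
    n1 = sum(counts1.values())
    n2 = sum(counts2.values())
    p = n1 / (n1 + n2)  # null: each row's reads split in proportion to library size
    return {row: binom_two_sided(counts1[row], counts2[row], p)
            for row in sorted(set(counts1) & set(counts2))}


# Made-up counts for one patient (tumor vs normal), including two summary rows.
tumor = {"geneA": 30, "geneB": 10, "no_feature": 5000, "ambiguous": 100}
normal = {"geneA": 10, "geneB": 30, "no_feature": 1000, "ambiguous": 100}

with_summary = binom_test_per_row(tumor, normal)
genes_only = binom_test_per_row(
    {k: v for k, v in tumor.items() if k not in SUMMARY_ROWS},
    {k: v for k, v in normal.items() if k not in SUMMARY_ROWS},
)
for gene in ("geneA", "geneB"):
    print(gene, with_summary[gene], genes_only[gene])
```

With the summary rows left in, the tumor "library" looks about 4.6 times larger than the normal one, so geneA's 30-vs-10 split no longer looks unusual while geneB's 10-vs-30 split looks extreme; with them removed, the two mirror-image genes get the same p-value.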


If anyone has an alternative/better idea for a way to handle this, I would
love to hear it!

Presuming the binomial test is the best way to go, my current inclination is
to remove the unmapped/ambiguously mapped reads before passing the count vectors in.
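
A minimal sketch of that filtering step, assuming the htseq-count output has one "feature count" pair per line as in the output quoted later in this thread. The function names are hypothetical, and the handling of a "__" prefix (e.g. "__no_feature") is an assumption about newer HTSeq releases.

```python
import io

# Summary rows htseq-count appends after the per-gene counts. Leading underscores
# are ignored because newer HTSeq releases are assumed to write "__no_feature" etc.
SUMMARY_ROWS = {"no_feature", "ambiguous", "too_low_aQual",
                "not_aligned", "alignment_not_unique"}


def is_summary_row(feature):
    return feature.lstrip("_") in SUMMARY_ROWS


def read_htseq_counts(handle, drop_summary=True):
    """Parse 'feature<whitespace>count' lines into a {feature: count} dict."""
    counts = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        feature, count = fields[0], int(fields[1])
        if drop_summary and is_summary_row(feature):
            continue
        counts[feature] = count
    return counts


# Two made-up gene rows followed by the summary rows quoted in this thread.
example = (
    "geneA\t120\n"
    "geneB\t45\n"
    "no_feature\t152030\n"
    "ambiguous\t4876\n"
    "too_low_aQual\t0\n"
    "not_aligned\t0\n"
    "alignment_not_unique\t0\n"
)
kept = read_htseq_counts(io.StringIO(example))
everything = read_htseq_counts(io.StringIO(example), drop_summary=False)
# Library size without vs. with the summary rows.
print(sum(kept.values()), sum(everything.values()))  # prints: 165 157071
```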

Any thoughts are welcome! (Does this make sense?)

Thanks,
Jonathan




On Thu, Oct 31, 2013 at 1:55 PM, Devon Ryan <dpryan@dpryan.com> wrote:

> You can just remove those lines (in fact, that's what DESeq2 does
> internally); they'll just needlessly increase the number of tests performed.
>
> ____________________________________________
> Devon Ryan, Ph.D.
> Email: dpryan@dpryan.com
> Tel: +49 (0)178 298-6067
> Molecular and Cellular Cognition Lab
> German Centre for Neurodegenerative Diseases (DZNE)
> Ludwig-Erhard-Allee 2
> 53175 Bonn, Germany
>
> On Oct 31, 2013, at 6:30 PM, Jon BR wrote:
>
> > Hello,
> >    I'm interested in calculating differential expression from some paired
> > RNAseq samples.
> >
> > I've used htseq-count after mapping; quite happy with how easy that was.
> >
> > My question is whether or not to strip the last five rows
> > from the htseq-count output.
> >
> > Those rows look like this:
> > no_feature 152030
> > ambiguous 4876
> > too_low_aQual 0
> > not_aligned 0
> > alignment_not_unique 0
> >
> > I can think of reasons supporting either side of this question. The number
> > of unmapped or ambiguously mapping reads does contribute to the total library
> > size. However, I'm also interested in quantifying the difference between
> > what's human in both samples, so intuition would tell me to remove those
> > reads.
> >
> > Because the counts are big, this matters a great deal. I'm using EdgeR
> > (again, very happy with that software), and the manual cites htseq-count as
> > a viable methodology, but doesn't comment on their preferred treatment of
> > the unmapped reads.
> >
> > My first (somewhat careless) utilization of EdgeR gave us results that
> > appeared to make sense, but upon digging a little deeper, I noticed that
> > this question affects the p-values quite a lot because the unmapped counts
> > are so big.
> >
> > I would appreciate any comments/opinions!
> >
> > Thanks,
> > Jonathan
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>


