[BioC] extracting CPM from a DGElist after normalization in edgeR
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Apr 5 08:59:15 CEST 2014
Hi Alessandro,
I think you might not be understanding what scale normalization is. Have
a read of the section on normalization in the edgeR User's Guide. That
will also answer your question on pseudo-counts.
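
As a rough illustration of the arithmetic (a minimal sketch in base R, not edgeR's own code): scale normalization changes the library size used as the divisor, so CPM computed from normalized library sizes need not sum to one million per sample. The toy counts, the variable names and the `norm.factors` values below are all made up for the example. The assumption, in line with the help text quoted further down, is that the effective library size is the raw library size multiplied by a per-sample normalization factor.

```r
# Toy counts: 4 genes (rows) x 3 samples (columns); values are made up.
counts <- matrix(c(10, 200, 30,  760,
                   20, 400, 60, 1520,
                   15, 100, 85,  800),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("S1", "S2", "S3")))

lib.size <- colSums(counts)                       # raw library sizes
norm.factors <- c(S1 = 0.8, S2 = 1.0, S3 = 1.25)  # hypothetical scaling factors

# CPM from raw library sizes (what a bare count matrix gives):
# every column sums to one million by construction.
cpm.raw <- t(t(counts) / lib.size) * 1e6

# CPM from effective library sizes (raw size times normalization factor):
# column sums become 1e6 / norm.factors instead.
cpm.norm <- t(t(counts) / (lib.size * norm.factors)) * 1e6

colSums(cpm.raw)
colSums(cpm.norm)
```

Read this way, a column sum of 1421292 for LT1C in the output quoted below would correspond to a normalization factor of about 0.70 for that sample, assuming its library size is the plain column sum of the counts.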
Best wishes
Gordon
On Fri, 4 Apr 2014, alessandro.guffanti at genomnia.com wrote:
> Hello - thanks also for this second clarification. I actually read this help
> line, but it was a bit obscure to me.
>
> Let me try to summarize:
>
>> colSums(cpm(currentDiff$counts)), where currentDiff is a DGEList
> object after normalization
>
>  LT1C  LT2C  LT3C  ST2C  ST4C ST10C ST12C  ST5C  ST7C  ST8C
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
> ST11C  LT1P  LT2P  LT3P  ST2P ST10P ST12P  ST7P  ST8P  ST4P
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
>
> ==> the count matrix (matrix of CPM) in this case does not use the values
> normalized by library size, so the values add up to 1 million, correct?
> In this case, though, I can compare values directly between samples.
>
>> colSums(cpm(currentDiff))
>
>    LT1C    LT2C    LT3C    ST2C    ST4C   ST10C   ST12C    ST5C    ST7C    ST8C
> 1421292 1064057  981465  889765  960819  921314  985099  991736 1160034 1144623
>   ST11C    LT1P    LT2P    LT3P    ST2P   ST10P   ST12P    ST7P    ST8P    ST4P
> 1517511  864691 1220229  961164  937648  837525  999688  881438  922050  818447
>
> ==> these are the same count values, but normalized by library sizes, so the
> CPM will not add up to (roughly) 1,000,000 per sample, correct? This is also
> the way in which CPM are extracted in the manual.
>
> But I don't understand one thing: we scale up (or don't scale up) the
> libraries by size, then we calculate the CPM.
> Still, the CPM should add up to 1 million for each sample in the two
> categories, so that every gene can be compared directly between samples.
>
> Am I missing something fundamental here, or is the scaling done *after* the
> CPM calculation?
>
> Let me know, cheers,
>
> Alessandro
>
> PS
>
> A naive question: what (roughly) is the role of pseudo-counts?
>
> Many thanks for your feedback,
>
> Alessandro & Co.
>
> --
>
> Dear Alessandro,
>
> I see that Devon Ryan has answered your question, but the answer is also
> available directly from the help system. If you type help("cpm") the first
> line of Details says:
>
> "CPM or RPKM values are useful descriptive measures for the expression level
> of a gene or transcript. By default, the normalized library sizes are used in
> the computation for DGEList objects but simple column sums for matrices."
>
> Best wishes
> Gordon