[BioC] extracting CPM from a DGElist after normalization in edgeR
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Apr 4 01:57:36 CEST 2014
Dear Alessandro,
I see that Devon Ryan has answered your question, but the answer is also
available directly from the help system. If you type help("cpm") the
first line of Details says:
"CPM or RPKM values are useful descriptive measures for the expression
level of a gene or transcript. By default, the normalized library sizes
are used in the computation for DGEList objects but simple column sums for
matrices."
Best wishes
Gordon
> Date: Thu, 03 Apr 2014 11:58:57 +0200
> From: "alessandro.guffanti at genomnia.com"
> <alessandro.guffanti at genomnia.com>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] extracting CPM from a DGElist after normalization in
> edgeR
>
> Dear colleagues and dear Mark:
>
> We want to extract the CPM from a DGElist object after TMM
> normalization, common and tagwise dispersion evaluation.
>
> Here is a sample of the code, it's just a standard edgeR DEG workflow:
>
> samplesheet1 <- read.delim("sample_sheet.txt", stringsAsFactors = FALSE)
> comparison <- c("Tumor_1","Tumor_2")
> RG1 <- readDGE(samplesheet1,header=FALSE)
> colnames(RG1) <- gsub(".mature_cnt.txt","",RG1$sample$files,perl=TRUE)
> N<-length(colnames(RG1))/2
> K<-10
> keep <- rowSums(cpm(RG1) > K) >= N
> difflist1 <- RG1[keep,]
> difflist1$samples$lib.size <- colSums(difflist1$counts)
> difflist1 <- calcNormFactors(difflist1)
> difflist1 <- estimateCommonDisp(difflist1)
> difflist1 <- estimateTagwiseDisp(difflist1)
> currentDiff <- difflist1
> de.com <- exactTest(currentDiff,pair=comparison)
>
> ... then there is the creation of an Excel output dataframes, including
> the CPM worksheet:
>
> addDataFrame(*round(cpm(currentDiff)*[,currentDiff$samples$group==comparison[1]
> | currentDiff$samples$group==comparison[2]]), ucpm_sheet, startRow=2,
> startColumn=1)
>
> Now, the CPM content leaves us quite puzzled because it is very variable
> according to the syntax we use
> to extract it, and this particular syntax seems to give values which do
> not sum up to 1 million as expected.
> This is the syntax coherent with the
>
> Apparently the only CPM values which do sum up correctly to 1 million es
> expected are those extracted with the following
> instruction
>
> head(cpm(currentDiff$counts))
>
> But what are then these CPM values, which are different from the ones
> extracted with the command above ?
>
> head(cpm(currentDiff))
>
>
> These CPM values are different also from the values obtained with this
> command
>
>
> head(cpm(currentDiff$pseudo.counts))
>
>
> Many thanks in advance for any clarification !
>
> Alessandro, Moreno & Paolo
>
> ----
>
>
>> head(cpm(currentDiff))
>
> LT1C LT2C LT3C ST2C ST4C ST10C ST12C ST5C ST7C
> ST8C ST11C LT1P LT2P
>
> hsa-miR-140-5p 306.550 209.4272 233.03 273.53 581.762 263.281 140.452
> 285.775 633.920 679.240 232.223 396.27 203.828
>
> hsa-miR-133a-3p 28.945 14.5648 51.61 44.04 4.116 5.495 33.441
> 55.981 4.393 18.394 2.609 29.45 2.474
>
> hsa-miR-665 11.695 21.7297 36.35 16.71 4.785 18.865 46.817 3.762
> 12.213 19.308 20.162 3.22 32.849
>
> hsa-miR-1250-5p 2.047 0.1175 13.03 10.21 3.344 13.462 3.541
> 6.396 2.357 4.166 3.202 28.35 4.123
>
> hsa-miR-381-3p 19.223 24.1963 84.53 44.74 16.104 10.898 33.834
> 182.992 43.926 37.092 1.186 25.37 10.308
>
> hsa-miR-374b-3p 39.543 7.6348 10.46 27.62 31.077 18.224 25.966
> 17.457 39.747 29.572 7.353 22.46 13.607
>
> LT3P ST2P ST10P ST12P ST7P ST8P ST4P ST5P ST11P SB
>
> hsa-miR-140-5p 401.843 309.98 408.091 231.87 469.49 664.349 213.087
> 551.31 870.579 326.37
>
> hsa-miR-133a-3p 2.974 16.17 5.822 14.44 43.60 48.011 27.482
> 27.06 13.031 41.21
>
> hsa-miR-665 15.179 46.63 18.831 15.42 4.69 3.359 7.923 11.81
> 3.425 4.57
>
> hsa-miR-1250-5p 10.904 12.05 18.831 19.21 27.76 1.976 21.870
> 48.22 67.411 12.49
>
> hsa-miR-381-3p 11.957 44.73 10.007 16.40 21.45 23.907 80.795
> 21.46 32.661 75.00
>
> hsa-miR-374b-3p 30.048 30.12 15.829 21.17 23.99 46.924 13.122
> 28.86 36.587 19.18
>
>
> > head(cpm(currentDiff$counts))
>
> LT1C LT2C LT3C ST2C ST4C ST10C ST12C ST5C ST7C
> ST8C ST11C LT1P LT2P
>
> hsa-miR-140-5p 215.684 196.8196 237.43 307.42 605.485 285.767 142.576
> 288.156 546.467 593.42 153.0286 458.283 167.040
>
> hsa-miR-133a-3p 20.365 13.6880 52.59 49.49 4.284 5.964 33.947
> 56.448 3.787 16.07 1.7194 34.058 2.027
>
> hsa-miR-665 8.228 20.4215 37.04 18.78 4.980 20.476 47.525 3.794
> 10.529 16.87 13.2864 3.724 26.920
>
> hsa-miR-1250-5p 1.440 0.1104 13.28 11.47 3.481 14.611 3.594
> 6.449 2.032 3.64 2.1102 32.786 3.379
>
> hsa-miR-381-3p 13.525 22.7397 86.13 50.28 16.761 11.828 34.346
> 184.517 37.866 32.41 0.7816 29.335 8.448
>
> hsa-miR-374b-3p 27.822 7.1751 10.66 31.05 32.344 19.780 26.359
> 17.602 34.264 25.84 4.8456 25.975 11.151
>
> LT3P ST2P ST10P ST12P ST7P ST8P ST4P ST5P ST11P SB
>
> hsa-miR-140-5p 418.080 330.59 487.258 231.94 532.639 720.513 260.36
> 599.79 929.610 342.404
>
> hsa-miR-133a-3p 3.094 17.25 6.952 14.44 49.461 52.070 33.58
> 29.44 13.915 43.239
>
> hsa-miR-665 15.792 49.73 22.484 15.42 5.321 3.643 9.68 12.85
> 3.657 4.795
>
> hsa-miR-1250-5p 11.345 12.85 22.484 19.22 31.491 2.143 26.72
> 52.46 71.982 13.100
>
> hsa-miR-381-3p 12.441 47.70 11.948 16.40 24.338 25.928 98.72
> 23.34 34.876 78.687
>
> hsa-miR-374b-3p 31.263 32.12 18.899 21.17 27.216 50.891 16.03
> 31.40 39.068 20.121
>
>
> > head(cpm(currentDiff$pseudo.counts))
>
> LT1C LT2C LT3C ST2C ST4C ST10C ST12C ST5C ST7C
> ST8C ST11C LT1P LT2P
>
> hsa-miR-140-5p 215.686 196.824 237.42 307.42 605.506 285.765 142.74
> 288.149 546.461 593.414 153.0351 458.284 167.054
>
> hsa-miR-133a-3p 20.362 13.698 52.59 49.49 4.256 5.961 33.80
> 56.483 3.793 16.071 1.7286 34.088 2.048
>
> hsa-miR-665 8.221 20.425 37.04 18.79 4.953 20.476 47.03 3.781
> 10.532 16.869 13.2858 3.712 26.891
>
> hsa-miR-1250-5p 1.432 0.121 13.29 11.47 3.461 14.617 3.71
> 6.449 2.037 3.642 2.1169 32.804 3.399
>
> hsa-miR-381-3p 13.517 22.749 86.14 50.28 16.735 11.826 34.44
> 184.584 37.867 32.407 0.7897 29.345 8.461
>
> hsa-miR-374b-3p 27.838 7.186 10.65 31.05 32.352 19.779 26.37
> 17.595 34.259 25.835 4.8545 25.973 11.165
>
> LT3P ST2P ST10P ST12P ST7P ST8P ST4P ST5P ST11P SB
>
> hsa-miR-140-5p 418.084 330.60 487.258 231.96 532.637 720.512 260.347
> 599.80 929.622 342.403
>
> hsa-miR-133a-3p 3.073 17.26 6.948 14.45 49.475 52.067 33.578
> 29.45 13.909 43.238
>
> hsa-miR-665 15.786 49.68 22.485 15.42 5.309 3.645 9.673 12.85
> 3.649 4.794
>
> hsa-miR-1250-5p 11.334 12.86 22.482 19.23 31.489 2.145 26.717
> 52.48 72.001 13.099
>
> hsa-miR-381-3p 12.430 47.70 11.945 16.41 24.330 25.929 98.743
> 23.34 34.875 78.686
>
> hsa-miR-374b-3p 31.272 32.12 18.897 21.18 27.212 50.889 16.025
> 31.40 39.072 20.120
>
>
> > colSums(cpm(currentDiff))
>
> LT1C LT2C LT3C ST2C ST4C ST10C ST12C ST5C ST7C
> ST8C ST11C LT1P LT2P LT3P ST2P
>
> 1421292 1064057 981465 889765 960819 921314 985099 991736 1160034
> 1144623 1517511 864691 1220229 961164 937648
>
> ST10P ST12P ST7P ST8P ST4P ST5P ST11P SB
>
> 837525 999688 881438 922050 818447 919173 936499 953165
>
>
> > colSums(cpm(currentDiff$counts))
>
> LT1C LT2C LT3C ST2C ST4C ST10C ST12C ST5C ST7C ST8C ST11C LT1P
> LT2P LT3P ST2P ST10P ST12P ST7P ST8P ST4P
>
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
>
> ST5P ST11P SB
>
> 1e+06 1e+06 1e+06
>
>
> > colSums(cpm(currentDiff$pseudo.counts))
>
> LT1C LT2C LT3C ST2C ST4C ST10C ST12C ST5C ST7C ST8C ST11C LT1P
> LT2P LT3P ST2P ST10P ST12P ST7P ST8P ST4P
>
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
>
> ST5P ST11P SB
>
> 1e+06 1e+06 1e+06
>
>
>
>> sessionInfo()
>
> R version 3.0.2 (2013-09-25)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
>
> [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
> LC_MONETARY=Italian_Italy.1252
>
> [4] LC_NUMERIC=C LC_TIME=Italian_Italy.1252
>
> attached base packages:
>
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
>
> [1] xlsx_0.5.5 xlsxjars_0.5.0 rJava_0.9-6 edgeR_3.4.2 limma_3.18.9
>
> loaded via a namespace (and not attached):
>
> [1] tools_3.0.2
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list