[BioC] CQN normalization

Sermsawat Tunlaya-Anukit stunlay at ncsu.edu
Tue Jul 17 17:01:00 CEST 2012

I would like to normalize Populus RNA-seq data with the cqn package. I
followed the code shown in the manual, but after running it I got negative
RPKM values (see details below). What do negative RPKM values mean? Or did
I do something wrong in my code? I tried to visualize the fit with cqnplot,
and I think my data are not biased by GC content or length. I have attached
the cqn plot for your information. Do you think I need to use your method
to normalize these data?

Sincerely yours,
Sermsawat T.

#####normalize by CQN
> #find GC content from phytozome populus genome
> library(Rsamtools)
> seq <- scanFa("pop.fa")
> alph <- alphabetFrequency(seq, as.prob=TRUE)
> gc <- rowSums(alph[,c("G", "C")])
> library(cqn)
> library(scales)
> raw <- read.table("raw.txt", header=FALSE)
> cqn.raw <- cqn(raw, lengths = width(seq), x = gc, sizeFactors =
+ colSums(raw), verbose = TRUE)
RQ fit ..................
> #cqn plot
> par(mfrow=c(1,2))
> cqnplot(cqn.raw, n = 1, xlab = "GC content", lty = 1, ylim = c(1,7))
> cqnplot(cqn.raw, n = 2, xlab = "length", lty = 1, ylim = c(1,7))
> #normalized values
> RPKM.cqn <- cqn.raw$y + cqn.raw$offset
> head(RPKM.cqn)
             V1          V2         V3        V4         V5         V6
[1,] -2.3157759 -0.05011048 -1.5628979 -2.792141 -1.5042294 -2.1660771
[2,]  3.3638330  3.46092091  3.5447488  3.333357  3.1484934  3.4531245
[3,] -0.1924864  0.38536951  0.8976638 -1.134989 -0.1924864  0.4854206
[4,] -2.2808497 -0.50606822 -1.5613805 -2.802241 -1.4820398 -3.1508760
[5,] -2.2813593 -0.50602611 -1.5615417 -2.802341 -1.4822924 -3.1510596
[6,]  4.6875943  4.68565806  2.6239807  3.531895  3.5440903  1.9465045
            V7          V8          V9       V10        V11        V12
[1,] -3.417837 -0.05011048 -0.05011048 -1.203393 -0.5058032 -1.4579155
[2,]  2.900329  3.07780614  3.41975117  3.693540  3.3387816  3.3901923
[3,] -1.369314 -0.05011048  0.58067142  0.437804  1.0192374  0.8445695
[4,] -3.459816 -0.38013895 -0.05011048 -1.182797 -0.5064885 -1.4613012
[5,] -3.459801 -0.38013895 -0.05011048 -1.183303 -0.5066405  0.1602583
[6,]  2.282431  2.19639308  1.39946079  4.753675  4.8337840  2.7983964
           V13       V14        V15        V16        V17        V18
[1,] -4.303745 -1.844632 -3.1044275 -3.4759833 -0.1924864 -1.2015133
[2,]  3.116199  2.942757  3.2217711  2.9451114  2.8716179  3.4992808
[3,] -1.627192 -1.779755 -1.5114489 -0.8413003 -0.1924864  0.3091839
[4,] -3.289903 -2.870566 -3.1055409 -3.5218687 -0.3801390 -1.2399416
[5,] -4.290241 -2.870453 -2.1055097 -3.5218168 -0.3801390 -1.2400014
[6,]  3.685705  3.460004  0.8445695  2.4490306  2.6279081  0.5358322
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[5] LC_TIME=English_United States.1252
attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base
other attached packages:
 [1] scales_0.2.1          Rsamtools_1.8.5       Biostrings_2.24.1
 [4] GenomicRanges_1.8.7   IRanges_1.14.4        BiocGenerics_0.2.0
 [7] cqn_1.2.0             quantreg_4.81         SparseM_0.96
[10] preprocessCore_1.18.0 nor1mix_1.1-3         mclust_3.5
loaded via a namespace (and not attached):
 [1] bitops_1.0-4.1     colorspace_1.1-1   dichromat_1.2-4
 [5] munsell_0.3        plyr_1.7.1         RColorBrewer_1.0-5
 [9] stringr_0.6        zlibbioc_1.2.0
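
A note on the negative values, as a small sketch rather than a definitive answer: according to the cqn vignette, the sum `y + offset` is reported on the log2 scale, so any gene whose RPKM is below 1 comes out negative. The toy vector `rpkm` below is hypothetical and is not taken from the data in this post; the example uses base R only and does not call cqn itself.

```r
# Hypothetical toy RPKM values on the natural scale (not the poster's data).
rpkm <- c(0.2, 0.5, 1, 8, 25)

# cqn reports y + offset on the log2 scale, so mimic that here.
log2_rpkm <- log2(rpkm)
print(round(log2_rpkm, 3))

# A value is negative on the log2 scale exactly when RPKM is below 1.
stopifnot(all((log2_rpkm < 0) == (rpkm < 1)))

# Raising 2 to the log2 value recovers the natural-scale RPKM.
back <- 2^log2_rpkm
stopifnot(isTRUE(all.equal(back, rpkm)))
```

Read this way, an entry such as -2.3157759 in the output above would correspond to an RPKM of about 0.2 (2^-2.3157759), which indicates low expression rather than an error in the code.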
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cqnplot.pdf
Type: application/pdf
Size: 18583 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20120717/0b43c6ee/attachment.pdf>
