[BioC] CQN normalization
Sermsawat Tunlaya-Anukit
stunlay at ncsu.edu
Tue Jul 17 17:01:00 CEST 2012
I would like to normalize Populus RNA-seq data with the cqn package. I followed the
code shown in the manual, but after running it I got negative RPKM values (see details
below). What do negative RPKM values mean? Or did I do something wrong in
my code? I also tried to visualize the data with cqnplot, and I think my data are not biased
by GC content or length. I attached the cqn plot for your information.
Do you think I need to use your method to normalize my data?
Sincerely yours,
Sermsawat T.
#####normalize by CQN
> # find GC content from the Phytozome Populus genome
> library(Rsamtools)
> seq <- scanFa("pop.fa")
> alph <- alphabetFrequency(seq, as.prob=TRUE)
> gc <- rowSums(alph[,c("G", "C")])
>
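The `gc` vector above holds, for each sequence, the fraction of bases that are G or C. With `as.prob=TRUE`, `alphabetFrequency()` divides by the full sequence length, so any N bases count in the denominator. A minimal base-R sketch of the same calculation on toy sequences, assuming that behaviour; the helper `gc_fraction()` is hypothetical and written so it runs without Biostrings:

```r
# Hypothetical helper (not part of cqn or Biostrings): fraction of G or C
# characters in each sequence, counting every character (including N)
# in the denominator, as alphabetFrequency(..., as.prob = TRUE) does.
gc_fraction <- function(seqs) {
  vapply(seqs, function(s) {
    bases <- strsplit(toupper(s), "")[[1]]
    mean(bases %in% c("G", "C"))
  }, numeric(1), USE.NAMES = FALSE)
}

# Toy sequences (hypothetical), not from the Populus genome.
toy <- c("GGCC", "ATAT", "GCAT", "GCNN")
gc_fraction(toy)
# [1] 1.0 0.0 0.5 0.5
```

Whatever tool computes it, the `gc` and `width(seq)` vectors only make sense to cqn if they are in the same gene order as the rows of the count table.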
> library(cqn)
> library(scales)
> raw <- read.table("raw.txt", header=FALSE)
> cqn.raw <- cqn(raw, lengths = width(seq), x = gc, sizeFactors =
+ colSums(raw), verbose = TRUE)
RQ fit ..................
SQN .
>
> #cqn plot
> par(mfrow=c(1,2))
> cqnplot(cqn.raw, n = 1, xlab = "GC content", lty = 1, ylim = c(1,7))
> cqnplot(cqn.raw, n = 2, xlab = "length", lty = 1, ylim = c(1,7))
>
> # normalized values
> RPKM.cqn <- cqn.raw$y + cqn.raw$offset
> head(RPKM.cqn)
V1 V2 V3 V4 V5 V6
[1,] -2.3157759 -0.05011048 -1.5628979 -2.792141 -1.5042294 -2.1660771
[2,] 3.3638330 3.46092091 3.5447488 3.333357 3.1484934 3.4531245
[3,] -0.1924864 0.38536951 0.8976638 -1.134989 -0.1924864 0.4854206
[4,] -2.2808497 -0.50606822 -1.5613805 -2.802241 -1.4820398 -3.1508760
[5,] -2.2813593 -0.50602611 -1.5615417 -2.802341 -1.4822924 -3.1510596
[6,] 4.6875943 4.68565806 2.6239807 3.531895 3.5440903 1.9465045
V7 V8 V9 V10 V11 V12
[1,] -3.417837 -0.05011048 -0.05011048 -1.203393 -0.5058032 -1.4579155
[2,] 2.900329 3.07780614 3.41975117 3.693540 3.3387816 3.3901923
[3,] -1.369314 -0.05011048 0.58067142 0.437804 1.0192374 0.8445695
[4,] -3.459816 -0.38013895 -0.05011048 -1.182797 -0.5064885 -1.4613012
[5,] -3.459801 -0.38013895 -0.05011048 -1.183303 -0.5066405 0.1602583
[6,] 2.282431 2.19639308 1.39946079 4.753675 4.8337840 2.7983964
V13 V14 V15 V16 V17 V18
[1,] -4.303745 -1.844632 -3.1044275 -3.4759833 -0.1924864 -1.2015133
[2,] 3.116199 2.942757 3.2217711 2.9451114 2.8716179 3.4992808
[3,] -1.627192 -1.779755 -1.5114489 -0.8413003 -0.1924864 0.3091839
[4,] -3.289903 -2.870566 -3.1055409 -3.5218687 -0.3801390 -1.2399416
[5,] -4.290241 -2.870453 -2.1055097 -3.5218168 -0.3801390 -1.2400014
[6,] 3.685705 3.460004 0.8445695 2.4490306 2.6279081 0.5358322
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] scales_0.2.1 Rsamtools_1.8.5 Biostrings_2.24.1
[4] GenomicRanges_1.8.7 IRanges_1.14.4 BiocGenerics_0.2.0
[7] cqn_1.2.0 quantreg_4.81 SparseM_0.96
[10] preprocessCore_1.18.0 nor1mix_1.1-3 mclust_3.5
loaded via a namespace (and not attached):
[1] bitops_1.0-4.1 colorspace_1.1-1 dichromat_1.2-4 labeling_0.1
[5] munsell_0.3 plyr_1.7.1 RColorBrewer_1.0-5 stats4_2.15.1
[9] stringr_0.6 zlibbioc_1.2.0
Attachment: cqnplot.pdf (PDF, 18583 bytes) <https://stat.ethz.ch/pipermail/bioconductor/attachments/20120717/0b43c6ee/attachment.pdf>