[BioC] HTqPCR normalization issues again

Alessandro Guffanti - Elena Brini [guest] guest at bioconductor.org
Tue Sep 24 12:14:22 CEST 2013


 Dear all (and, we think, especially Heidi):

 we are using HTqPCR to analyze a set of cards, which we transformed into the following format (accepted by HTqPCR):

 2    Run05    41    Passed    sample 41    ABCC5    Target    30
 3    Run05    41    Passed    sample 41    ADM    Target    31.3
 4    Run05    41    Passed    sample 41    CEBPB    Target    29.8
 5    Run05    41    Passed    sample 41    CSF1R    Target    31.2
 6    Run05    41    Passed    sample 41    CXCL16    Target    26.9
 7    Run05    41    Passed    sample 41    CYC1    Target    25.7

 [...]

 The files and their groups are listed below; this is the content of the file "Elenco_1.txt", which is used in the code further down:

 File    Group
 41.txt    Sano
 39.txt    Sano
 37.txt    Sano
 35.txt    Sano
 43.txt    Sano
 34.txt    Sano
 44.txt    Sano
 38.txt    Sano
 48.txt    Sano
 40.txt    Sano
 47.txt    Sano
 6.txt    Non Responder DISEASE
 26.txt    Non Responder DISEASE
 2.txt    Non Responder DISEASE
 69.txt    Non Responder DISEASE
 68.txt    Non Responder DISEASE
 5.txt    Non Responder DISEASE
 71.txt    Responder DISEASE
 3.txt    Responder DISEASE
 17.txt    Responder DISEASE
 1.txt    Responder DISEASE
 19.txt    Responder DISEASE

 The comparison is DISEASE vs. non-DISEASE, but what puzzles us is the normalization step.
 Note that sample 41 is the *first* of the list.
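 Since Elenco_1.txt defines three groups while the comparison of interest has two levels, one way to obtain DISEASE vs. non-DISEASE labels is to collapse the groups by name. This is a minimal base-R sketch: the vector below is typed in to mirror the table above (in the real script it would come from files$Group), and treating every group whose name contains "DISEASE" as diseased is our assumption about the intended contrast.

```r
# Group labels in the same order as the rows of Elenco_1.txt above
# (typed in here; in the real script use files$Group)
group <- c(rep("Sano", 11),
           rep("Non Responder DISEASE", 6),
           rep("Responder DISEASE", 5))

# Assumption: any group whose name contains "DISEASE" is diseased,
# so the three groups collapse into the two-level comparison
status <- factor(ifelse(grepl("DISEASE", group), "DISEASE", "Sano"),
                 levels = c("Sano", "DISEASE"))
table(status)
```

 With the file list above this gives 11 samples per level.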

 Here is the code, up to the point where the matrices of normalized values are written out:

 library("HTqPCR")
 path <- "whatever/"

 # Sample sheet: one row per card file, with its group
 files <- read.delim(file.path(path, "Elenco_1.txt"))
 files
 filelist <- as.character(files$File)
 filelist

 # Gene names in column 6, target type in column 7, Ct in column 8; no flag column
 raw <- readCtData(files = filelist, path = path, n.features = 46, type = 7, flag = NULL,
                   feature = 6, Ct = 8, header = FALSE, n.data = 1)
 featureNames(raw)

 # Assign Ct categories using the limits 9 to 36 and the sample groups
 raw.cat <- setCategory(raw, Ct.max = 36, Ct.min = 9, replicates = FALSE, quantile = 0.9,
                        groups = files$Group, verbose = TRUE)

 # Normalization by rank-invariant scaling
 s.norm <- normalizeCtData(raw.cat, norm = "scale.rank")
 exprs(s.norm)
 write.table(exprs(s.norm), file = "Ct norm scaling.txt")

 # Normalization by geometric mean
 g.norm <- normalizeCtData(raw.cat, norm = "geometric.mean")
 exprs(g.norm)
 write.table(exprs(g.norm), file = "Ct norm media geometrica.txt")

 Now, if we look at the content of the two output files, it looks as if the first column
 (corresponding to the first sample) is always left unchanged, while all the others have been normalized.

 Here the first sample is 41, so you can see what is happening by comparing its raw Ct values
 above with the corresponding column below.

 We do not include all the columns here; however, you can see that every sample *except the first (number 41)* has had its values normalized:

 Ct norm scaling:

                 41          39          37          35          43          34          44          38
 ABCC5           30 27.37706161 26.47393365  29.7721327 31.20189573 26.39260664 26.32436019 27.54274882
 ADM           31.3 30.36540284 28.51753555 32.31241706 34.40473934 26.29800948 29.82796209 28.60208531
 CEBPB         29.8 28.53383886 26.65971564 27.84151659 30.06540284  27.3385782 27.36597156 26.29080569
 CSF1R         31.2 27.66625592 28.05308057 37.18976303 36.98767773  31.0278673 34.56255924 29.75772512
 CXCL16        26.9 27.56985782 24.15165877 30.28018957 28.82559242 25.91962085 26.89251185 26.96492891

 Ct norm geometric mean:

                 41          39          37          35          43          34          44          38
 ABCC5           30 27.73443878 26.93934246 29.88113261 30.76352197 26.51166676  26.8989347 27.49219508
 ADM           31.3 30.76178949 29.01887064  32.4307173 33.92136694 26.41664286 30.47900874  28.5495872
 CEBPB         29.8 28.90631647 27.12839047 27.94344824 29.64299633 27.46190571 27.96328103 26.24254985
 CSF1R         31.2  28.0274082  28.5462506 37.32591991 36.46801611 31.16783762 35.31694663 29.70310587
 CXCL16        26.9 27.92975172 24.57624224 30.39104955 28.42060473 26.03654728 27.47948724 26.91543574

 This looks odd: why does the first sample seem to be taken as a 'reference' by both normalization methods, and hence left unchanged?

 This happens with ANY normalization method we select.
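 One mechanism consistent with the numbers above is a multiplicative per-sample scaling in which each sample's factor is expressed relative to the first sample. The two tables support the multiplicative part: within each column, the scaling-table value divided by the geometric-mean value is constant (about 0.987 for every gene in sample 39, about 0.983 in sample 37), and it is exactly 1 for sample 41. The base-R sketch below illustrates the idea with a plain geometric-mean scaling on a toy matrix; it is not HTqPCR's code, and the assumption that normalizeCtData scales toward the first sample should be checked against ?normalizeCtData.

```r
# Toy Ct matrix: 3 genes x 3 samples. Column "41" reuses the raw values
# shown above; columns "39" and "37" are made up for illustration.
ct <- matrix(c(30.0, 31.3, 29.8,
               27.0, 30.0, 28.0,
               29.0, 32.0, 28.5),
             nrow = 3,
             dimnames = list(c("ABCC5", "ADM", "CEBPB"),
                             c("41", "39", "37")))

# Per-sample level: geometric mean of each column's Ct values
geo.mean <- apply(ct, 2, function(x) exp(mean(log(x))))

# Assumed reference: every sample is rescaled to the level of column 1,
# so the factor for column 1 is exactly 1 and that column is unchanged
scale.factor <- geo.mean[1] / geo.mean
ct.norm <- sweep(ct, 2, scale.factor, "*")

round(scale.factor, 4)
round(ct.norm, 2)
```

 Under this scheme the reference column is untouched by construction, and every other column ends up with the same geometric mean as the reference.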

 Another (related?) oddity is that in the final differential expression results the same sample ID (41)
 is reported in the feature.pos field for every gene, as you can see below:

     genes   feature.pos  t.test        p.value      adj.p.value
 22  NUCB1   41           -1.998838921  0.077900837  0.251381346
 8   ERH     41           -1.958143348  0.091329532  0.251381346
 16  MAFB    41           -1.887142703  0.09421993   0.251381346
 28  RNF130  41           -1.904866754  0.099644523  0.251381346
 3   CEBPB   41           -1.853176708  0.103563968  0.251381346
 18  MSR1    41           -1.80887129   0.10432619   0.251381346
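 Every row of the converted card files carries the sample number in column 3 (41 for every gene of sample 41), and the readCtData call above does not say which column holds the well position. If, as we assume, readCtData has a position argument whose default points at column 3 (please check ?readCtData for HTqPCR 1.14.0), the sample number would be stored as the position of every feature, which would match the constant 41 in feature.pos. The base-R sketch below does not call HTqPCR; it only reads the six example rows shown earlier (assumed tab-separated) and shows what columns 3, 6 and 8 contain.

```r
# The six example rows from the converted card file shown above,
# assumed to be tab-separated as in the real input files
rows <- c("2\tRun05\t41\tPassed\tsample 41\tABCC5\tTarget\t30",
          "3\tRun05\t41\tPassed\tsample 41\tADM\tTarget\t31.3",
          "4\tRun05\t41\tPassed\tsample 41\tCEBPB\tTarget\t29.8",
          "5\tRun05\t41\tPassed\tsample 41\tCSF1R\tTarget\t31.2",
          "6\tRun05\t41\tPassed\tsample 41\tCXCL16\tTarget\t26.9",
          "7\tRun05\t41\tPassed\tsample 41\tCYC1\tTarget\t25.7")
card <- read.delim(text = rows, header = FALSE, stringsAsFactors = FALSE)

# Column 6 = gene (feature = 6), column 8 = Ct (Ct = 8) in the call above
card[, c(6, 8)]

# Column 3 is the sample number, the same on every row of the card
unique(card[, 3])
```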

 Are we doing something wrong in the data input or in the subsequent processing? Can we actually trust these normalizations?

 Many thanks in advance, and kind regards,

 Alessandro & Elena



 -- output of sessionInfo(): 

R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] HTqPCR_1.14.0      limma_3.16.8       RColorBrewer_1.0-5 Biobase_2.20.1    
[5] BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] affy_1.38.1           affyio_1.28.0         BiocInstaller_1.10.3 
[4] gdata_2.13.2          gplots_2.11.3         gtools_3.0.0         
[7] preprocessCore_1.22.0 stats4_3.0.1          zlibbioc_1.6.0   

--
Sent via the guest posting facility at bioconductor.org.
