[BioC] HTqPCR normalization issue ?

Sat Aug 24 22:37:23 CEST 2013

Dear all (and, we think, especially Heidi):

We are using HTqPCR to analyze a set of cards, which we converted from their original format into the one below (the only one accepted as input):

2    Run05    41    Passed    sample 41    ABCC5    Target    30
3    Run05    41    Passed    sample 41    ADM    Target    31.3
4    Run05    41    Passed    sample 41    CEBPB    Target    29.8
5    Run05    41    Passed    sample 41    CSF1R    Target    31.2
6    Run05    41    Passed    sample 41    CXCL16    Target    26.9
7    Run05    41    Passed    sample 41    CYC1    Target    25.7
8    Run05    41    Passed    sample 41    DYNLT1    Target    25.8
9    Run05    41    Passed    sample 41    EREG    Target    35.6
10    Run05    41    Passed    sample 41    ERH    Target    25.9
11    Run05    41    Passed    sample 41    FGD4    Target    40
12    Run05    41    Passed    sample 41    GPX1    Target    20.4
[...]

The full list of files and groups is as follows (this is the file "Elenco_1.txt", which is used below):

File    Group
41.txt    Sano
39.txt    Sano
37.txt    Sano
35.txt    Sano
43.txt    Sano
34.txt    Sano
44.txt    Sano
38.txt    Sano
48.txt    Sano
40.txt    Sano
47.txt    Sano
6.txt    Non Responder DISEASE
26.txt    Non Responder DISEASE
2.txt    Non Responder DISEASE
69.txt    Non Responder DISEASE
68.txt    Non Responder DISEASE
5.txt    Non Responder DISEASE
71.txt    Responder DISEASE
3.txt    Responder DISEASE
17.txt    Responder DISEASE
1.txt    Responder DISEASE
19.txt    Responder DISEASE

The comparison is DISEASE vs. non-DISEASE, but what puzzles us is the normalization step.

Here is the code, up to the point where the normalized value matrices are written out:

library("HTqPCR")

# Read the list of input files and their group labels
path <- "C:/Users/BRINIEL/Desktop/new_analisi_card1/analisiAeB/"
files <- read.delim(file.path(path, "Elenco_1.txt"))
files
filelist <- as.character(files$File)
filelist

# Column 6 holds the feature name, column 7 the feature type, column 8 the Ct value
raw <- readCtData(files = filelist, path = path, n.features = 46, type = 7, flag = NULL, feature = 6, Ct = 8, header = FALSE, n.data = 1)
featureNames(raw)
raw.cat <- setCategory(raw, Ct.max = 36, Ct.min = 9, replicates = FALSE, quantile = 0.9, groups = files$Group, verbose = TRUE)

# First normalization, written to a text file
s.norm <- normalizeCtData(raw.cat, norm = "scale.rank")
exprs(s.norm)
write.table(exprs(s.norm), file = "Ct norm scaling.txt")

# Second normalization, written to a text file
g.norm <- normalizeCtData(raw.cat, norm = "geometric.mean")
exprs(g.norm)
write.table(exprs(g.norm), file = "Ct norm media geometrica.txt")

Now, if we look at the content of the two expression value files, it looks as if the first column (corresponding to the first sample) is always unchanged, while all the others have been normalized.

In this case the first dataset is sample 41, so you can see what is happening by comparing the raw lines above with the normalized values below.

We do not include all the columns here, but every sample except the first has all of its values 'normalized'.
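
One way to confirm this observation programmatically, rather than by eye, is to compare the raw and normalized matrices column by column. The sketch below is only an illustration in base R: it uses small stand-in matrices (the values for sample 39 are invented), and the names raw.ct, norm.ct and unchanged are hypothetical.

```r
# Stand-in matrices; column "41" uses the Ct values quoted above,
# column "39" is invented purely for illustration.
raw.ct  <- cbind("41" = c(30, 31.3, 29.8), "39" = c(28.0, 31.0, 29.0))
norm.ct <- cbind("41" = c(30, 31.3, 29.8), "39" = c(27.4, 30.3, 28.4))
rownames(raw.ct) <- rownames(norm.ct) <- c("ABCC5", "ADM", "CEBPB")

# TRUE for each sample whose values are the same before and after normalization
unchanged <- sapply(colnames(raw.ct), function(s) {
  isTRUE(all.equal(raw.ct[, s], norm.ct[, s]))
})

# Names of the samples that normalization left untouched
names(unchanged)[unchanged]
```

With the real objects, raw.ct and norm.ct would be replaced by exprs(raw.cat) and exprs(s.norm) (or exprs(g.norm)).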

Ct norm scaling:

    41    39    37    35    43    34    44    38
ABCC5    30    27.37706161    26.47393365    29.7721327    31.20189573    26.39260664    26.32436019    27.54274882
ADM    31.3    30.36540284    28.51753555    32.31241706    34.40473934    26.29800948    29.82796209    28.60208531
CEBPB    29.8    28.53383886    26.65971564    27.84151659    30.06540284    27.3385782    27.36597156    26.29080569
CSF1R    31.2    27.66625592    28.05308057    37.18976303    36.98767773    31.0278673    34.56255924    29.75772512
CXCL16    26.9    27.56985782    24.15165877    30.28018957    28.82559242    25.91962085    26.89251185    26.96492891
CYC1    25.7    23.52113744    22.01516588    26.92701422    27.27582938    22.89251185    22.53668246    23.88322275
DYNLT1    25.8    23.71393365    21.17914692    25.8092891    26.03601896    22.89251185    22.63137441    23.01649289
EREG    35.6    31.32938389    30.18957346    35.66559242    37.29763033    29.79810427    32.76341232    30.33554502

Ct norm geometric:

    41    39    37    35    43    34    44    38
ABCC5    30    27.73443878    26.93934246    29.88113261    30.76352197    26.51166676    26.8989347    27.49219508
ADM    31.3    30.76178949    29.01887064    32.4307173    33.92136694    26.41664286    30.47900874    28.5495872
CEBPB    29.8    28.90631647    27.12839047    27.94344824    29.64299633    27.46190571    27.96328103    26.24254985
CSF1R    31.2    28.0274082    28.5462506    37.32591991    36.46801611    31.16783762    35.31694663    29.70310587
CXCL16    26.9    27.92975172    24.57624224    30.39104955    28.42060473    26.03654728    27.47948724    26.91543574
CYC1    25.7    23.82817979    22.40219004    27.02559775    26.89261523    22.99578263    23.02858438    23.83938594
DYNLT1    25.8    24.02349274    21.55147396    25.90378049    25.67022363    22.99578263    23.12534314    22.97424694
EREG    35.6    31.73835423    30.7203028    35.7961691    36.77361401    29.93252698    33.47853023    30.27986521

This looks a bit odd: why does the first sample seem to be taken as a 'reference' for both normalization methods, and hence left unchanged?

Another (related?) oddity is that, in the final differential analysis result, the same sample ID is always reported in the feature.pos field, as you can see below:
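
To make the question concrete, here is a minimal base-R sketch of the kind of behaviour we suspect: if each sample's Ct values are divided by the ratio of its geometric mean to that of a chosen reference sample, the reference column is unchanged by construction. This is our assumption, not something we have verified against the HTqPCR source; the function scale_to_reference, its arguments and the values in column s2 are hypothetical.

```r
# Hypothetical helper (not part of HTqPCR): scale every sample so that its
# geometric mean matches that of one reference sample.
scale_to_reference <- function(ct, ref = 1, ct.max = 35) {
  # Geometric mean per sample, using only Ct values below ct.max
  geo.mean <- apply(ct, 2, function(x) exp(mean(log(x[x < ct.max]))))
  # Scaling factor relative to the reference sample (exactly 1 for the reference)
  scale.factor <- geo.mean / geo.mean[ref]
  # Divide each column by its own scaling factor
  sweep(ct, 2, scale.factor, "/")
}

# s1 uses Ct values of sample 41 quoted above; s2 is invented for illustration
ct <- cbind(s1 = c(30, 31.3, 29.8, 26.9),
            s2 = c(28.1, 31.0, 29.2, 28.3))
rownames(ct) <- c("ABCC5", "ADM", "CEBPB", "CXCL16")

ct.norm <- scale_to_reference(ct, ref = 1)
ct.norm
```

Under this assumption, choosing a different ref would leave that sample unchanged instead, which is the behaviour we would like to have confirmed or refuted.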

    genes    feature.pos    t.test    p.value    adj.p.value
22    NUCB1    41    -1.998838921    0.077900837    0.251381346
8    ERH    41    -1.958143348    0.091329532    0.251381346
16    MAFB    41    -1.887142703    0.09421993    0.251381346
28    RNF130    41    -1.904866754    0.099644523    0.251381346
3    CEBPB    41    -1.853176708    0.103563968    0.251381346
18    MSR1    41    -1.80887129    0.10432619    0.251381346

Are we doing something wrong in the data input or the subsequent processing here? Can we actually trust these normalizations?

Many thanks in advance - kind regards

Alessandro & Elena

 -- output of sessionInfo(): 

R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] HTqPCR_1.14.0      limma_3.16.5       RColorBrewer_1.0-5 Biobase_2.20.0
[5] BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] affy_1.38.1           affyio_1.28.0         BiocInstaller_1.10.2
[4] gdata_2.12.0.2        gplots_2.11.0.1       gtools_2.7.1
[7] preprocessCore_1.22.0 stats4_3.0.1          zlibbioc_1.6.0

--
Sent via the guest posting facility at bioconductor.org.