[BioC] HTqPCR normalization issues - third posting

Thu Oct 10 18:48:04 CEST 2013

On 10/10/2013 09:35, heidi wrote:
> On 10/10/2013 07:40, alessandro.guffanti at genomnia.com wrote:
>> Thanks, much appreciated!
>> It would be important for us to understand whether we are doing
>> something fundamentally wrong, or if there actually is a bug in the
>> software (it happens), because we are using this package heavily for
>> validating NGS gene expression analysis findings.
>> Thank you so much for the excellent work!
>> Keep in touch
>> Alessandro & Elena
>> On 10/10/2013 3:56 PM, James W. MacDonald wrote:
>> 
>>> Hi Alessandro,
>>> I believe this package is still maintained, and it is unfortunate 
>>> that you have not received a reply. The expectation is that package 
>>> maintainers will subscribe (and pay attention) to the Bioc listserv, 
>>> but the list is fairly high traffic, so it never hurts to add a CC to 
>>> the maintainer as well (which I have done for you).
>>> Best,
>>> Jim
>>> On Thursday, October 10, 2013 8:35:06 AM, Alessandro Guffanti 
>>> [guest] wrote:
>>> 
>>>> Dear all, this is our third posting without a real reply, so we 
>>>> wonder whether this package is no longer maintained. If so, 
>>>> it would be useful for us to know.
>>>> We are using HTqPCR to analyze a set of cards which we transformed 
>>>> into this format, which is accepted by HTqPCR:
>>>>   2    Run05    41    Passed    sample 41    ABCC5    Target    30
>>>>   3    Run05    41    Passed    sample 41    ADM    Target    31.3
>>>>   4    Run05    41    Passed    sample 41    CEBPB    Target    29.8
>>>>   5    Run05    41    Passed    sample 41    CSF1R    Target    31.2
>>>>   6    Run05    41    Passed    sample 41    CXCL16    Target    26.9
>>>>   7    Run05    41    Passed    sample 41    CYC1    Target    25.7
>>>>   [...]
>>>>   The total number of files and groups is as follows - summarized 
>>>> in the file "Elenco_1.txt" which is used below:
>>>>   File    Group
>>>>   41.txt    Sano
>>>>   39.txt    Sano
>>>>   37.txt    Sano
>>>>   35.txt    Sano
>>>>   43.txt    Sano
>>>>   34.txt    Sano
>>>>   44.txt    Sano
>>>>   38.txt    Sano
>>>>   48.txt    Sano
>>>>   40.txt    Sano
>>>>   47.txt    Sano
>>>>   6.txt    Non Responder DISEASE
>>>>   26.txt    Non Responder DISEASE
>>>>   2.txt    Non Responder DISEASE
>>>>   69.txt    Non Responder DISEASE
>>>>   68.txt    Non Responder DISEASE
>>>>   5.txt    Non Responder DISEASE
>>>>   71.txt    Responder DISEASE
>>>>   3.txt    Responder DISEASE
>>>>   17.txt    Responder DISEASE
>>>>   1.txt    Responder DISEASE
>>>>   19.txt    Responder DISEASE
>>>>   The comparison is DISEASE vs non DISEASE, but what makes us 
>>>> doubtful is the normalization part.
>>>>   Note that sample 41 is the *first* of the list.
>>>>   Here is the code up to the dump of the normalized values 
>>>> matrices:
>>>>   library("HTqPCR")
>>>>   path <- ("whatever/")
>>>>   files <- read.delim (file.path(path, "Elenco_1.txt"))
>>>>   files
>>>>   filelist <- as.character(files$File)
>>>>   filelist
>>>>   raw <- readCtData(files = filelist, path = path, n.features = 46,
>>>>                     type = 7, flag = NULL, feature = 6, Ct = 8,
>>>>                     header = FALSE, n.data = 1)
>>>>   featureNames(raw)
>>>>   raw.cat <- setCategory(raw, Ct.max = 36, Ct.min = 9,
>>>>                          replicates = FALSE, quantile = 0.9,
>>>>                          groups = files$Group, verbose = TRUE)
>>>>   s.norm <- normalizeCtData(raw.cat, norm="scale.rank")
>>>>   exprs(s.norm)
>>>>   write.table(exprs(s.norm),file="Ct norm scaling.txt")
>>>>   g.norm <- normalizeCtData(raw.cat, norm="geometric.mean")
>>>>   exprs(g.norm)
>>>>   write.table(exprs(g.norm),file="Ct norm media geometrica.txt")
>>>>   Now if we look at the content of the two expression value files, 
>>>> it looks like the first column (corresponding to the first sample) 
>>>> is always unchanged, while all the others have been normalized.
>>>>   In this case the first dataset is sample 41, so you can check what 
>>>> is happening by comparing the corresponding column above and below.
>>>>   We do not include all the columns here; however, you can see that 
>>>> all the samples *except the first (number 41)* have all their values 
>>>> normalized.
>>>>   Ct norm scaling:
>>>>            41    39           37           35           43           34           44           38
>>>>   ABCC5    30    27.37706161  26.47393365  29.7721327   31.20189573  26.39260664  26.32436019  27.54274882
>>>>   ADM      31.3  30.36540284  28.51753555  32.31241706  34.40473934  26.29800948  29.82796209  28.60208531
>>>>   CEBPB    29.8  28.53383886  26.65971564  27.84151659  30.06540284  27.3385782   27.36597156  26.29080569
>>>>   CSF1R    31.2  27.66625592  28.05308057  37.18976303  36.98767773  31.0278673   34.56255924  29.75772512
>>>>   CXCL16   26.9  27.56985782  24.15165877  30.28018957  28.82559242  25.91962085  26.89251185  26.96492891
>>>>   Ct norm geometric:
>>>>            41    39           37           35           43           34           44           38
>>>>   ABCC5    30    27.73443878  26.93934246  29.88113261  30.76352197  26.51166676  26.8989347   27.49219508
>>>>   ADM      31.3  30.76178949  29.01887064  32.4307173   33.92136694  26.41664286  30.47900874  28.5495872
>>>>   CEBPB    29.8  28.90631647  27.12839047  27.94344824  29.64299633  27.46190571  27.96328103  26.24254985
>>>>   CSF1R    31.2  28.0274082   28.5462506   37.32591991  36.46801611  31.16783762  35.31694663  29.70310587
>>>>   CXCL16   26.9  27.92975172  24.57624224  30.39104955  28.42060473  26.03654728  27.47948724  26.91543574
>>>>   This looks odd - why does the first sample seem to be taken as a 
>>>> 'reference' for both normalization methods and hence left 
>>>> unchanged?
>>>>   This happens with ANY normalization procedure selected.
>>>>   Another (related ?) oddity is that in the final differential 
>>>> analysis result the same sample ID is always reported
>>>>   in the feature.pos field, as you can see below:
>>>>       genes    feature.pos    t.test    p.value    adj.p.value
>>>>   22    NUCB1    41    -1.998838921    0.077900837    0.251381346
>>>>   8    ERH    41    -1.958143348    0.091329532    0.251381346
>>>>   16    MAFB    41    -1.887142703    0.09421993    0.251381346
>>>>   28    RNF130    41    -1.904866754    0.099644523    0.251381346
>>>>   3    CEBPB    41    -1.853176708    0.103563968    0.251381346
>>>>   18    MSR1    41    -1.80887129    0.10432619    0.251381346
>>>>   Are we doing something wrong in the data input or subsequent 
>>>> processing here? Can we actually trust these normalizations?
>>>>   Many thanks in advance - kind regards
>>>>   Alessandro & Elena
>>>>   -- output of sessionInfo():
>>>> R version 3.0.1 (2013-05-16)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252
>>>> [2] LC_CTYPE=English_United States.1252
>>>> [3] LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>> attached base packages:
>>>> [1] parallel  stats     graphics  grDevices utils     datasets  
>>>> methods
>>>> [8] base
>>>> other attached packages:
>>>> [1] HTqPCR_1.14.0      limma_3.16.8       RColorBrewer_1.0-5 
>>>> Biobase_2.20.1
>>>> [5] BiocGenerics_0.6.0
>>>> loaded via a namespace (and not attached):
>>>> [1] affy_1.38.1           affyio_1.28.0         
>>>> BiocInstaller_1.10.3
>>>> [4] gdata_2.13.2          gplots_2.11.3         gtools_3.0.0
>>>> [7] preprocessCore_1.22.0 stats4_3.0.1          zlibbioc_1.6.0
>>>> --
>>>> Sent via the guest posting facility at bioconductor.org.
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor [1]
>>>> Search the archives: 
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor 
>>>> [2]
>>> --
>>> James W. MacDonald, M.S.
Hi Alessandro,

my apologies for the lack of a response. I've recently had to take a 
hiatus from Bioconductor, but will resume work on HTqPCR following the 
impending Bioconductor release.

Best,
\Heidi

>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
> Hi Alessandro,
> 
> my apologies for the lack of a reply. I've recently had to take a
> hiatus from Bioconductor, but will resume work on HTqPCR following the
> impending Bioconductor release.
> 
> best,
> \Heidi
> 
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>> --
>> Alessandro Guffanti
>> Head, Bioinformatics
>> GENOMNIA SRL
>> Via Nerviano, 31/B – 20020 Lainate (MI)
>> Tel. +39-0293305.702 / Fax +39-0293305.777
>> www.genomnia.com [3]
>> alessandro.guffanti at genomnia.com
>> 
>> Links:
>> ------
>> [1] https://stat.ethz.ch/mailman/listinfo/bioconductor
>> [2] http://news.gmane.org/gmane.science.biology.informatics.conductor
>> [3] http://www.genomnia.com
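
One way to check the observation quoted above (that only the first sample column is left unchanged) is to compare the Ct matrix before and after normalization, column by column. The sketch below uses base R only; `changed_columns` is a hypothetical helper, not part of HTqPCR, and the toy matrices are illustrative values, not data from this thread.

```r
# Hypothetical helper (not part of HTqPCR): report, per sample column,
# whether any Ct value differs between two matrices of equal dimensions.
changed_columns <- function(before, after, tol = 1e-8) {
  stopifnot(identical(dim(before), dim(after)))
  # Largest absolute difference per column, ignoring NA values
  max_diff <- apply(abs(after - before), 2, max, na.rm = TRUE)
  max_diff > tol
}

# Toy data for illustration only; these are not values from the thread.
before <- matrix(c(30, 31, 28, 27, 26, 29), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
after <- before
after[, "s2"] <- before[, "s2"] * 1.1   # pretend only s2 was rescaled

changed <- changed_columns(before, after)
print(changed)
```

With the objects from the quoted script, the equivalent call would presumably be `changed_columns(exprs(raw.cat), exprs(s.norm))`. A result that is FALSE only for the first sample would confirm the reported pattern, though it would not by itself show whether that behaviour is intended.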