[BioC] HTqPCR normalization issues - third posting

Thu Oct 10 18:35:41 CEST 2013

On 10/10/2013 07:40, alessandro.guffanti at genomnia.com wrote:
> Thanks, much appreciated !
> 
>  It would be important for us to understand whether we are doing
> something fundamentally wrong, or whether there actually is a bug in the
> software (it happens), because we rely heavily on this package for
> validating NGS gene expression analysis findings.
> 
>  Thank you so much for the excellent work!
> 
>  Keep in touch
> 
>  Alessandro & Elena
> 
>  On 10/10/2013 3:56 PM, James W. MacDonald wrote:
> 
>> Hi Alessandro,
>> 
>> I believe this package is still maintained, and it is unfortunate 
>> that you have not received a reply. The expectation is that package 
>> maintainers will subscribe (and pay attention) to the Bioc listserv, 
>> but the list is fairly high traffic, so it never hurts to add a CC to 
>> the maintainer as well (which I have done for you).
>> 
>> Best,
>> 
>> Jim
>> 
>> On Thursday, October 10, 2013 8:35:06 AM, Alessandro Guffanti [guest] 
>> wrote:
>> 
>>> Dear all, this is our third posting without a real reply, so we 
>>> wonder whether this package is no longer maintained. If so, 
>>> it would be useful for us to know.
>>> 
>>> We are using HTqPCR to analyze a set of cards which we transformed into 
>>> this format, which is accepted by HTqPCR:
>>> 
>>>   2    Run05    41    Passed    sample 41    ABCC5    Target    30
>>>   3    Run05    41    Passed    sample 41    ADM    Target    31.3
>>>   4    Run05    41    Passed    sample 41    CEBPB    Target    29.8
>>>   5    Run05    41    Passed    sample 41    CSF1R    Target    31.2
>>>   6    Run05    41    Passed    sample 41    CXCL16    Target    26.9
>>>   7    Run05    41    Passed    sample 41    CYC1    Target    25.7
>>> 
>>>   [...]
>>> 
>>>   The total number of files and groups is as follows - summarized in 
>>> the file "Elenco_1.txt" which is used below:
>>> 
>>>   File    Group
>>>   41.txt    Sano
>>>   39.txt    Sano
>>>   37.txt    Sano
>>>   35.txt    Sano
>>>   43.txt    Sano
>>>   34.txt    Sano
>>>   44.txt    Sano
>>>   38.txt    Sano
>>>   48.txt    Sano
>>>   40.txt    Sano
>>>   47.txt    Sano
>>>   6.txt    Non Responder DISEASE
>>>   26.txt    Non Responder DISEASE
>>>   2.txt    Non Responder DISEASE
>>>   69.txt    Non Responder DISEASE
>>>   68.txt    Non Responder DISEASE
>>>   5.txt    Non Responder DISEASE
>>>   71.txt    Responder DISEASE
>>>   3.txt    Responder DISEASE
>>>   17.txt    Responder DISEASE
>>>   1.txt    Responder DISEASE
>>>   19.txt    Responder DISEASE
>>> 
>>>   The comparison is DISEASE vs non DISEASE, but what leaves us 
>>> doubtful is the normalization part.
>>>   Note that sample 41 is the *first* in the list.
>>> 
>>>   Here is the code up to the dump of the normalized values matrices:
>>> 
>>>   library("HTqPCR")
>>>   path <- "whatever/"
>>>   # Sample sheet with the File and Group columns shown above
>>>   files <- read.delim(file.path(path, "Elenco_1.txt"))
>>>   files
>>>   filelist <- as.character(files$File)
>>>   filelist
>>>   # Read the cards: feature name in column 6, type in column 7, Ct in column 8
>>>   raw <- readCtData(files = filelist, path = path, n.features = 46,
>>>                     type = 7, flag = NULL, feature = 6, Ct = 8,
>>>                     header = FALSE, n.data = 1)
>>>   featureNames(raw)
>>>   raw.cat <- setCategory(raw, Ct.max = 36, Ct.min = 9, replicates = FALSE,
>>>                          quantile = 0.9, groups = files$Group, verbose = TRUE)
>>> 
>>>   # Normalize with norm="scale.rank" and dump the matrix
>>>   s.norm <- normalizeCtData(raw.cat, norm = "scale.rank")
>>>   exprs(s.norm)
>>>   write.table(exprs(s.norm), file = "Ct norm scaling.txt")
>>> 
>>>   # Normalize with norm="geometric.mean" and dump the matrix
>>>   g.norm <- normalizeCtData(raw.cat, norm = "geometric.mean")
>>>   exprs(g.norm)
>>>   write.table(exprs(g.norm), file = "Ct norm media geometrica.txt")
>>> 
>>>   Now if we look at the content of the two expression value files, 
>>> it looks like the first column
>>>   (corresponding to the first sample) is always unchanged, while all 
>>> the others have been normalized.
>>> 
>>>   In this case the first dataset is sample 41, so you can check what 
>>> is happening by comparing the corresponding column
>>>   in the input above with the output below.
>>> 
>>>   We do not include all the columns here; however, you can see that 
>>> all the samples *except the first (number 41)* have their values 
>>> normalized.
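
One way to confirm this observation without inspecting the dumped files by eye is to compare the matrices column by column. The base-R sketch below is only an illustration: the helper name `unchanged_samples` is hypothetical (it is not part of HTqPCR), and the toy matrix reuses the sample 41 values quoted in this post while the sample 39 values are made up.

```r
# Hypothetical helper: names of the samples (columns) whose values are
# identical, within a tolerance, before and after normalization.
unchanged_samples <- function(before, after, tol = 1e-8) {
  stopifnot(identical(dim(before), dim(after)))
  same <- colSums(abs(before - after) > tol, na.rm = TRUE) == 0
  colnames(before)[same]
}

# Toy data: column "41" uses the Ct values quoted in the post,
# column "39" is made up for illustration.
before <- matrix(c(30, 31.3, 29.8,
                   28.1, 31.2, 29.3),
                 nrow = 3,
                 dimnames = list(c("ABCC5", "ADM", "CEBPB"), c("41", "39")))
after <- before
after[, "39"] <- before[, "39"] * 0.975  # pretend only sample 39 was rescaled

print(unchanged_samples(before, after))
```

With the objects from the post, the same helper could be called as `unchanged_samples(exprs(raw.cat), exprs(s.norm))`.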
>>> 
>>>   Ct norm scaling:
>>> 
>>>           41    39           37           35           43           34           44           38
>>>   ABCC5   30    27.37706161  26.47393365  29.7721327   31.20189573  26.39260664  26.32436019  27.54274882
>>>   ADM     31.3  30.36540284  28.51753555  32.31241706  34.40473934  26.29800948  29.82796209  28.60208531
>>>   CEBPB   29.8  28.53383886  26.65971564  27.84151659  30.06540284  27.3385782   27.36597156  26.29080569
>>>   CSF1R   31.2  27.66625592  28.05308057  37.18976303  36.98767773  31.0278673   34.56255924  29.75772512
>>>   CXCL16  26.9  27.56985782  24.15165877  30.28018957  28.82559242  25.91962085  26.89251185  26.96492891
>>> 
>>>   Ct norm geometric:
>>> 
>>>           41    39           37           35           43           34           44           38
>>>   ABCC5   30    27.73443878  26.93934246  29.88113261  30.76352197  26.51166676  26.8989347   27.49219508
>>>   ADM     31.3  30.76178949  29.01887064  32.4307173   33.92136694  26.41664286  30.47900874  28.5495872
>>>   CEBPB   29.8  28.90631647  27.12839047  27.94344824  29.64299633  27.46190571  27.96328103  26.24254985
>>>   CSF1R   31.2  28.0274082   28.5462506   37.32591991  36.46801611  31.16783762  35.31694663  29.70310587
>>>   CXCL16  26.9  27.92975172  24.57624224  30.39104955  28.42060473  26.03654728  27.47948724  26.91543574
>>> 
>>>   This looks odd: why does the first sample seem to be taken as a 
>>> 'reference' for both normalization methods and hence left 
>>> unchanged?
>>> 
>>>   This happens with ANY normalization procedure selected.
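
One possible explanation, offered as an assumption rather than a confirmed description of HTqPCR internals: if a normalization rescales every sample so that its geometric mean matches that of a reference sample, and the reference happens to be the first sample, then the first column is unchanged by construction. The base-R sketch below illustrates only that idea. The function `scale_to_reference` is hypothetical, the values for samples 39 and 37 are made up, and only the sample 41 column comes from this post. Checking `?normalizeCtData` in the installed version for a reference-sample setting would confirm or refute the assumption.

```r
# Toy Ct matrix: rows are genes, columns are samples.
# Column "41" uses the values from the post; the other columns are made up.
ct <- matrix(c(30.0, 31.3, 29.8,
               28.0, 31.0, 29.2,
               27.1, 29.0, 27.3),
             nrow = 3,
             dimnames = list(c("ABCC5", "ADM", "CEBPB"), c("41", "39", "37")))

# Hypothetical helper: scale each column so that its geometric mean
# equals the geometric mean of the reference column.
scale_to_reference <- function(x, ref = 1) {
  geo_mean <- exp(colMeans(log(x)))    # per-sample geometric mean
  factors <- geo_mean[ref] / geo_mean  # exactly 1 for the reference sample
  sweep(x, 2, factors, `*`)
}

ct_scaled <- scale_to_reference(ct, ref = 1)
print(ct_scaled)                        # column "41" is identical to the input
print(exp(colMeans(log(ct_scaled))))    # all geometric means now agree
```

Under this kind of scheme an unchanged reference column would be expected behaviour rather than a sign that normalization failed, and picking a different reference would shift all values by a common factor per sample.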
>>> 
>>>   Another (related ?) oddity is that in the final differential 
>>> analysis result the same sample ID is always reported
>>>   in the feature.pos field, as you can see below:
>>> 
>>>       genes    feature.pos    t.test    p.value    adj.p.value
>>>   22    NUCB1    41    -1.998838921    0.077900837    0.251381346
>>>   8    ERH    41    -1.958143348    0.091329532    0.251381346
>>>   16    MAFB    41    -1.887142703    0.09421993    0.251381346
>>>   28    RNF130    41    -1.904866754    0.099644523    0.251381346
>>>   3    CEBPB    41    -1.853176708    0.103563968    0.251381346
>>>   18    MSR1    41    -1.80887129    0.10432619    0.251381346
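
A possible reading of the constant feature.pos value, stated as an assumption to verify against `?readCtData` for the installed version: if the feature position is taken from the third column of each input file when no position column is specified, then in the card format shown earlier that column holds the sample number, so every feature of the first file would be labelled 41. The base-R sketch below parses the example rows quoted in this post and reports which columns are constant; it does not call HTqPCR.

```r
# The six example rows from the post, tab-separated as in the card files.
rows <- c("2\tRun05\t41\tPassed\tsample 41\tABCC5\tTarget\t30",
          "3\tRun05\t41\tPassed\tsample 41\tADM\tTarget\t31.3",
          "4\tRun05\t41\tPassed\tsample 41\tCEBPB\tTarget\t29.8",
          "5\tRun05\t41\tPassed\tsample 41\tCSF1R\tTarget\t31.2",
          "6\tRun05\t41\tPassed\tsample 41\tCXCL16\tTarget\t26.9",
          "7\tRun05\t41\tPassed\tsample 41\tCYC1\tTarget\t25.7")

cards <- read.delim(text = paste(rows, collapse = "\n"),
                    header = FALSE, stringsAsFactors = FALSE)

# Which columns carry the same value on every row?
constant <- vapply(cards, function(col) length(unique(col)) == 1, logical(1))
print(constant)

# Column 3 holds the sample number, not a per-feature position.
print(unique(cards[[3]]))
```

If that assumption holds, the feature.pos column in the results would simply echo the sample number of the first file and would be unrelated to the normalization question.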
>>> 
>>>   Are we doing something wrong in the data input or the subsequent 
>>> processing here? Can we actually trust these normalizations?
>>> 
>>>   Many thanks in advance - kind regards
>>> 
>>>   Alessandro & Elena
>>> 
>>>   -- output of sessionInfo():
>>> 
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>> 
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>> 
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>> 
>>> other attached packages:
>>> [1] HTqPCR_1.14.0      limma_3.16.8       RColorBrewer_1.0-5 Biobase_2.20.1
>>> [5] BiocGenerics_0.6.0
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] affy_1.38.1           affyio_1.28.0         BiocInstaller_1.10.3
>>> [4] gdata_2.13.2          gplots_2.11.3         gtools_3.0.0
>>> [7] preprocessCore_1.22.0 stats4_3.0.1          zlibbioc_1.6.0
>>> 
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor [1]
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor [2]
>> 
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
Hi Alessandro,

My apologies for the lack of a reply. I recently had to take a hiatus 
from Bioconductor, but will resume work on HTqPCR following the 
impending Bioconductor release.

best,
\Heidi

>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
> 
> --
>  Alessandro Guffanti
> 
> Head, Bioinformatics
> 
> GENOMNIA SRL
> 
> Via Nerviano, 31/B – 20020 Lainate (MI)
> 
> Tel. +39-0293305.702 / Fax +39-0293305.777
> 
>  www.genomnia.com [3]
> 
>  alessandro.guffanti at genomnia.com
> 
> 
> Links:
> ------
> [1] https://stat.ethz.ch/mailman/listinfo/bioconductor
> [2] http://news.gmane.org/gmane.science.biology.informatics.conductor
> [3] http://www.genomnia.com