[BioC] HTqPCR normalization issues - third posting
heidi
heidi at ebi.ac.uk
Thu Oct 10 18:48:04 CEST 2013
On 10/10/2013 09:35, heidi wrote:
> On 10/10/2013 07:40, alessandro.guffanti at genomnia.com wrote:
>> Thanks, much appreciated !
>> It would be important for us to understand whether we are doing
>> something fundamentally wrong, or whether there actually is a bug in the
>> software (it happens), because we rely heavily on this package for
>> validating NGS gene expression findings.
>> Thank you so much for the excellent work!
>> Keep in touch
>> Alessandro & Elena
>> On 10/10/2013 3:56 PM, James W. MacDonald wrote:
>>
>>> Hi Alessandro,
>>> I believe this package is still maintained, and it is unfortunate
>>> that you have not received a reply. The expectation is that package
>>> maintainers will subscribe (and pay attention) to the Bioc listserv,
>>> but the list is fairly high traffic, so it never hurts to add a CC to
>>> the maintainer as well (which I have done for you).
>>> Best,
>>> Jim
>>> On Thursday, October 10, 2013 8:35:06 AM, Alessandro Guffanti
>>> [guest] wrote:
>>>
>>>> Dear all, this is our third posting without a real reply, so we
>>>> wonder whether this package is still maintained. If it is not,
>>>> it would be useful for us to know.
>>>> We are using HTqPCR to analyze a set of cards, which we transformed
>>>> into the following format (accepted by HTqPCR):
>>>> 2 Run05 41 Passed sample 41 ABCC5 Target 30
>>>> 3 Run05 41 Passed sample 41 ADM Target 31.3
>>>> 4 Run05 41 Passed sample 41 CEBPB Target 29.8
>>>> 5 Run05 41 Passed sample 41 CSF1R Target 31.2
>>>> 6 Run05 41 Passed sample 41 CXCL16 Target 26.9
>>>> 7 Run05 41 Passed sample 41 CYC1 Target 25.7
>>>> [...]
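A minimal base-R sketch of how the excerpt above lines up with the column indices later passed to readCtData() (feature = 6, type = 7, Ct = 8). It assumes the card files are tab-delimited, which the single field "sample 41" suggests, and it rebuilds only the six rows shown; the object names excerpt and card are hypothetical. It also prints column 3, which holds the sample number 41 on every row. readCtData() has a separate position argument for the column giving each feature's position on the card (see ?readCtData for its default in HTqPCR 1.14.0); whether that points at column 3 here may be worth checking against the constant feature.pos of 41 reported further down.

```r
# Rebuild the six rows shown above. Assumption: the card files are
# tab-delimited (the field "sample 41" contains a space).
excerpt <- paste(
  "2\tRun05\t41\tPassed\tsample 41\tABCC5\tTarget\t30",
  "3\tRun05\t41\tPassed\tsample 41\tADM\tTarget\t31.3",
  "4\tRun05\t41\tPassed\tsample 41\tCEBPB\tTarget\t29.8",
  "5\tRun05\t41\tPassed\tsample 41\tCSF1R\tTarget\t31.2",
  "6\tRun05\t41\tPassed\tsample 41\tCXCL16\tTarget\t26.9",
  "7\tRun05\t41\tPassed\tsample 41\tCYC1\tTarget\t25.7",
  sep = "\n")
card <- read.delim(text = excerpt, header = FALSE, stringsAsFactors = FALSE)

# Columns named in the readCtData() call: feature = 6, type = 7, Ct = 8
print(data.frame(feature = card[[6]], type = card[[7]], Ct = card[[8]]))

# Column 3 holds the sample number, identical on every row of this card
print(unique(card[[3]]))
```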
>>>> The files and their groups are listed below, as summarized
>>>> in the file "Elenco_1.txt" used in the code:
>>>> File Group
>>>> 41.txt Sano
>>>> 39.txt Sano
>>>> 37.txt Sano
>>>> 35.txt Sano
>>>> 43.txt Sano
>>>> 34.txt Sano
>>>> 44.txt Sano
>>>> 38.txt Sano
>>>> 48.txt Sano
>>>> 40.txt Sano
>>>> 47.txt Sano
>>>> 6.txt Non Responder DISEASE
>>>> 26.txt Non Responder DISEASE
>>>> 2.txt Non Responder DISEASE
>>>> 69.txt Non Responder DISEASE
>>>> 68.txt Non Responder DISEASE
>>>> 5.txt Non Responder DISEASE
>>>> 71.txt Responder DISEASE
>>>> 3.txt Responder DISEASE
>>>> 17.txt Responder DISEASE
>>>> 1.txt Responder DISEASE
>>>> 19.txt Responder DISEASE
>>>> The comparison is DISEASE vs. non-DISEASE (Sano, i.e. healthy), but
>>>> what puzzles us is the normalization step.
>>>> Note that sample 41 is the *first* of the list.
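In the code below, groups = files$Group passes three labels to setCategory() (Sano plus two DISEASE subgroups), whereas the stated comparison has two. A sketch of deriving a two-level factor for that contrast, assuming Elenco_1.txt contains exactly the 22 rows listed above; the column name Status is hypothetical and not part of the original file.

```r
# Hypothetical reconstruction of Elenco_1.txt from the table above
files <- data.frame(
  File = paste0(c(41, 39, 37, 35, 43, 34, 44, 38, 48, 40, 47,
                  6, 26, 2, 69, 68, 5,
                  71, 3, 17, 1, 19), ".txt"),
  Group = c(rep("Sano", 11),
            rep("Non Responder DISEASE", 6),
            rep("Responder DISEASE", 5)),
  stringsAsFactors = FALSE)

# Collapse the three labels into the two-level DISEASE vs. Sano contrast
files$Status <- factor(ifelse(grepl("DISEASE", files$Group), "DISEASE", "Sano"),
                       levels = c("Sano", "DISEASE"))
print(table(files$Group))
print(table(files$Status))
```

A factor like files$Status could then be supplied wherever a two-group comparison is set up; the exact argument names of HTqPCR's testing functions should be checked in their help pages.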
>>>> Here is the code, up to the export of the normalized value
>>>> matrices:
>>>> library("HTqPCR")
>>>> path <- "whatever/"   # directory holding Elenco_1.txt and the card files
>>>> files <- read.delim(file.path(path, "Elenco_1.txt"))
>>>> files
>>>> filelist <- as.character(files$File)
>>>> filelist
>>>> # in each card file: column 6 = gene, 7 = feature type, 8 = Ct value
>>>> raw <- readCtData(files = filelist, path = path, n.features = 46,
>>>>     type = 7, flag = NULL, feature = 6, Ct = 8, header = FALSE, n.data = 1)
>>>> featureNames(raw)
>>>> # assign a quality category to each Ct value using the Ct.max/Ct.min cut-offs
>>>> raw.cat <- setCategory(raw, Ct.max = 36, Ct.min = 9,
>>>>     replicates = FALSE, quantile = 0.9, groups = files$Group, verbose = TRUE)
>>>> s.norm <- normalizeCtData(raw.cat, norm = "scale.rank")
>>>> exprs(s.norm)
>>>> write.table(exprs(s.norm), file = "Ct norm scaling.txt")
>>>> g.norm <- normalizeCtData(raw.cat, norm = "geometric.mean")
>>>> exprs(g.norm)
>>>> write.table(exprs(g.norm), file = "Ct norm media geometrica.txt")
>>>> Now, if we look at the contents of the two expression value files,
>>>> it looks like the first column (corresponding to the first sample)
>>>> is always unchanged, while all the others have been normalized.
>>>> In this case the first dataset is sample 41, so you can see what is
>>>> happening by comparing the corresponding column above and below.
>>>> We do not include all the columns here; however, you can see that
>>>> all the samples *except the first (number 41)* have had their values
>>>> normalized:
>>>> Ct norm scaling:
>>>>         41    39           37           35           43           34           44           38
>>>> ABCC5   30    27.37706161  26.47393365  29.7721327   31.20189573  26.39260664  26.32436019  27.54274882
>>>> ADM     31.3  30.36540284  28.51753555  32.31241706  34.40473934  26.29800948  29.82796209  28.60208531
>>>> CEBPB   29.8  28.53383886  26.65971564  27.84151659  30.06540284  27.3385782   27.36597156  26.29080569
>>>> CSF1R   31.2  27.66625592  28.05308057  37.18976303  36.98767773  31.0278673   34.56255924  29.75772512
>>>> CXCL16  26.9  27.56985782  24.15165877  30.28018957  28.82559242  25.91962085  26.89251185  26.96492891
>>>> Ct norm geometric mean:
>>>>         41    39           37           35           43           34           44           38
>>>> ABCC5   30    27.73443878  26.93934246  29.88113261  30.76352197  26.51166676  26.8989347   27.49219508
>>>> ADM     31.3  30.76178949  29.01887064  32.4307173   33.92136694  26.41664286  30.47900874  28.5495872
>>>> CEBPB   29.8  28.90631647  27.12839047  27.94344824  29.64299633  27.46190571  27.96328103  26.24254985
>>>> CSF1R   31.2  28.0274082   28.5462506   37.32591991  36.46801611  31.16783762  35.31694663  29.70310587
>>>> CXCL16  26.9  27.92975172  24.57624224  30.39104955  28.42060473  26.03654728  27.47948724  26.91543574
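Rather than comparing columns by eye, every sample can be tested at once. The hypothetical helper changed_samples() below takes a raw and a normalized Ct matrix with the same layout and reports, per sample, whether any value moved by more than a tolerance. The toy matrices are invented for illustration and are not taken from the post.

```r
# Hypothetical helper: for each sample (column), did normalization change any Ct?
changed_samples <- function(raw_ct, norm_ct, tol = 1e-8) {
  stopifnot(identical(dim(raw_ct), dim(norm_ct)))
  # TRUE where a value moved by more than tol; NAs are ignored
  moved <- abs(raw_ct - norm_ct) > tol
  apply(moved, 2, any, na.rm = TRUE)
}

# Toy data (invented): sample "41" untouched, sample "39" scaled by 0.98
raw_ct  <- cbind(`41` = c(30, 31.3, 29.8), `39` = c(28, 31, 29))
norm_ct <- cbind(`41` = c(30, 31.3, 29.8), `39` = c(28, 31, 29) * 0.98)
changed_samples(raw_ct, norm_ct)
# 41 is FALSE (unchanged), 39 is TRUE (normalized)
```

With the objects from the script above, the same check would be changed_samples(exprs(raw.cat), exprs(s.norm)); a result that is FALSE only for sample 41 would confirm the observation for all 22 columns, not just the eight shown.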
>>>> This looks odd: why does the first sample seem to be taken as a
>>>> 'reference' for both normalization methods, and hence left
>>>> unchanged?
>>>> This happens with ANY normalization method we select.
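One way a sample can stay untouched by construction: if each sample's scaling factor is expressed relative to a reference sample, the reference is divided by exactly 1. The sketch below assumes such a geometric-mean scheme with sample 1 as the reference. It illustrates that idea only and is not HTqPCR's implementation; whether normalizeCtData() works this way should be checked in its documentation or source. scale_to_first() and the toy matrix are hypothetical.

```r
# Illustrative reference-based scaling (an assumption, not HTqPCR code):
# each sample is divided by its geometric mean relative to sample 1's.
scale_to_first <- function(ct) {
  gm <- apply(ct, 2, function(x) exp(mean(log(x))))  # per-sample geometric mean
  factors <- gm / gm[1]                              # sample 1 gets factor 1
  sweep(ct, 2, factors, "/")                         # divide each column by its factor
}

# Toy Ct matrix (invented values): three samples, three genes
ct <- cbind(s1 = c(30, 31.3, 29.8), s2 = c(27, 29, 28), s3 = c(33, 34, 31))
scaled <- scale_to_first(ct)
scaled[, "s1"] == ct[, "s1"]   # the reference column is unchanged
```

Under this scheme every column ends up with sample 1's geometric mean, so the reference column is unchanged by construction, and choosing a different reference would multiply all normalized values by one common factor.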
>>>> Another (possibly related?) oddity is that in the final differential
>>>> analysis results the same sample ID is always reported
>>>> in the feature.pos field, as you can see below:
>>>> genes feature.pos t.test p.value adj.p.value
>>>> 22 NUCB1 41 -1.998838921 0.077900837 0.251381346
>>>> 8 ERH 41 -1.958143348 0.091329532 0.251381346
>>>> 16 MAFB 41 -1.887142703 0.09421993 0.251381346
>>>> 28 RNF130 41 -1.904866754 0.099644523 0.251381346
>>>> 3 CEBPB 41 -1.853176708 0.103563968 0.251381346
>>>> 18 MSR1 41 -1.80887129 0.10432619 0.251381346
>>>> Are we doing something wrong in the data input or in the subsequent
>>>> processing? Can we actually trust these normalizations?
>>>> Many thanks in advance - kind regards
>>>> Alessandro & Elena
>>>> -- output of sessionInfo():
>>>> R version 3.0.1 (2013-05-16)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252
>>>> [2] LC_CTYPE=English_United States.1252
>>>> [3] LC_MONETARY=English_United States.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils datasets methods
>>>> [8] base
>>>> other attached packages:
>>>> [1] HTqPCR_1.14.0 limma_3.16.8 RColorBrewer_1.0-5 Biobase_2.20.1
>>>> [5] BiocGenerics_0.6.0
>>>> loaded via a namespace (and not attached):
>>>> [1] affy_1.38.1 affyio_1.28.0 BiocInstaller_1.10.3
>>>> [4] gdata_2.13.2 gplots_2.11.3 gtools_3.0.0
>>>> [7] preprocessCore_1.22.0 stats4_3.0.1 zlibbioc_1.6.0
>>>> --
>>>> Sent via the guest posting facility at bioconductor.org.
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor [1]
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor [2]
>>> --
>>> James W. MacDonald, M.S.
Hi Alessandro,
my apologies for the lack of a response. I've recently had to take a
hiatus from Bioconductor, but will resume work on HTqPCR following the
impending Bioconductor release.
Best,
\Heidi
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
> Hi Alessandro,
>
> my apologies for the lack of a reply. I've recently had to take a
> hiatus from Bioconductor, but will resume work on HTqPCR following the
> impending Bioconductor release.
>
> best,
> \Heidi
>
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>> --
>> Alessandro Guffanti
>> Head, Bioinformatics
>> GENOMNIA SRL
>> Via Nerviano, 31/B – 20020 Lainate (MI)
>> Tel. +39-0293305.702 / Fax +39-0293305.777
>> www.genomnia.com [3]
>> alessandro.guffanti at genomnia.com
>>
>> Links:
>> ------
>> [1] https://stat.ethz.ch/mailman/listinfo/bioconductor
>> [2] http://news.gmane.org/gmane.science.biology.informatics.conductor
>> [3] http://www.genomnia.com