[BioC] HTqPCR normalization issues - third posting
heidi
heidi at ebi.ac.uk
Thu Oct 10 18:35:41 CEST 2013
On 10/10/2013 07:40, alessandro.guffanti at genomnia.com wrote:
> Thanks, much appreciated !
>
> It would be important for us to understand whether we are doing
> something fundamentally wrong, or whether there actually is a bug in
> the software (it happens), because we are using this package heavily
> for validating NGS gene expression analysis findings.
>
> Thank you so much for the excellent work!
>
> Keep in touch
>
> Alessandro & Elena
>
> On 10/10/2013 3:56 PM, James W. MacDonald wrote:
>
>> Hi Alessandro,
>>
>> I believe this package is still maintained, and it is unfortunate
>> that you have not received a reply. The expectation is that package
>> maintainers will subscribe (and pay attention) to the Bioc listserv,
>> but the list is fairly high traffic, so it never hurts to add a CC to
>> the maintainer as well (which I have done for you).
>>
>> Best,
>>
>> Jim
>>
>> On Thursday, October 10, 2013 8:35:06 AM, Alessandro Guffanti [guest]
>> wrote:
>>
>>> Dear all, this is our third posting without a real reply, so we
>>> wonder whether this package is no longer maintained. If so,
>>> it would be useful for us to know.
>>>
>>> We are using HTqPCR to analyze a set of cards, which we transformed into
>>> this format, which is accepted by HTqPCR:
>>>
>>> 2 Run05 41 Passed sample 41 ABCC5 Target 30
>>> 3 Run05 41 Passed sample 41 ADM Target 31.3
>>> 4 Run05 41 Passed sample 41 CEBPB Target 29.8
>>> 5 Run05 41 Passed sample 41 CSF1R Target 31.2
>>> 6 Run05 41 Passed sample 41 CXCL16 Target 26.9
>>> 7 Run05 41 Passed sample 41 CYC1 Target 25.7
>>>
>>> [...]
>>>
>>> The total number of files and groups is as follows - summarized in
>>> the file "Elenco_1.txt" which is used below:
>>>
>>> File Group
>>> 41.txt Sano
>>> 39.txt Sano
>>> 37.txt Sano
>>> 35.txt Sano
>>> 43.txt Sano
>>> 34.txt Sano
>>> 44.txt Sano
>>> 38.txt Sano
>>> 48.txt Sano
>>> 40.txt Sano
>>> 47.txt Sano
>>> 6.txt Non Responder DISEASE
>>> 26.txt Non Responder DISEASE
>>> 2.txt Non Responder DISEASE
>>> 69.txt Non Responder DISEASE
>>> 68.txt Non Responder DISEASE
>>> 5.txt Non Responder DISEASE
>>> 71.txt Responder DISEASE
>>> 3.txt Responder DISEASE
>>> 17.txt Responder DISEASE
>>> 1.txt Responder DISEASE
>>> 19.txt Responder DISEASE
>>>
>>> The comparison is DISEASE vs non DISEASE, but what leaves us
>>> dubious is the normalization part.
>>> Note that sample 41 is the *first* of the list.
>>>
>>> Here is the code up to the dump of the normalized values matrices:
>>>
>>> library("HTqPCR")
>>> path <- "whatever/"
>>> files <- read.delim(file.path(path, "Elenco_1.txt"))
>>> files
>>> filelist <- as.character(files$File)
>>> filelist
>>> raw <- readCtData(files = filelist, path = path, n.features = 46,
>>>                   type = 7, flag = NULL, feature = 6, Ct = 8,
>>>                   header = FALSE, n.data = 1)
>>> featureNames(raw)
>>> raw.cat <- setCategory(raw, Ct.max = 36, Ct.min = 9, replicates = FALSE,
>>>                        quantile = 0.9, groups = files$Group, verbose = TRUE)
>>>
>>> s.norm <- normalizeCtData(raw.cat, norm = "scale.rank")
>>> exprs(s.norm)
>>> write.table(exprs(s.norm), file = "Ct norm scaling.txt")
>>>
>>> g.norm <- normalizeCtData(raw.cat, norm = "geometric.mean")
>>> exprs(g.norm)
>>> write.table(exprs(g.norm), file = "Ct norm media geometrica.txt")
>>>
>>> Now, if we look at the content of the two expression value files,
>>> it looks like the first column (corresponding to the first sample)
>>> is always unchanged, while all the others have been normalized.
>>>
>>> In this case the first dataset is sample 41 so you can check
>>> comparing between the corresponding column
>>> above and below what is happening.
>>>
>>> We do not include here all the columns; however, you can see that
>>> all the samples *except the first (number 41)* have all their values
>>> normalized
>>>
>>> Ct norm scaling:
>>>
>>>         41    39           37           35           43           34           44           38
>>> ABCC5   30    27.37706161  26.47393365  29.7721327   31.20189573  26.39260664  26.32436019  27.54274882
>>> ADM     31.3  30.36540284  28.51753555  32.31241706  34.40473934  26.29800948  29.82796209  28.60208531
>>> CEBPB   29.8  28.53383886  26.65971564  27.84151659  30.06540284  27.3385782   27.36597156  26.29080569
>>> CSF1R   31.2  27.66625592  28.05308057  37.18976303  36.98767773  31.0278673   34.56255924  29.75772512
>>> CXCL16  26.9  27.56985782  24.15165877  30.28018957  28.82559242  25.91962085  26.89251185  26.96492891
>>>
>>> Ct norm geometric:
>>>
>>>         41    39           37           35           43           34           44           38
>>> ABCC5   30    27.73443878  26.93934246  29.88113261  30.76352197  26.51166676  26.8989347   27.49219508
>>> ADM     31.3  30.76178949  29.01887064  32.4307173   33.92136694  26.41664286  30.47900874  28.5495872
>>> CEBPB   29.8  28.90631647  27.12839047  27.94344824  29.64299633  27.46190571  27.96328103  26.24254985
>>> CSF1R   31.2  28.0274082   28.5462506   37.32591991  36.46801611  31.16783762  35.31694663  29.70310587
>>> CXCL16  26.9  27.92975172  24.57624224  30.39104955  28.42060473  26.03654728  27.47948724  26.91543574
>>>
>>> This looks odd: why does the first sample seem to be taken as a
>>> 'reference' for both normalization methods and hence left
>>> unchanged?
>>>
>>> This happens with ANY normalization procedure selected.
>>>
>>> Another (related ?) oddity is that in the final differential
>>> analysis result the same sample ID is always reported
>>> in the feature.pos field, as you can see below:
>>>
>>> genes feature.pos t.test p.value adj.p.value
>>> 22 NUCB1 41 -1.998838921 0.077900837 0.251381346
>>> 8 ERH 41 -1.958143348 0.091329532 0.251381346
>>> 16 MAFB 41 -1.887142703 0.09421993 0.251381346
>>> 28 RNF130 41 -1.904866754 0.099644523 0.251381346
>>> 3 CEBPB 41 -1.853176708 0.103563968 0.251381346
>>> 18 MSR1 41 -1.80887129 0.10432619 0.251381346
>>>
>>> Are we doing something wrong in the data input or the subsequent
>>> processing here? Can we actually trust these normalizations?
>>>
>>> Many thanks in advance - kind regards
>>>
>>> Alessandro & Elena
>>>
>>> -- output of sessionInfo():
>>>
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices utils datasets methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] HTqPCR_1.14.0 limma_3.16.8 RColorBrewer_1.0-5 Biobase_2.20.1
>>> [5] BiocGenerics_0.6.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affy_1.38.1 affyio_1.28.0 BiocInstaller_1.10.3
>>> [4] gdata_2.13.2 gplots_2.11.3 gtools_3.0.0
>>> [7] preprocessCore_1.22.0 stats4_3.0.1 zlibbioc_1.6.0
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor [1]
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor [2]
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099

Hi Alessandro,

my apologies for the lack of a reply. I recently had to take a hiatus
from Bioconductor, but will resume work on HTqPCR following the
impending Bioconductor release.

best,
\Heidi

>
> --
> Alessandro Guffanti
>
> Head, Bioinformatics
>
> GENOMNIA SRL
>
> Via Nerviano, 31/B – 20020 Lainate (MI)
>
> Tel. +39-0293305.702 / Fax +39-0293305.777
>
> www.genomnia.com [3]
>
> alessandro.guffanti at genomnia.com
>
> Links:
> ------
> [1] https://stat.ethz.ch/mailman/listinfo/bioconductor
> [2] http://news.gmane.org/gmane.science.biology.informatics.conductor
> [3] http://www.genomnia.com
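
One way to probe the question raised in the quoted post, using only base R
and a few of the values shown there (samples 39 and 37, first three genes):
check whether the two normalized outputs differ by a constant factor per
sample, and see what a scaling expressed relative to a reference sample does
to that reference column. This is only a sketch. `scale_to_reference` and the
`raw` matrix are hypothetical and made up for illustration; whether HTqPCR
actually scales relative to the first sample is an assumption here, not
something confirmed in this thread.

```r
# Values copied from the post: samples 39 and 37, genes ABCC5, ADM, CEBPB.
genes <- c("ABCC5", "ADM", "CEBPB")
scaling <- matrix(c(27.37706161, 30.36540284, 28.53383886,
                    26.47393365, 28.51753555, 26.65971564),
                  nrow = 3, dimnames = list(genes, c("39", "37")))
geometric <- matrix(c(27.73443878, 30.76178949, 28.90631647,
                      26.93934246, 29.01887064, 27.12839047),
                    nrow = 3, dimnames = list(genes, c("39", "37")))

# If both methods only rescale each sample by one factor, the ratio of the
# two outputs is the same for every gene within a column.
ratio <- geometric / scaling
print(round(ratio, 6))
ratio_spread <- apply(ratio, 2, function(x) diff(range(x)))

# Hypothetical reference-based scaling (NOT taken from the HTqPCR source):
# divide each sample by its geometric mean relative to the reference sample.
scale_to_reference <- function(ct, ref = 1) {
  geo_mean <- apply(ct, 2, function(x) exp(mean(log(x))))
  factors <- geo_mean / geo_mean[ref]   # the reference sample gets factor 1
  sweep(ct, 2, factors, "/")
}

# Made-up raw Ct values (genes x samples), for illustration only.
raw <- matrix(c(30, 31.3, 29.8,
                27, 30.0, 28.0,
                26, 28.0, 26.5),
              nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))
norm <- scale_to_reference(raw)
print(norm)
```

Under this kind of scaling the reference column is untouched by construction,
because its own factor is exactly 1. If that is what happens in the package,
an unchanged first sample would be expected behaviour rather than a sign that
normalization failed, but that needs to be confirmed against the package
documentation or source.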