[BioC] Delta CT data distribution and cluster analyses; machine learning or other
Moshe Olshansky
olshansky at wehi.EDU.AU
Mon May 16 03:44:18 CEST 2011
Hi John,
Richard's suggestion is correct - Delta Ct (or Ct itself) is on the
logarithmic scale, so the error is hopefully additive and they may be
treated as microarray data after log transformation.
However, there is one important difference: a microarray usually contains
thousands of genes, so in most cases we may expect most of them not to
change and hence quantile normalization should be all right. But PCR data
usually covers a few hundred (or even fewer) genes, so one must be much more
careful when deciding whether it is reasonable to expect most of these
genes not to change between conditions (especially if these genes belong
to some set of interest). If this assumption is wrong (i.e. a high
percentage of genes may change), quantile normalization should be avoided.
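Moshe's caveat follows from what quantile normalization actually does: it forces every sample to share one distribution, so a genuine global shift between conditions is erased. A minimal Python sketch, assuming a genes x samples matrix of Delta Ct values held as a list of rows; the function name `quantile_normalize` is illustrative, and ties are broken by position (some implementations handle ties more carefully).

```python
from statistics import mean

def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Every sample (column) is replaced by a common reference distribution:
    the mean of the k-th smallest value across samples. Ties are broken by
    position rather than averaged.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # For each column, gene indices ordered from smallest to largest value.
    orders = [sorted(range(n_genes), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest value in each column.
    reference = [mean(columns[j][orders[j][k]] for j in range(n_samples))
                 for k in range(n_genes)]
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        for rank, gene in enumerate(orders[j]):
            result[gene][j] = reference[rank]
    return result

# Hypothetical Delta Ct values: 3 genes (rows) x 2 samples (columns).
print(quantile_normalize([[5.0, 6.0], [8.0, 7.0], [2.0, 4.0]]))
```

Because every column ends up with identical sorted values, a case group in which most genes truly change would be pulled back onto the controls, which is the failure mode described above for small PCR panels.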
Regards,
Moshe.
> John,
>
> Do not raise deltaCt to a power and do a t-test.
> Test the hypothesis deltaCt(condition 1) = deltaCt(condition 2) with a
> t-test.
>
> deltaCt = -log2(M), and it will be closer to normally distributed than 2^-deltaCt.
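Rich's recommendation amounts to a two-sample test on the Delta Ct values themselves. A minimal sketch using only Python's standard library; it swaps in Welch's unequal-variance form of the t statistic (a choice made here, not one stated in the thread), and the case/control values are hypothetical. If SciPy is available, `scipy.stats.ttest_ind(case, control, equal_var=False)` returns the same statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical Delta Ct values (cycles) for one gene: 5 cases, 5 controls.
case = [5.0, 5.5, 4.5, 5.2, 4.8]
control = [6.0, 6.4, 5.6, 6.2, 5.8]
t, df = welch_t(case, control)
print(t, df)
```

Running the same test on 2^-deltaCt instead would compare means of skewed, log-normal values, which is what Rich advises against.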
>
> I hope this helps.
>
>
> Best wishes,
> Rich
>
> On Sat, 14 May 2011, john herbert wrote:
>
>> The range of raw CT values is around 15 to 35. The 2^-deltaCT values
>> are very small, less than one. An example is 0.079703285.
>> I have 5 case samples and 5 control samples. For all samples, there are
>> CT measures for target genes and housekeeper genes. Our approach is to
>> use the housekeeper measured on each sample in that sample's Delta CT
>> calculation.
>>
>> E.g.
>> Sample Case 1 target CT = 15
>> Sample Case 1 house keeper CT = 10
>> Delta CT = 15-10 = 5
>> A = 2 to the power of minus delta CT, as in Excel =power(2,-5)
>> = 0.03125
>>
>> The normal sample is done the same way:
>> Sample normal 1 target CT = 10
>> Sample normal 1 house keeper CT = 4
>> Delta CT = 10-4 = 6
>> 2 to the power of minus delta CT, as in Excel =power(2,-6) = 0.015625
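The worked example can be checked in a few lines. A minimal Python sketch, assuming 100% amplification efficiency (one doubling per cycle), as the 2^-Delta CT formula implicitly does; the helper names are illustrative. The exponent is minus Delta CT, so a Delta CT of 5 gives 2^-5 = 0.03125.

```python
import math

def delta_ct(target_ct, housekeeper_ct):
    """Delta Ct for one sample: target Ct minus that sample's housekeeper Ct."""
    return target_ct - housekeeper_ct

def relative_expression(dct):
    """2^-DeltaCt: expression relative to the housekeeper, assuming doubling per cycle."""
    return 2.0 ** -dct

case_dct = delta_ct(15, 10)    # 5 cycles
normal_dct = delta_ct(10, 4)   # 6 cycles
print(relative_expression(case_dct))    # 0.03125
print(relative_expression(normal_dct))  # 0.015625

# -log2 undoes the exponentiation and returns to the Delta Ct (cycle) scale.
print(-math.log2(relative_expression(case_dct)))  # 5.0
```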
>>
>> I have lots of these small values. These values don't look normally
>> distributed.
>>
>> My view is that maybe I should compute M values (log2 ratios) and do t-tests etc.
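For the M value proposed here, a short derivation shows it is simply Delta Delta CT with the sign flipped, assuming the same perfect-doubling model behind 2^-Delta CT. With A denoting 2^-Delta CT as in the worked example:

```latex
M = \log_2 \frac{A_{\text{case}}}{A_{\text{control}}}
  = \log_2 \frac{2^{-\Delta C_{t,\text{case}}}}{2^{-\Delta C_{t,\text{control}}}}
  = -\left(\Delta C_{t,\text{case}} - \Delta C_{t,\text{control}}\right)
  = -\Delta\Delta C_t
```

With the example values, Delta CT = 5 (case) and 6 (normal) give M = -(5 - 6) = 1, i.e. 0.03125 / 0.015625 = 2-fold higher in the case sample. A t-test on per-sample log2(A) = -Delta CT therefore gives the same result, up to sign, as one on Delta CT.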
>>
>> Is this the best way to go for gene expression analysis and subsequent
>> clustering?
>>
>> Thank you.
>>
>>
>> On Fri, May 13, 2011 at 9:06 PM, Kevin R. Coombes
>> <kevin.r.coombes at gmail.com> wrote:
>> What is the range of the data that you received?
>>
>> In most TaqMan real-time PCR experiments, the Ct values range
>> from about 10 (for really, really abundant things like 18S) to
>> 40.
>> These measurements are in cycles. In principle, if you had a
>> perfectly efficient probe-primer combination, the number of
>> target molecules present would double every cycle. As a result, cycle
>> values are already essentially on the "negative log base two"
>> scale.
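Kevin's argument can be written as an exponential-growth model: if each cycle multiplies the template by (1 + E), where E is the amplification efficiency, the starting quantity is proportional to (1 + E)^-Ct, and E = 1 recovers the "negative log base two" scale. A minimal sketch; the `efficiency` parameter and its 0.9 example value are assumptions for illustration, not values from the thread.

```python
import math

def relative_quantity(ct, efficiency=1.0):
    """Starting template amount (arbitrary units) implied by a Ct value.

    Assumes exponential amplification by a factor of (1 + efficiency) per
    cycle; efficiency=1.0 is ideal doubling.
    """
    return (1.0 + efficiency) ** -ct

# One cycle earlier means twice as much template under perfect efficiency...
print(relative_quantity(20) / relative_quantity(21))  # 2.0
# ...but only about 1.9-fold if the (hypothetical) efficiency is 0.9.
print(relative_quantity(20, 0.9) / relative_quantity(21, 0.9))
# With perfect efficiency, Ct is exactly -log2 of the relative quantity.
print(-math.log2(relative_quantity(25)))  # 25.0
```

If the real efficiency is below 1, converting cycles with base 2 slightly overstates fold changes; the Delta Ct values themselves are unaffected.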
>>
>> As Richard already pointed out, the Delta-Ct or Delta-Delta-Ct
>> values on this scale are usually approximately normally distributed.
>>
>> If your data are not in a range that makes sense as cycles, then
>> it is likely that someone exponentiated the data to get it back to
>> the "raw" scale, and thus converted from normally distributed to
>> log-normal.
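Kevin's point that exponentiating normally distributed cycle values produces log-normal data is easy to see by simulation. A minimal sketch using only the standard library; the simulated Delta Ct values (mean 5, standard deviation 1 cycle) are hypothetical, and skewness is used as a rough symmetry check rather than a formal normality test.

```python
import math
import random
from statistics import mean, pstdev

def skewness(values):
    """Sample skewness (population moments); near 0 for symmetric data."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

def undo_exponentiation(values):
    """Map 2^-DeltaCt values back to the Delta Ct (cycle) scale via -log2."""
    return [-math.log2(v) for v in values]

rng = random.Random(1)
# Hypothetical Delta Ct values: normal with mean 5 and sd 1 cycle.
dct = [rng.gauss(5.0, 1.0) for _ in range(1000)]
exponentiated = [2.0 ** -d for d in dct]  # data someone "converted back to raw"
recovered = undo_exponentiation(exponentiated)

print(round(skewness(exponentiated), 2))  # strongly right-skewed (log-normal)
print(round(skewness(recovered), 2))      # close to 0 again
```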
>>
>> Kevin
>>
>>
>>
>> Hi Richard,
>> Thank you. It is from TaqMan real-time PCR. I have sent a mail
>> asking how exactly they normalised the data.
>> We only have biological replicates and no common reference, so I
>> was told we can only use Delta CT values.
>>
>> Am I right to assume (maybe wrongly) that if Delta Delta CT values
>> are normally distributed, Delta CT values will also be normally
>> distributed?
>>
>> I will make plots of the raw data and Delta CT as I know it.
>>
>> On Fri, May 13, 2011 at 3:53 PM, Richard Friedman
>> <friedman at cancercenter.columbia.edu> wrote:
>>
>> Dear John,
>>
>> Is the Delta CT data from PCR or from some other method?
>> If it is from PCR, in my experience Delta Delta CT is usually
>> normally distributed, where the first delta refers to the
>> difference between the experimental gene and the internal reference
>> (e.g. GAPDH) and the second delta refers to the difference between
>> 2 experimental conditions.
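Rich's definition of the two deltas translates directly into code. A minimal sketch, assuming unpaired groups (as with 5 cases and 5 controls), so the second delta is taken between group means of the per-sample Delta CT values; the numbers are hypothetical, and 2^-Delta Delta CT again assumes perfect doubling per cycle.

```python
from statistics import mean

def delta_delta_ct(dct_condition, dct_baseline):
    """Second delta: mean Delta Ct in one condition minus mean in the baseline."""
    return mean(dct_condition) - mean(dct_baseline)

def fold_change(ddct):
    """Fold change implied by Delta Delta Ct, assuming doubling per cycle."""
    return 2.0 ** -ddct

# Hypothetical per-sample Delta Ct values (target Ct minus GAPDH Ct).
case = [5.0, 5.5, 4.5]
control = [6.0, 6.4, 5.6]
ddct = delta_delta_ct(case, control)
print(ddct, fold_change(ddct))  # negative Delta Delta Ct: higher in cases
```

The t-test on Delta CT compares exactly these two group means, so it and the Delta Delta CT estimate describe the same contrast.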
>>
>> With hopes that the above helps,
>> Rich
>> ------------------------------------------------------------
>> Richard A. Friedman, PhD
>> Associate Research Scientist,
>> Biomedical Informatics Shared Resource
>> Herbert Irving Comprehensive Cancer Center (HICCC)
>> Lecturer,
>> Department of Biomedical Informatics (DBMI)
>> Educational Coordinator,
>> Center for Computational Biology and Bioinformatics (C2B2)/
>> National Center for Multiscale Analysis of Genomic Networks
>> (MAGNet)
>> Room 824
>> Irving Cancer Research Center
>> Columbia University
>> 1130 St. Nicholas Ave
>> New York, NY 10032
>> (212)851-4765 (voice)
>> friedman at cancercenter.columbia.edu
>> http://cancercenter.columbia.edu/~friedman/
>>
>> I am a Bayesian. When I see a multiple-choice question on a test
>> and I don't know the answer I say "eeney-meaney-miney-moe".
>>
>> Rose Friedman, Age 14
>>
>> On May 13, 2011, at 10:46 AM, john herbert wrote:
>>
>> Dear Bioconductors,
>> I have a bunch of DeltaCT values for several tissues. If I boxplot
>> the data, it looks very similar to microarray data, with a lot of
>> congestion around zero.
>>
>> Likewise, if I log2 the data, as with microarrays, the
>> distributions look close to normal and like microarray data.
>>
>> Please see the image here for different plots:
>>
>> https://docs.google.com/leaf?id=0B9IUGsKecS4GNDc0OWVlNzEtZjE5Yi00Y2Q4LWI0M2MtMGFiNzZhMDU0YTFm&hl=en
>>
>> My question is: is data manipulation in this manner OK for this type
>> of data, and will it affect/invalidate any unsupervised machine
>> learning/clustering?
>>
>> Can I quantile normalise the data and still do valid clustering?
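On the clustering question, the choice of scale matters because distance-based methods such as hierarchical clustering weight differences by their absolute size. A minimal sketch with three hypothetical samples and three genes, using Euclidean distance (an assumption; the thread does not name a distance): on the Delta CT scale sample A is closest to B, but after exponentiating to 2^-Delta CT a 2-fold change in one abundant gene outweighs 8-fold changes in two rare ones, and A moves closest to C.

```python
import math

# Hypothetical Delta Ct values (cycles) for three genes in three samples.
dct = {
    "A": [1.0, 8.0, 8.0],
    "B": [2.0, 8.0, 8.0],  # gene 1 two-fold lower than in A
    "C": [1.0, 5.0, 5.0],  # genes 2 and 3 eight-fold higher than in A
}
# The same data exponentiated to 2^-DeltaCt.
raw = {name: [2.0 ** -v for v in values] for name, values in dct.items()}

def nearest_to(sample, data):
    """Name of the sample with the smallest Euclidean distance to `sample`."""
    others = [name for name in data if name != sample]
    return min(others, key=lambda name: math.dist(data[sample], data[name]))

print(nearest_to("A", dct))  # B
print(nearest_to("A", raw))  # C
```

On the log (Delta CT) scale each cycle counts the same regardless of a gene's abundance, which is usually what is wanted when clustering expression profiles.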
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> --
> ------------------------------------------------------------
> Richard A. Friedman, PhD
> Associate Research Scientist
> Herbert Irving Comprehensive Cancer Center
> Biomedical Informatics Shared Resource
> Lecturer
> Department of Biomedical Informatics
> Box 95, Room 130BB or P&S 1-420C
> Columbia University Medical Center
> 630 W. 168th St.
> New York, NY 10032
> (212)305-6901 (5-6901) (voice)
> friedman at cancercenter.columbia.edu
> http://cancercenter.columbia.edu/~friedman/
>
> "The last 250 pages of the last Harry Potter
> book took place in one day because alot
> happened in that day. All of Ulysses takes
> place in one day and nothing happened in that day."
> -Rose Friedman, age 11