[BioC] HT qPCR - error in scale rank invariant

Heidi Dvinge heidi at ebi.ac.uk
Tue Jan 25 20:23:08 CET 2011


Hi Andreia,

if your samples are indeed very different, then that's why a rank
invariant scaling fails. Quantile normalisation might be quite
conservative, but at least it seems to bring the C3 sample together with
the other C samples, based on your plots.
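
To illustrate why quantile normalisation is conservative, here is a minimal
base R sketch on made-up Ct values. The function name quantile.normalise and
the toy matrix are assumptions for illustration only. This is not the HTqPCR
implementation, and it breaks ties more crudely than a dedicated package would.

```r
# Toy Ct matrix: rows are miRNAs, columns are samples (made-up values)
ct <- matrix(c(22, 25, 30, 40,
               24, 27, 32, 40,
               21, 24, 29, 33),
             nrow = 4,
             dimnames = list(paste0("miR", 1:4), c("S1", "S2", "S3")))

# Hypothetical helper: replace each value by the mean of the values that
# share its rank across samples (assumes no missing values)
quantile.normalise <- function(ct) {
    ranks <- apply(ct, 2, rank, ties.method = "first")
    target <- rowMeans(apply(ct, 2, sort))
    ct.qn <- apply(ranks, 2, function(r) target[r])
    dimnames(ct.qn) <- dimnames(ct)
    ct.qn
}

ct.qn <- quantile.normalise(ct)
```

After this step every sample has exactly the same set of values, only in a
different order, so any genuine shift in the overall Ct distribution between
samples is removed.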

Depending on how adventurous you feel, you can also try some other
scaling/normalisation methods yourself. For example, this article
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718498/ recommends scaling to
the mean of all Ct values when dealing with miRNA qPCR values.
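
As a rough illustration of the idea, here is a base R sketch on made-up Ct
values. It assumes a subtractive version of the approach (each sample's mean
Ct over expressed features is subtracted), which may differ in detail from
what the article does. The helper name mean.centre, the toy matrix and the
Ct cut-off of 35 are all assumptions.

```r
# Toy Ct matrix: rows are miRNAs, columns are samples (made-up values)
ct <- matrix(c(22, 25, 30, 40,
               24, 27, 32, 40,
               21, 24, 29, 33),
             nrow = 4,
             dimnames = list(paste0("miR", 1:4), c("S1", "S2", "S3")))

# Hypothetical helper: subtract each sample's mean Ct, where the mean is
# taken only over features considered expressed (Ct < Ct.max)
mean.centre <- function(ct, Ct.max = 35) {
    sample.means <- apply(ct, 2, function(x) mean(x[x < Ct.max], na.rm = TRUE))
    sweep(ct, 2, sample.means, "-")
}

ct.centred <- mean.centre(ct)

# Mean of the expressed features in each sample after centring
centred.means <- sapply(seq_len(ncol(ct)), function(j) {
    mean(ct.centred[ct[, j] < 35, j])
})
```

The values in centred.means should all be zero up to rounding, which is a
quick way to confirm that the centring did what was intended.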

Such a method might not work if you expect large overall differences in
expression level between your samples. However, it's easy to implement and
test. Say, for example, that you want to use the geometric mean of all
expressed genes (Ct<35), with the first sample as the reference. You
could then do something like this:

# Load the package and some example data
library(HTqPCR)
data(qPCRraw)

# Define normalisation function (or just use the individual commands directly)
my.norm.method <- function(q, Ct.max=35, ref=1)
{
    # Get the data
    data <- exprs(q)
    # For each column, calculate the geometric mean of Ct values<Ct.max
    geo.mean <- apply(data, 2, function(x) {
        xx <- log2(subset(x, x<Ct.max))
        2^mean(xx)})
    # Calculate the scaling factor that brings each sample's geometric
    # mean to that of the reference sample
    geo.scale <- geo.mean[ref]/geo.mean
    # Adjust the data accordingly
    data.norm <- t(t(data) * geo.scale)
    # Return the normalised object
    exprs(q) <- data.norm
    q
}

# Normalise
q.norm <- my.norm.method(qPCRraw)

# Plot raw versus normalised data
plot(exprs(qPCRraw), exprs(q.norm), col=rep(1:n.samples(q.norm),
each=n.wells(q.norm)))
# Followed by the usual QC and sanity check of your data...
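
One possible sanity check for a scaling of this kind, sketched in base R on
made-up Ct values so that it runs without HTqPCR: after multiplying each
sample by the reference geometric mean divided by its own, the geometric
means of the expressed features should coincide. The helper geo.mean, the toy
matrix and the cut-off of 35 are assumptions for illustration.

```r
# Toy Ct matrix: rows are miRNAs, columns are samples (made-up values)
ct <- matrix(c(22, 25, 30, 40,
               24, 27, 32, 40,
               21, 24, 29, 33),
             nrow = 4,
             dimnames = list(paste0("miR", 1:4), c("S1", "S2", "S3")))

# Hypothetical helper: geometric mean of the values selected by 'keep'
geo.mean <- function(x, keep) 2^mean(log2(x[keep]))

# Features considered expressed in the raw data (Ct < 35)
expressed <- !is.na(ct) & ct < 35

# Geometric mean per sample before scaling
before <- sapply(seq_len(ncol(ct)), function(j) geo.mean(ct[, j], expressed[, j]))

# Scale every sample towards the first (reference) sample
ct.norm <- sweep(ct, 2, before[1] / before, "*")

# Geometric mean per sample after scaling, over the same features
after <- sapply(seq_len(ncol(ct)), function(j) geo.mean(ct.norm[, j], expressed[, j]))
```

If the scaling goes in the right direction, all entries of after equal
before[1] up to rounding, and the reference column is left unchanged.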

If you decide to give that (or something similar) a go, I'd be interested
in hearing whether it works for your data or not.

Cheers
\Heidi




> Dear Heidi,
>
> thanks for your reply. Indeed, I am comparing cell types which have huge
> differences between their miRNA profiles, and unfortunately the qPCR assay
> has only one endogenous gene, which is affected by cell type, so the dCt
> method is not adequate. I have tried quantile. The reason why I wanted
> to find another method is that, as you can see in the distribution of Ct
> values, the S1 cells have many miRs which are not expressed and which I am
> analyzing as Ct=40. So these cells are very different, and with quantile
> some differences will not pop up in the analyses because I am forcing them
> to have a distribution similar to the other cells. Still, I think this
> approach is conservative, given that some differences do appear, as you can
> see in the files after quantile normalization. Implementing other methods
> that can deal with these problems of working with cell types which behave
> differently, as in my case, and which lack endogenous genes for
> normalization could be a suggestion for your package.
> Kind regards,
> Andreia
>
> PS: attached are two files with the correlations and data distribution.
>
> On Mon, Jan 24, 2011 at 11:11 PM, Heidi Dvinge <heidi at ebi.ac.uk> wrote:
>
>> Hello Andreia,
>>
>> I can reproduce the error you get if I say:
>>
>> > data(qPCRraw)
>> > temp <- normalizeCtData(qPCRraw, norm="scale.rankinvariant")
>> Scaling Ct values
>>        Using rank invariant genes: Gene1 Gene29
>>        Scaling factors: 1.00 1.06 1.00 1.03 1.00 1.00
>> # Select just the first genes so that Gene29 is excluded
>> > normalizeCtData(qPCRraw[1:10,], norm="scale.rankinvariant")
>> Error in smooth.spline(ref[i.set], data[i.set]) :
>>  need at least four unique 'x' values
>>
>> After looking into the code, the problem occurs when there's only a single
>> (or no) rank-invariant gene between any individual sample and the
>> reference sample (the mean or median across all samples). At least two
>> rank-invariant genes are required between the reference and each sample.
>> I'll make a note of this in the help file.
>>
>> This means that a rank-invariant method is not going to be robust enough
>> for your normalisation. Instead, you'll have to go with ddCt or quantile.
>> In the future there might be other options available in HTqPCR (e.g. scale
>> by arithmetic or geometric mean) depending on demand.
>>
>> The likely cause of this is that your samples are quite different. Have
>> you tried investigating them with e.g. plotCtCor or clusterCt to see if
>> they group as expected, or if there's any marked difference in the
>> distribution of Ct values (plotCtDensity)? Even a relatively harsh method
>> such as quantile normalisation might be suitable for your data.
>>
>> Cheers
>> \Heidi
>>
>>
>> > Dear Heidi,
>> >
>> > thanks for the quick reply,
>> >
>> > after traceback() I get
>> >
>> > traceback()
>> > 5: stop("need at least four unique 'x' values")
>> > 4: smooth.spline(ref[i.set], data[i.set])
>> > 3: FUN(newX[, i], ...)
>> > 2: apply(data, 2, normalize.invariantset, ref = ref.data)
>> > 1: normalizeCtData(raw.cat, norm = "scale.rank")
>> >
>> > information about the session
>> > sessionInfo()
>> > R version 2.11.1 (2010-05-31)
>> > i386-apple-darwin9.8.0
>> >
>> > locale:
>> > [1] C
>> >
>> > attached base packages:
>> > [1] stats     graphics  grDevices utils     datasets  methods   base
>> >
>> > other attached packages:
>> > [1] statmod_1.4.8      HTqPCR_1.2.0       limma_3.4.4        RColorBrewer_1.0-2 Biobase_2.8.0
>> >
>> > loaded via a namespace (and not attached):
>> > [1] affy_1.26.1           affyio_1.16.0         gdata_2.7.2           gplots_2.8.0          gtools_2.6.2          preprocessCore_1.10.0
>> >
>> > On Fri, Jan 21, 2011 at 5:41 PM, Heidi Dvinge <heidi at ebi.ac.uk> wrote:
>> >
>> >> Dear Andreia,
>> >>
>> >> > Dear all,
>> >> >
>> >> > I am analysing qPCR data from Exiqon where I have one card per sample;
>> >> > in each card I have one observation for each miRNA. I have in total 8
>> >> > cards: 2 for treatment 1, 3 for treatment 2 and 3 for treatment 3. Each
>> >> > card has one endogenous gene, which I wouldn't like to use to normalize
>> >> > Ct values because it is being affected by the type of treatment. So I
>> >> > would like to use scale.rank.
>> >> > I am getting the following error:
>> >> >
>> >> > sr.norm <- normalizeCtData(raw.cat, norm = "scale.rank")
>> >> > Error in smooth.spline(ref[i.set], data[i.set]) :
>> >> >   need at least four unique 'x' values
>> >> >
>> >> It sounds like there aren't enough rank-invariant genes across your 8
>> >> cards. If that's the case, then this is admittedly not the most useful
>> >> error message, and it should be changed. What does it say when you run
>> >> traceback() following the error?
>> >>
>> >> The parameter "scale.rank.samples" in normalizeCtData() will let you set
>> >> how many of the samples each gene has to be rank-invariant across in
>> >> order to be included. By default this is the number of samples-1. You can
>> >> try lowering that number, although keep in mind that the lower it is, the
>> >> less robust your resulting rank-invariant genes are. If your samples are
>> >> all highly variable across all genes, it might not be possible for you to
>> >> use this normalisation method.
>> >>
>> >> If this does not seem to be the problem, something else might be going
>> >> on with the function. In that case, please report back here and I can
>> >> perhaps have a look at your data.
>> >>
>> >> I have been considering adding an additional parameter to
>> >> normalizeCtData, so that genes just have to be rank-invariant within a
>> >> certain interval, e.g. be located within -/+5 of each other on the
>> >> ranked list. For rather low-throughput qPCR cards that could mess things
>> >> up though.
>> >>
>> >> HTH
>> >> \Heidi
>> >>
>> >> > Does this mean I don't have enough replicates?
>> >> >
>> >> > thanks for the help
>> >> >
>> >> > Andreia
>> >> >
>> >> > --
>> >> > --------------------------------------------
>> >> > Andreia J. Amaral
>> >> > Unidade de Imunologia Clínica
>> >> > Instituto de Medicina Molecular
>> >> > Universidade de Lisboa
>> >> > email: andreiaamaral at fm.ul.pt
>> >> >           andreia.fonseca at gmail.com
>> >> >
>> >> > _______________________________________________
>> >> > Bioconductor mailing list
>> >> > Bioconductor at r-project.org
>> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> > Search the archives:
>> >> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>>
>>
>
>
>


