[BioC] HTqPCR to analyze Fluidigm 96.96 Dynamic Array data

Wed Jun 16 23:44:29 CEST 2010

Hello Vicencio,

> Dear list,
>
> I'm having some difficulties in using HTqPCR to analyze qPCR data obtained
> using the Biomark Fluidigm 96.96 array.
>
Interesting question. For a while I was toying with the idea of
incorporating functions specifically for Fluidigm data into HTqPCR. I
never went through with it though, since each individual Fluidigm array
can have its own design, so it's not necessarily common across samples the
way it is for e.g. ABI and Roche cards. Nevertheless, it should be
possible to use HTqPCR for Fluidigm data.

> With the Fluidigm chips, one can measure expression of 96 genes in 96
> samples on one plate, i.e. 9216 PCRs per plate (see
> http://www.fluidigm.com/products/biomark-chips.html for details).
>
> In my experiment, I use 9 such plates. On each plate I have 88 different
> experimental samples, with different samples on each plate, totalling 792
> unique experimental samples (associated with specific experimental
> conditions). On each plate, I also have 8 standard samples that are the
> same across all plates (1 NTC, 1 cDNA mix +RT, 1 cDNA mix -RT, 5 samples
> of a dilution series).
>
>
> I use 32 different genes (features), each replicated 3 times, in the same
> order on each plate.
>
> Each original data file (as exported by the Fluidigm software) has data on
> one plate, i.e. 9216 rows with one PCR per row, with columns for sample
> name, feature name, Ct, quality calls, etc.
> I managed to read in the 9 data files (from 9 plates) into one qPCRset
> object:
> An object of class "qPCRset"
> Size:  96 features, 864 samples
> Feature types:           Reference, Test
> Feature names:           1.BGRP 1.BGRP 1.BGRP ...
> Feature classes:
> Feature categories:      OK, Undetermined
> Sample names:            A1.1h A1.6a A1.11h ...
>
> Is this a good way to structure my data? Or would it be better to create 9
> qPCRset objects (1 for each plate)? Before spending more effort continuing
> this approach I'd appreciate your opinion on whether this is the way
> forward.
>
It depends a bit on how clean your data is, and how you want to preprocess
it. If you suspect there are any array-specific effects at all, you'll
probably want to normalise your 9 plates separately, i.e. have them in a
qPCRset object with 96x96 rows and 9 columns.

Do you have your data in a single file or in 9 files? Either way, you can
create such a qPCRset. Alternatively, if you want to use the object you
already have loaded into R, you can split it up using something like this
(untested, and inelegant):

q <- your_qPCRset
# To get the columns originating from the same array
start <- seq(1, 9*96, 96)
# Make a list of 9 individual 96x96 qPCRset objects
q.list <- list()
for (i in seq_along(start)) {
	q.list[[i]] <- q[,start[i]:(start[i]+95)]
}
# Convert each list entry from 96x96 to 9216x1 dimension qPCRset
for (i in seq_along(q.list)) {
	temp <- list()
	for (j in 1:96)
		temp[[j]] <- q.list[[i]][,j]
	q.list[[i]] <- do.call("rbind", temp)
}
# Join them all together into 9216x9
q.new <- do.call("cbind", q.list)

A bit of data exploration is probably required to check whether you have
any particular biases that need correcting in your data. Based on the
qPCRset object you have now, you can e.g. try clustering your data using
clusterCt(), and see whether the samples, especially the controls, cluster
together by sample type or by the array they were run on. Also, what is
the correlation between samples like (see plotCtCor)?
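
As a rough illustration of that check, here is a minimal base R sketch
working on a plain Ct matrix (features in rows, samples in columns) rather
than on the qPCRset itself. The Ct values are simulated and the sample
names (P1.ctrl etc.) are hypothetical; clusterCt() and plotCtCor() would
be the HTqPCR route, this only shows the underlying idea.

```r
# Simulated Ct values standing in for real data; all names are hypothetical.
set.seed(1)
n.feat <- 96
ct <- matrix(rnorm(n.feat * 6, mean = 25, sd = 2), nrow = n.feat,
             dimnames = list(NULL, c("P1.ctrl", "P1.s1", "P1.s2",
                                     "P2.ctrl", "P2.s1", "P2.s2")))

# Pairwise correlation between samples (columns), ignoring missing Ct values
ct.cor <- cor(ct, use = "pairwise.complete.obs")

# Hierarchical clustering of samples, using 1 - correlation as the distance
hc <- hclust(as.dist(1 - ct.cor), method = "average")

# Cut the tree into two groups and tabulate them against plate of origin.
# If the groups mostly follow the plate, a plate effect is worth suspecting.
plate <- sub("\\..*$", "", colnames(ct))
groups <- cutree(hc, k = 2)
print(table(plate, groups))
```

With real data, the interesting question is whether the control samples
from different plates end up in the same group.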

By the time you get to the actual statistical testing you'd want your data
in a format like the one you have now, i.e. 1 row per gene (3 rows per
gene in your case due to your replicates) and 1 column per sample. If you
start with 9216 rows x 9 columns for doing the normalisation, you can
reformat the data afterwards using the changeCtLayout function.
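
To make the change of layout concrete, here is a small base R sketch of
the index arithmetic on a plain matrix. It assumes the 9216 rows of each
plate column are ordered sample by sample (96 features for sample 1, then
96 for sample 2, and so on), which is what the splitting code above would
produce. The values are placeholders, not Ct data, and changeCtLayout
would do the equivalent on the qPCRset itself.

```r
n.feat <- 96; n.samp <- 96; n.plate <- 9

# Placeholder values: each entry is its own linear index, so the reshape
# can be verified. Rows = 9216 wells per plate, columns = 9 plates.
long <- matrix(seq_len(n.feat * n.samp * n.plate), ncol = n.plate)

# For each plate, fold the 9216 wells back into 96 features x 96 samples,
# then put the plates side by side.
wide <- do.call(cbind, lapply(seq_len(n.plate), function(p) {
  matrix(long[, p], nrow = n.feat, ncol = n.samp)
}))

dim(wide)  # 96 864
```

If the wells were ordered feature by feature instead, the fold would need
byrow = TRUE, so it is worth checking a few known wells after reshaping.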

>
> Among other things, I would like to do the following:
> 1. Check for spatial effects. When I use plotCtCard, it only plots one
> sample at a time, even though I have 96 samples on each plate. Is it
> possible to plot my 96 samples x 96 features? How can I specify this kind
> of layout?

To plot each array separately, you'd need to have each array in a single
column!
Note, though, that plotCtCard is optimised for standard-size rectangular
well plates. Aesthetically speaking, it might not look so nice
for a 96x96 square array. I started making a plotCtArray function for
Fluidigm data at some point; let me know if you're keen to be a guinea
pig.
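
In the meantime, a quick look at spatial effects is possible in base R by
folding one array's 9216 Ct values into a 96 x 96 matrix and plotting it
with image(). This is only a sketch: the Ct values are simulated, and it
assumes the wells are ordered sample by sample and that rows and columns
of the matrix correspond to assay and sample inlets on the chip.

```r
n.feat <- 96; n.samp <- 96

# Simulated Ct values for one array, with some undetermined wells set to NA
set.seed(2)
ct.col <- rnorm(n.feat * n.samp, mean = 25, sd = 2)
ct.col[sample(length(ct.col), 50)] <- NA

# Fold into features (rows) x samples (columns)
arr <- matrix(ct.col, nrow = n.feat, ncol = n.samp)

# Heat map of the whole array; NA wells are left blank
image(x = seq_len(n.samp), y = seq_len(n.feat), z = t(arr),
      col = heat.colors(50), xlab = "Sample", ylab = "Feature",
      main = "Ct values, one 96x96 array")

# Row and column means give a crude numerical view of gradients
feat.mean <- rowMeans(arr, na.rm = TRUE)
samp.mean <- colMeans(arr, na.rm = TRUE)
```

A trend in feat.mean or samp.mean along the chip would hint at a spatial
effect rather than a biological one.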

> 2. Control for plate-specific effects. I have the same 8 standard samples
> on each plate (for all genes), and would like to use these repeated
> measurements to 'normalize' all other data across plates. However, I'm
> having a hard time even accessing and plotting the data.

To use these 8 control genes for normalisation you can call
normalizeCtData(q, norm = "deltaCt", deltaCt.genes), where deltaCt.genes is
a vector of the gene names you want to use as standard. Note that these 8
gene names must appear exactly as they are in featureNames(q).
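
For reference, the arithmetic behind a deltaCt normalisation can be
sketched in base R on a plain Ct matrix: for each sample, the mean Ct of
the chosen reference features is subtracted from every value in that
column. The feature names (geneA, ref1, ...) and Ct values below are made
up for illustration; this is not the normalizeCtData implementation.

```r
# Simulated Ct matrix: 3 test genes and 1 reference, each in triplicate
set.seed(3)
feat <- c(rep(c("geneA", "geneB", "geneC"), each = 3), rep("ref1", 3))
ct <- matrix(rnorm(length(feat) * 4, mean = 25, sd = 2),
             nrow = length(feat),
             dimnames = list(feat, paste0("S", 1:4)))

# Names of the reference features; they must match the row names exactly
ref <- c("ref1")
is.ref <- rownames(ct) %in% ref

# Per-sample mean Ct of the reference rows
ref.mean <- colMeans(ct[is.ref, , drop = FALSE], na.rm = TRUE)

# Subtract each sample's reference mean from all values in that column
dct <- sweep(ct, 2, ref.mean, FUN = "-")
```

After this step the reference rows average to zero within every sample,
which is a quick sanity check on the result.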

> 3. Specify technical replicates. Each sample has been run on 32 genes in
> triplicate. Each feature name is represented 3 times (once for each
> technical rep). How can I specify that my 96 features are grouped per 3?

You don't have to specify technical replicates directly anywhere within
your qPCRset objects. Several functions, such as ttestCtData, have a
parameter "replicates" which can be set to TRUE if you want to consider
replicates. If so, the function(s) combine data across genes that have
identical featureNames.

A small note here: featureNames don't have to be unique, in fact it's
often easier for downstream analysis if identical genes are named the
same, and not e.g. gene1_rep1, gene1_rep2 etc. The way to tell them apart
is then using the featurePos information. This corresponds to the location
of each gene on the array, or pos1...pos9216 if no positional information
is supplied to readCtData. The output from e.g. ttestCtData will report
both the featureNames and featurePos, so even for replicates you can
always trace each result back to the original value.
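
The idea of grouping replicates by name while keeping positions for
traceability can be shown with a toy base R example. The names, positions
and Ct values are invented, and the averaging here is just one way of
combining replicates, not necessarily what ttestCtData does internally.

```r
# Two genes in triplicate: identical names, distinct positions
feat.names <- rep(c("geneA", "geneB"), each = 3)
feat.pos <- paste0("pos", seq_along(feat.names))

# Toy Ct values for two samples, rows labelled by position
ct <- matrix(c(20, 21, 22, 30, 31, 32,
               24, 25, 26, 28, 29, 30),
             ncol = 2, dimnames = list(feat.pos, c("S1", "S2")))

# Sum Ct values over rows sharing a feature name, then divide by the
# number of replicates to get the per-gene mean
sums <- rowsum(ct, group = feat.names)
n.rep <- as.vector(table(feat.names)[rownames(sums)])
means <- sums / n.rep

# The positions behind each gene remain available for tracing back
pos.by.gene <- split(feat.pos, feat.names)
```

Here means has one row per gene, and pos.by.gene lists which wells
contributed to each row.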

> 4.  Add information about my experimental design. My 792 experimental
> samples were obtained in a full factorial design with several biological
> replicates per treatment. Is it possible to add extra data to my qPCR set
> object? E.g. a matrix containing, per sample, information on sample name,
> value for factor 1, value for factor 2, etc.?
>
I'm afraid there's no "optional" slot in qPCRset objects where users can
add additional data, whether that's data frames, matrices or lists.
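
One possible workaround, sketched below under my own assumptions, is to
keep the design in a separate data frame and align it to the sample names
whenever it is needed. The sample names are taken from the object shown
above; the factor columns and their values are hypothetical, and the
matrix is a stand-in for the Ct values, not a qPCRset.

```r
# Stand-in Ct matrix with sample names as column names
samp <- c("A1.1h", "A1.6a", "A1.11h")
ct <- matrix(NA_real_, nrow = 96, ncol = length(samp),
             dimnames = list(NULL, samp))

# Design table kept outside the object; the row order may differ
design <- data.frame(sample  = c("A1.11h", "A1.1h", "A1.6a"),
                     factor1 = c("low", "high", "low"),
                     factor2 = c("x", "y", "x"),
                     stringsAsFactors = FALSE)

# Reorder the design so that row i describes column i of the Ct matrix
idx <- match(colnames(ct), design$sample)
stopifnot(!any(is.na(idx)))  # every sample must have a design entry
design.ord <- design[idx, ]
```

Columns of design.ord can then be used directly to build the group
vectors or model matrices for testing.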

HTH
\Heidi

> I do understand that this package was not developed specifically for
> dealing with data from these Fluidigm chips, but I haven't found any such
> package and as far as I know HTqPCR is the best package around for
> analysis of high-throughput qPCR data.
>
> I hope someone can help me out a little bit. I'm new to R, but I'm not
> asking you to do my work for me, just some directions to help me do it
> myself. Thanks!
>
> Cheers,
> Vicencio