[BioC] HTqPCR problems

Thu Jun 28 23:11:19 CEST 2012

Hi Heidi,
getting there, hopefully if you can clarify the following issue, all will be well and good.

After yesterdays correspondence, I'm now producing nice plots, when I check the actual values being plotted, they dont match up
to the sample ID's. In fact, if I dont bother assigning groups, the sample ID's dont match to their respective gene CT values. I'm
worried there is some underlying problem with the data structure I'm not understanding.

I understand the code, its just the samples dont match the reported gene values in the csv file. 

for example

> head(groupID)
  Sample Treatment
1    S28       SMY
2    L20       LFY
3    M26       MMY
4     L1       LFR
5    L30       LFY
6    K13       KMO

>plotCtOverview(raw6, genes=featureNames(raw6)[1], group=groupID$Treatment,legend=FALSE, col=1:length(unique(groupID$Treatment)))

produces a nice plot of a tubulin gene across the groups, as you suggested yesterday . Yet if I look at the values, they dont match 
the CSV values for specific genes/samples I used. If I turn off groups, and look at samples without merging by group, I can see that the values dont match the appropriate gene being
displayed. My question is, where is the sample order being drawn from in the CSV file? Is there a simple check I can use to see that what is being plotted,
is what I think is being plotted? The group ID sample-Treatment is correct, and all the samples in the original CSV file are correct.

Is it possible that the package is assigning gene/sample ID in some other order than that I've supplied?
I just want to be sure that when HTqPCR pulls the sample ID and maps it to the appropriate gene/Group, some transformation is not happening.

Fluidigm suggests a particular order in loading samples and genes. These are numbered 1-48 (sample), and 1-48 (gene) for a 48.48 plate (and the same for a 96.96 plate). 
This is the order I supplied the sample IDs in the groupID file above. How do you map the raw csv output to gene/sample id?

Is there a way of checking that the sample/gene/group ID is correct?

as always, thanks in advance for your help

best

s
On Jun 27, 2012, at 3:27 PM, Heidi Dvinge wrote:

>> Hi Heidi,
>> you are correct, yes 48.48.
>> The example you provide below is exactly what I needed for clarification
>> for groups. I was trying to reverse engineer what you had done with the
>> original expression set package for microarrays, but from below, I can get
>> this to work now.
>> 
> Glad it works. Hopefully by the next BioConductor release I'll remember to
> clarify the plotCtOverview help file.
> 
>> Just to be clear, I have 5 48.48 plates. Should I normalize each
>> individually prior to combining, or should I reformat to a 2304x1 each,
>> combine, then normalize (not sure if you can do that or not)
>> 
> Hm, that's one of the questions I've also been asking myself, so I would
> be curious to hear what your results from this are.
> 
> If you suspect that there are some major factors influencing the 5 plates
> systematically, then normalising them in a 2304 x 5 object should
> (hopefully) correct for that. For example, they may have been run on
> different days, by different people, or perhaps there was a short power
> cut during the processing of one of them. This might be visible if you
> have for example a boxplot of Ct from all 48*5 samples, and you see blocks
> of them shifted up or down.
> 
> Obviously, this doesn't take care of normalisation between your samples
> within each plate though. If you suspect your samples to have some
> systematic variation that you need to account for, then you can normalise
> each plate individually (as a 48 x 48) object. Alternatively, you can try
> to combine within- and between-sample normalisation by taking all 48 x 240
> values at once.
> 
> In principle, you can split, reformat and the recombine the data in
> however many ways you like. Personally, with any sort of data I prefer to
> go with as little preprocessing as possible, since each additional step
> can potentially introduce its own biases into the data. So unless there
> are some obvious variation between your 5 plates, I'd probably stick with
> just normalisation between the samples, e.. using a 48 x 240 object.
> 
> Of course, you may have different preferences, or find out that a
> completely different approach is required for this particular data set.
> 
> \Heidi
> 
>> thanks again for your prompt responses!
>> 
>> best
>> 
>> s
>> 
>> On Jun 27, 2012, at 2:26 PM, Heidi Dvinge wrote:
>> 
>>> Hi Simon,
>>> 
>>>> Thanks for the help Heidi,
>>>> but I'm still having troubles, your comments on the plotting helped me
>>>> solve the outputs. But if I want to just display some groups (for
>>>> example
>>>> the LO group in the example below), how do I associate a group with
>>>> multiple samples (ie biological reps)?
>>>> 
>>>> Currently I'm associating genes with samples  by reading in the file as
>>>> below
>>>> plate6=read.delim("plate6Sample.txt", header=FALSE)
>>>> #this is a file to associate sample ID with the genes in the biomark
>>>> data,
>>>> as currently HTqPCR does not seem to associate the sample info in the
>>>> Biomark output to the gene IDs
>>>> 
>>> Erm, no, it doesn't :-/
>>> 
>>>> samples=as.vector(t(plate6))
>>>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48,
>>>> n.data=48, samples=samples)
>>>> #now I have samples and genes similar to your example in the guide, but
>>>> I
>>>> want to associate samples to groups now. In the guide, you have an
>>>> example
>>>> where you have entire files as distinct samples, but in our runs, we
>>>> never
>>>> have that situation. I have a file which associates samples to groups,
>>>> which I read in...
>>>> 
>>>> groupID=read.csv("plate6key.csv")
>>>> 
>>>> but how do I associate the samples with their appropriate groups for
>>>> biological replicates with any of the functions in HtQPCR?
>>> 
>>> I'm afraid I'm slightly confused here (sorry, long day). Exactly how is
>>> your data formatted? I.e. are the columns either individual samples, or
>>> from files containing multiple samples? So for example for a single
>>> 48.48
>>> arrays, is your qPCRset object 2304 x 1 or 48 x 48?
>>> 
>>> From your readCtData command I'm guessing you have 48 x 48, i.e. all 48
>>> samples from your 1 array are in columns. In that case the 'groups'
>>> parameter in plotCtOverview will need to be a vector of length 48,
>>> indicating how you want the 48 columns in your qPCRset object to be
>>> grouped together.
>>> 
>>> Below is an example of (admittedly ugly) plots. I don't know if that's
>>> similar to your situation at all.
>>> 
>>> \Heidi
>>> 
>>>> # Reading in data
>>>> exPath <- system.file("exData", package = "HTqPCR")
>>>> raw1 <- readCtData(files = "BioMark_sample.csv", path = exPath, format
>>>> =
>>> "BioMark", n.features = 48, n.data = 48)
>>>> # Check sample names
>>>> head(sampleNames(raw1))
>>> [1] "Sample1" "Sample2" "Sample3" "Sample4" "Sample5" "Sample6"
>>>> # Associate samples with (randomly chosen) groups
>>>> anno	<- data.frame(sampleID=sampleNames(raw1), Group=rep(c("A", "B",
>>> "C", "D"), times=c(4,24,5,15)))
>>>> head(anno)
>>> sampleID Group
>>> 1  Sample1     A
>>> 2  Sample2     A
>>> 3  Sample3     A
>>> 4  Sample4     A
>>> 5  Sample5     B
>>> 6  Sample6     B
>>>> # Plot the first gene - for each sample individually
>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], legend=FALSE,
>>> col=1:nrow(anno))
>>>> # Plot the first gene - for each group
>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group,
>>> legend=FALSE, col=1:length(unique(anno$Group)))
>>>> # Plot the first gene, using group "A" as a control
>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], group=anno$Group,
>>> legend=FALSE, col=1:length(unique(anno$Group)), calibrator="A")
>>> 
>>> 
>>> 
>>>> You recommend below using a vector, but I dont see how that helps me
>>>> associate the samples in the Expression set.
>>>> 
>>>> thanks again!
>>>> 
>>>> s
>>>> 
>>>> On Jun 26, 2012, at 12:48 PM, Heidi Dvinge wrote:
>>>> 
>>>>>> Hi,
>>>>>> I'm having some troubles selectively sub-setting, and graphing up
>>>>>> QPCR
>>>>>> data within HTqPCR for Biomark plates (both 48.48 and 96.96 plates).
>>>>>> I'd
>>>>>> like to be able to visualize specific genes, with specific groups we
>>>>>> run
>>>>>> routinely on our Biomark system. Typical runs are across multiple
>>>>>> plates,
>>>>>> and have multiple biological replicates, and usually 2 or more
>>>>>> technical
>>>>>> replicates (although we are moving away from technical reps, as the
>>>>>> CVs
>>>>>> are so tight).
>>>>>> 
>>>>>> Can anyone help with this? Heidi?
>>>>>> 
>>>>>> raw6=readCtData(files="Chip6.csv", format="BioMark", n.features=48,
>>>>>> n.data=48, samples=samples)
>>>>>> #Ive read the samples in from a separate file, as when you read it
>>>>>> in,
>>>>>> it
>>>>>> doesnt take the sample names supplied in the biomark output#
>>>>>> #Next, I want to plot genes of interest, with samples of interest,
>>>>>> and
>>>>>> I'm
>>>>>> having trouble getting an appropriate output#
>>>>>> 
>>>>>> g=featureNames(raw6)[1:2]
>>>>>> plotCtOverview(raw6, genes=g, groups=groupID$Treatment,
>>>>>> col=rainbow(5))
>>>>>> 
>>>>>> #This plots 1 gene across all 48 samples#
>>>>>> #but the legend doesnt behave, its placed on top of the plot, and I
>>>>>> cant
>>>>>> get it to display in a non-overlapping fashion#
>>>>>> #I've tried all sorts of things in par, but nothing seems to shift
>>>>>> the
>>>>>> legend's position#
>>>>>> 
>>>>> As the old saying goes, whenever you want a job done well, you'll have
>>>>> to
>>>>> do it yourself ;). In this case, the easiest thing is probably to use
>>>>> legend=FALSE in plotCtOverview, and then afterwards add it yourself in
>>>>> the
>>>>> desired location using legend(). That way, if you have a lot of
>>>>> different
>>>>> features or groups to display, you can also use the ncol parameter in
>>>>> legend to make several columns within the legend, such as 3x4 instead
>>>>> of
>>>>> the default 12x1.
>>>>> 
>>>>> Alternatively, you can use either xlim or ylim in plotCtOverview to
>>>>> add
>>>>> some empty space on the side where there's then room for the legend.
>>>>> 
>>>>>> #I now want to plot a subset of the samples for specific genes#
>>>>>>> LOY=subset(groupID,groupID$Treatment=="LO" | groupID$Treatment==
>>>>>>> "LFY")
>>>>>>> LOY
>>>>>> Sample Treatment
>>>>>> 2     L20       LFY
>>>>>> 5     L30       LFY
>>>>>> 7     L45        LO
>>>>>> 20    L40        LO
>>>>>> 27    L43        LO
>>>>>> 33    L29       LFY
>>>>>> 36    L38        LO
>>>>>> 40    L39        LO
>>>>>> 43    L23       LFY
>>>>>> 
>>>>>> 
>>>>>>> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5))
>>>>>> Warning messages:
>>>>>> 1: In split.default(t(x), sample.split) :
>>>>>> data length is not a multiple of split variable
>>>>>> 2: In qt(p, df, lower.tail, log.p) : NaNs produced
>>>>>>> 
>>>>> 
>>>>> Does it make sense if you split by groups=LOY$Treatment? It looks like
>>>>> the
>>>>> object LOY itself is a data frame, rather than the expected vector.
>>>>> 
>>>>> Also, you may have to 'repeat' the col=rainbow() argument to fit your
>>>>> number of features.
>>>>> 
>>>>>> 
>>>>>> #it displays the two groups defined by treatment, but doesnt do so
>>>>>> nicely,
>>>>>> very skinny bars, and the legend doesnt reflect what its displaying#
>>>>>> #again, I've tried monkeying around with par, but not sure what
>>>>>> HTqPCR
>>>>>> is
>>>>>> calling to make the plots#
>>>>>> 
>>>>> If the bars are very skinny, it's probably because you're displaying a
>>>>> lot
>>>>> of features. Nothing much to do about that, except increasing the
>>>>> width
>>>>> or
>>>>> your plot :(.
>>>>> 
>>>>> \Heidi
>>>>> 
>>>>>> please help!
>>>>>> 
>>>>>> thanks
>>>>>> 
>>>>>> Simon.
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
>