[BioC] HTqPCR problems
cfischer
cfischer at molgen.mpg.de
Sun Aug 26 14:40:07 CEST 2012
Hi Simon and Heidi,
> Hi Simon,
>
>> Great,
>> thanks Heidi. This did the job. Quick question, can you make
> scatterplots
>> instead of barcharts for any of the plot functions? I've looked at
>> plotCtOverview, and I dont see a way to specify the chart type. Its
>> helpful to have some flexibility here, so you can see how tight the
> spread
>> is (personal preference I know, as you allow CI to be displayed, I just
>> prefer to see the actual data)
>>
> All the functions available in HTqPCR are listed under
> ls("package:HTqPCR").
>
> For plotCtOverview there isn't any option for making scatterplots
instead
> of barplots (not sure how this would look?). However, you can of course
> add the actual Ct values as points on the plot, by using the points()
> function after calling plotCtOverview.
>
> In general, if you find that the plotting functions in HTqPCR differ too
> much from the kind of figure you want, then rather than trying to modify
> the plots extensively, it's probably easier to just construct your own
> plots from scratch. You can always extract the actual Ct values using
> either exprs() or getCt(), and unlike some microarray or sequencing
data,
> qPCR data sets are quite small and easy to work with 'manually'.
>
To plot data elsewhere (e.g. ggplot2, Excel etc.) that was calculated and
visualized using HTqPCR I use the following method.
To get all variables created from a function call (e.g. plotCtOverview())
use trace(). Afterwards output the variables of interest. Here is an
example:
> library(HTqPCR)
> path <- system.file("exData", package = "HTqPCR")
> files <- read.delim(file.path(path, "files.txt"))
> raw <- readCtData(files = files$File, path = path)
> trace(plotCtOverview, exit = function().last_env <<- parent.frame())
Tracing function "plotCtOverview" in package "HTqPCR"
[1] "plotCtOverview"
# Reduce featureNames to Gene1 - Gene5 and plot
> g <- featureNames(raw)[1:5]
> plotCtOverview(raw, genes = g, groups = files$Treatment, conf.int =
TRUE)
Tracing plotCtOverview(raw, genes = g, groups = files$Treatment, conf.int
= TRUE) on exit
# lists all variables
> ls(.last_env)
[1] "M" "SD" "bar.names" "calibrator"
[5] "cards" "col" "conf.int" "data"
[9] "data.all" "data.feat" "data.samp" "feature.split"
[13] "genes" "groups" "i" "index"
[17] "legend" "q" "replicates" "sample.split"
[21] "x.pos" "ylab"
# "M" and "SD" are needed for plotting
> get("M", .last_env)
Gene1 Gene2 Gene3 Gene4 Gene5
M.Control 11.70353 33.77561 29.42980 26.93328 24.40349
M.LongStarve 11.27593 33.04012 28.45492 28.88418 25.93422
M.Starve 10.70192 32.03267 27.94196 28.43661 25.38847
> get("SD", .last_env)
Gene1 Gene2 Gene3 Gene4 Gene5
SD.Control 0.1805192 0.8406226 2.4835075 0.05839492 0.8950952
SD.LongStarve 0.4714122 3.0073646 0.7928997 1.60725021 1.5908641
SD.Starve 0.2121190 3.1046433 6.3861900 0.82644244 0.8872954
> untrace(plotCtOverview)
Untracing function "plotCtOverview" in package "HTqPCR"
The actual Cts ("M") and the standard deviations ("SD") may be plotted
elsewhere. If more genes are of interest dump "M" and "SD" to a file e.g.
as follows.
> sink("outfile.txt")
> get("M", .last_env)
> get("SD", .last_env)
> sink()
#!/bin/bash
cat outfile.txt
Gene1 Gene2 Gene3 Gene4 Gene5
M.Control 11.70353 33.77561 29.42980 26.93328 24.40349
M.LongStarve 11.27593 33.04012 28.45492 28.88418 25.93422
M.Starve 10.70192 32.03267 27.94196 28.43661 25.38847
Gene1 Gene2 Gene3 Gene4 Gene5
SD.Control 0.1805192 0.8406226 2.4835075 0.05839492 0.8950952
SD.LongStarve 0.4714122 3.0073646 0.7928997 1.60725021 1.5908641
SD.Starve 0.2121190 3.1046433 6.3861900 0.82644244 0.8872954
Regards and thanks Heidi for the nice package!
Coni
> The plots in HTqPCR are all based on simple plotting functions from the
> core graphics package, i.e. plot(), barplot(), boxplot().
>
> Best
> \Heidi
>
>> thanks
>>
>> s
>> On Jul 6, 2012, at 1:07 AM, Heidi Dvinge wrote:
>>
>>> Hi Simon,
>>>
>>>> Hi Heidi,
>>>> I've a followup from the question from listed below. I've started to
>>>> merge
>>>> multiple plates together using rbind as you suggested. I've identical
>>>> gene
>>>> order per plate, but different biological samples belonging to
>>>> different
>>>> groups per plate. I can manipulate these using your graphing
functions
>>>> on
>>>> a per plate basis, but I'm unclear how to address comparisons with
>>>> regards
>>>> to groups once the plates are merged.
>>>>
>>>> For example, I have 5 text files which tie sample ID to specific
> groups
>>>> of
>>>> interest. After importing and merging the 5 separate Biomark files by
>>>> rbind, my merged object is 48x240. How do I then contrast the
> different
>>>> groups? I'm clear on how to do this in a single plate (using one of
> the
>>>> 5
>>>> individual text files), but confused with regards to the process
after
>>>> merging. Should I have a vector with 240 identifiers for the merged
>>>> files
>>>> which is in the same order the merged plates are? (top to bottom
> (row),
>>>> and left to right? (by column).
>>>>
>>> Yes (presuming I've understood your description correct). You can
> either
>>> also just combine your individual text files with samples
descriptions,
>>> or
>>> create a whole new object describing each of your samples, as long as
>>> the
>>> order of your descriptions are the same as the columns in your qPCRset
>>> object.
>>>
>>> Alternatively, please note that as of version 1.9.0 (AFAIR) qPCRsets
> now
>>> inherit from class ExpressionSet, i.e. they have a slot called
> phenoData
>>> (an AnnotatedDataFrame) similar to what objects for microarray
analysis
>>> have. E.g.
>>>
>>>> data(qPCRraw)
>>>> phenoData(qPCRraw)
>>> An object of class "AnnotatedDataFrame"
>>> sampleNames: sample1 sample2 ... sample6 (6 total)
>>> varLabels: sample
>>> varMetadata: labelDescription
>>>> pData(qPCRraw)
>>> sample
>>> sample1 1
>>> sample2 2
>>> sample3 3
>>> sample4 4
>>> sample5 5
>>> sample6 6
>>>> pData(qPCRraw)$Genotype <- c("A", "B", "A", "A", "B", "B")
>>>> pData(qPCRraw)
>>> sample Genotype
>>> sample1 1 A
>>> sample2 2 B
>>> sample3 3 A
>>> sample4 4 A
>>> sample5 5 B
>>> sample6 6 B
>>>
>>> You can use this to store information about samples within the qPCRset
>>> object itself, rather than in 'external' objects. There's more info
>>> about
>>> this in the general help files for:
>>>
>>> ?AnnotatedDataFrame
>>> ?ExpressionSet
>>>
>>> \Heidi
>>>
>>>
>>>> thanks
>>>>
>>>> Simon
>>>> On Jun 27, 2012, at 3:27 PM, Heidi Dvinge wrote:
>>>>
>>>>>> Hi Heidi,
>>>>>> you are correct, yes 48.48.
>>>>>> The example you provide below is exactly what I needed for
>>>>>> clarification
>>>>>> for groups. I was trying to reverse engineer what you had done with
>>>>>> the
>>>>>> original expression set package for microarrays, but from below, I
>>>>>> can
>>>>>> get
>>>>>> this to work now.
>>>>>>
>>>>> Glad it works. Hopefully by the next BioConductor release I'll
>>>>> remember
>>>>> to
>>>>> clarify the plotCtOverview help file.
>>>>>
>>>>>> Just to be clear, I have 5 48.48 plates. Should I normalize each
>>>>>> individually prior to combining, or should I reformat to a 2304x1
>>>>>> each,
>>>>>> combine, then normalize (not sure if you can do that or not)
>>>>>>
>>>>> Hm, that's one of the questions I've also been asking myself, so I
>>>>> would
>>>>> be curious to hear what your results from this are.
>>>>>
>>>>> If you suspect that there are some major factors influencing the 5
>>>>> plates
>>>>> systematically, then normalising them in a 2304 x 5 object should
>>>>> (hopefully) correct for that. For example, they may have been run on
>>>>> different days, by different people, or perhaps there was a short
>>>>> power
>>>>> cut during the processing of one of them. This might be visible if
> you
>>>>> have for example a boxplot of Ct from all 48*5 samples, and you see
>>>>> blocks
>>>>> of them shifted up or down.
>>>>>
>>>>> Obviously, this doesn't take care of normalisation between your
>>>>> samples
>>>>> within each plate though. If you suspect your samples to have some
>>>>> systematic variation that you need to account for, then you can
>>>>> normalise
>>>>> each plate individually (as a 48 x 48) object. Alternatively, you
can
>>>>> try
>>>>> to combine within- and between-sample normalisation by taking all 48
> x
>>>>> 240
>>>>> values at once.
>>>>>
>>>>> In principle, you can split, reformat and the recombine the data in
>>>>> however many ways you like. Personally, with any sort of data I
> prefer
>>>>> to
>>>>> go with as little preprocessing as possible, since each additional
>>>>> step
>>>>> can potentially introduce its own biases into the data. So unless
>>>>> there
>>>>> are some obvious variation between your 5 plates, I'd probably stick
>>>>> with
>>>>> just normalisation between the samples, e.. using a 48 x 240 object.
>>>>>
>>>>> Of course, you may have different preferences, or find out that a
>>>>> completely different approach is required for this particular data
>>>>> set.
>>>>>
>>>>> \Heidi
>>>>>
>>>>>> thanks again for your prompt responses!
>>>>>>
>>>>>> best
>>>>>>
>>>>>> s
>>>>>>
>>>>>> On Jun 27, 2012, at 2:26 PM, Heidi Dvinge wrote:
>>>>>>
>>>>>>> Hi Simon,
>>>>>>>
>>>>>>>> Thanks for the help Heidi,
>>>>>>>> but I'm still having troubles, your comments on the plotting
> helped
>>>>>>>> me
>>>>>>>> solve the outputs. But if I want to just display some groups (for
>>>>>>>> example
>>>>>>>> the LO group in the example below), how do I associate a group
> with
>>>>>>>> multiple samples (ie biological reps)?
>>>>>>>>
>>>>>>>> Currently I'm associating genes with samples by reading in the
>>>>>>>> file
>>>>>>>> as
>>>>>>>> below
>>>>>>>> plate6=read.delim("plate6Sample.txt", header=FALSE)
>>>>>>>> #this is a file to associate sample ID with the genes in the
>>>>>>>> biomark
>>>>>>>> data,
>>>>>>>> as currently HTqPCR does not seem to associate the sample info in
>>>>>>>> the
>>>>>>>> Biomark output to the gene IDs
>>>>>>>>
>>>>>>> Erm, no, it doesn't :-/
>>>>>>>
>>>>>>>> samples=as.vector(t(plate6))
>>>>>>>> raw6=readCtData(files="Chip6.csv", format="BioMark",
> n.features=48,
>>>>>>>> n.data=48, samples=samples)
>>>>>>>> #now I have samples and genes similar to your example in the
> guide,
>>>>>>>> but
>>>>>>>> I
>>>>>>>> want to associate samples to groups now. In the guide, you have
an
>>>>>>>> example
>>>>>>>> where you have entire files as distinct samples, but in our runs,
>>>>>>>> we
>>>>>>>> never
>>>>>>>> have that situation. I have a file which associates samples to
>>>>>>>> groups,
>>>>>>>> which I read in...
>>>>>>>>
>>>>>>>> groupID=read.csv("plate6key.csv")
>>>>>>>>
>>>>>>>> but how do I associate the samples with their appropriate groups
>>>>>>>> for
>>>>>>>> biological replicates with any of the functions in HtQPCR?
>>>>>>>
>>>>>>> I'm afraid I'm slightly confused here (sorry, long day). Exactly
> how
>>>>>>> is
>>>>>>> your data formatted? I.e. are the columns either individual
> samples,
>>>>>>> or
>>>>>>> from files containing multiple samples? So for example for a
single
>>>>>>> 48.48
>>>>>>> arrays, is your qPCRset object 2304 x 1 or 48 x 48?
>>>>>>>
>>>>>>> From your readCtData command I'm guessing you have 48 x 48, i.e.
> all
>>>>>>> 48
>>>>>>> samples from your 1 array are in columns. In that case the
'groups'
>>>>>>> parameter in plotCtOverview will need to be a vector of length 48,
>>>>>>> indicating how you want the 48 columns in your qPCRset object to
be
>>>>>>> grouped together.
>>>>>>>
>>>>>>> Below is an example of (admittedly ugly) plots. I don't know if
>>>>>>> that's
>>>>>>> similar to your situation at all.
>>>>>>>
>>>>>>> \Heidi
>>>>>>>
>>>>>>>> # Reading in data
>>>>>>>> exPath <- system.file("exData", package = "HTqPCR")
>>>>>>>> raw1 <- readCtData(files = "BioMark_sample.csv", path = exPath,
>>>>>>>> format
>>>>>>>> =
>>>>>>> "BioMark", n.features = 48, n.data = 48)
>>>>>>>> # Check sample names
>>>>>>>> head(sampleNames(raw1))
>>>>>>> [1] "Sample1" "Sample2" "Sample3" "Sample4" "Sample5" "Sample6"
>>>>>>>> # Associate samples with (randomly chosen) groups
>>>>>>>> anno <- data.frame(sampleID=sampleNames(raw1),
>>>>>>>> Group=rep(c("A", "B",
>>>>>>> "C", "D"), times=c(4,24,5,15)))
>>>>>>>> head(anno)
>>>>>>> sampleID Group
>>>>>>> 1 Sample1 A
>>>>>>> 2 Sample2 A
>>>>>>> 3 Sample3 A
>>>>>>> 4 Sample4 A
>>>>>>> 5 Sample5 B
>>>>>>> 6 Sample6 B
>>>>>>>> # Plot the first gene - for each sample individually
>>>>>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1], legend=FALSE,
>>>>>>> col=1:nrow(anno))
>>>>>>>> # Plot the first gene - for each group
>>>>>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1],
> group=anno$Group,
>>>>>>> legend=FALSE, col=1:length(unique(anno$Group)))
>>>>>>>> # Plot the first gene, using group "A" as a control
>>>>>>>> plotCtOverview(raw1, genes=featureNames(raw1)[1],
> group=anno$Group,
>>>>>>> legend=FALSE, col=1:length(unique(anno$Group)), calibrator="A")
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> You recommend below using a vector, but I dont see how that helps
>>>>>>>> me
>>>>>>>> associate the samples in the Expression set.
>>>>>>>>
>>>>>>>> thanks again!
>>>>>>>>
>>>>>>>> s
>>>>>>>>
>>>>>>>> On Jun 26, 2012, at 12:48 PM, Heidi Dvinge wrote:
>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> I'm having some troubles selectively sub-setting, and graphing
> up
>>>>>>>>>> QPCR
>>>>>>>>>> data within HTqPCR for Biomark plates (both 48.48 and 96.96
>>>>>>>>>> plates).
>>>>>>>>>> I'd
>>>>>>>>>> like to be able to visualize specific genes, with specific
> groups
>>>>>>>>>> we
>>>>>>>>>> run
>>>>>>>>>> routinely on our Biomark system. Typical runs are across
> multiple
>>>>>>>>>> plates,
>>>>>>>>>> and have multiple biological replicates, and usually 2 or more
>>>>>>>>>> technical
>>>>>>>>>> replicates (although we are moving away from technical reps, as
>>>>>>>>>> the
>>>>>>>>>> CVs
>>>>>>>>>> are so tight).
>>>>>>>>>>
>>>>>>>>>> Can anyone help with this? Heidi?
>>>>>>>>>>
>>>>>>>>>> raw6=readCtData(files="Chip6.csv", format="BioMark",
>>>>>>>>>> n.features=48,
>>>>>>>>>> n.data=48, samples=samples)
>>>>>>>>>> #Ive read the samples in from a separate file, as when you read
>>>>>>>>>> it
>>>>>>>>>> in,
>>>>>>>>>> it
>>>>>>>>>> doesnt take the sample names supplied in the biomark output#
>>>>>>>>>> #Next, I want to plot genes of interest, with samples of
>>>>>>>>>> interest,
>>>>>>>>>> and
>>>>>>>>>> I'm
>>>>>>>>>> having trouble getting an appropriate output#
>>>>>>>>>>
>>>>>>>>>> g=featureNames(raw6)[1:2]
>>>>>>>>>> plotCtOverview(raw6, genes=g, groups=groupID$Treatment,
>>>>>>>>>> col=rainbow(5))
>>>>>>>>>>
>>>>>>>>>> #This plots 1 gene across all 48 samples#
>>>>>>>>>> #but the legend doesnt behave, its placed on top of the plot,
> and
>>>>>>>>>> I
>>>>>>>>>> cant
>>>>>>>>>> get it to display in a non-overlapping fashion#
>>>>>>>>>> #I've tried all sorts of things in par, but nothing seems to
>>>>>>>>>> shift
>>>>>>>>>> the
>>>>>>>>>> legend's position#
>>>>>>>>>>
>>>>>>>>> As the old saying goes, whenever you want a job done well,
you'll
>>>>>>>>> have
>>>>>>>>> to
>>>>>>>>> do it yourself ;). In this case, the easiest thing is probably
to
>>>>>>>>> use
>>>>>>>>> legend=FALSE in plotCtOverview, and then afterwards add it
>>>>>>>>> yourself
>>>>>>>>> in
>>>>>>>>> the
>>>>>>>>> desired location using legend(). That way, if you have a lot of
>>>>>>>>> different
>>>>>>>>> features or groups to display, you can also use the ncol
> parameter
>>>>>>>>> in
>>>>>>>>> legend to make several columns within the legend, such as 3x4
>>>>>>>>> instead
>>>>>>>>> of
>>>>>>>>> the default 12x1.
>>>>>>>>>
>>>>>>>>> Alternatively, you can use either xlim or ylim in plotCtOverview
>>>>>>>>> to
>>>>>>>>> add
>>>>>>>>> some empty space on the side where there's then room for the
>>>>>>>>> legend.
>>>>>>>>>
>>>>>>>>>> #I now want to plot a subset of the samples for specific genes#
>>>>>>>>>>> LOY=subset(groupID,groupID$Treatment=="LO" |
> groupID$Treatment==
>>>>>>>>>>> "LFY")
>>>>>>>>>>> LOY
>>>>>>>>>> Sample Treatment
>>>>>>>>>> 2 L20 LFY
>>>>>>>>>> 5 L30 LFY
>>>>>>>>>> 7 L45 LO
>>>>>>>>>> 20 L40 LO
>>>>>>>>>> 27 L43 LO
>>>>>>>>>> 33 L29 LFY
>>>>>>>>>> 36 L38 LO
>>>>>>>>>> 40 L39 LO
>>>>>>>>>> 43 L23 LFY
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> plotCtOverview(raw6, genes=g, groups=LOY, col=rainbow(5))
>>>>>>>>>> Warning messages:
>>>>>>>>>> 1: In split.default(t(x), sample.split) :
>>>>>>>>>> data length is not a multiple of split variable
>>>>>>>>>> 2: In qt(p, df, lower.tail, log.p) : NaNs produced
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Does it make sense if you split by groups=LOY$Treatment? It
looks
>>>>>>>>> like
>>>>>>>>> the
>>>>>>>>> object LOY itself is a data frame, rather than the expected
>>>>>>>>> vector.
>>>>>>>>>
>>>>>>>>> Also, you may have to 'repeat' the col=rainbow() argument to fit
>>>>>>>>> your
>>>>>>>>> number of features.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> #it displays the two groups defined by treatment, but doesnt do
>>>>>>>>>> so
>>>>>>>>>> nicely,
>>>>>>>>>> very skinny bars, and the legend doesnt reflect what its
>>>>>>>>>> displaying#
>>>>>>>>>> #again, I've tried monkeying around with par, but not sure what
>>>>>>>>>> HTqPCR
>>>>>>>>>> is
>>>>>>>>>> calling to make the plots#
>>>>>>>>>>
>>>>>>>>> If the bars are very skinny, it's probably because you're
>>>>>>>>> displaying
>>>>>>>>> a
>>>>>>>>> lot
>>>>>>>>> of features. Nothing much to do about that, except increasing
the
>>>>>>>>> width
>>>>>>>>> or
>>>>>>>>> your plot :(.
>>>>>>>>>
>>>>>>>>> \Heidi
>>>>>>>>>
>>>>>>>>>> please help!
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>>
>>>>>>>>>> Simon.
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioconductor mailing list
>>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>>> Search the archives:
>>>>>>>>>>
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
More information about the Bioconductor
mailing list