[BioC] Two populations on microarray

Ben Tupper btupper at bigelow.org
Mon Feb 13 19:53:28 CET 2012


Hi,

On Feb 12, 2012, at 8:54 PM, Naomi Altman wrote:

> Did you also remove all control spots?


Yes, at least I think so.  Here are the steps we take:

# from SpotTypes.txt
spotTypes <- structure(list(SpotType = c("gene", "empty", "landingLight", 
"printBuffer", "alienSpikes"), Name = c("*", "*empty*", "*landing*", 
"*buffer*", "*alien*"), ID = c("*", "*empty*", "*landing*", "*buffer*", 
"*alien*"), Color = c("black", "yellow", "orange", "blue", "purple"
)), .Names = c("SpotType", "Name", "ID", "Color"), class = "data.frame", row.names = c(NA, 
-5L))

# this is the weighting function
BE.weightsFlag <-  function(x){
   f <- ( ( x[,"F532 Median"] > 750 ) | ( x[,"F635 Median"] > 750 ) | x[,"Flags"] >= 0) 
   return(as.numeric(f))
}

# read in the .gpr files
RG <- read.maimages(targets,
   wt.fun = BE.weightsFlag, source = "genepix.median",
   other.columns = c(Rb = "B635.Median", Gb = "B532.Median"))
   
# flag the genes that match in the spotTypes$Name column
RG$genes$Status <- controlStatus(spotTypes, RG$genes, regexpcol = "Name")   

   Matching patterns for: Name 
   Found 21420 gene 
   Found 1440 empty 
   Found 60 landingLight 
   Found 40 printBuffer 
   Found 200 alienSpikes 
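
One way to see what the weighting function above actually returns is to run it on a small made-up table. This is only a sketch: the four rows are invented, and strictWeights() is a hypothetical variant (not part of our pipeline) that requires a spot to be bright AND not flagged bad, shown for comparison. With the "|" version, any spot with Flags >= 0 gets weight 1 regardless of intensity.

```r
# Toy stand-in for the columns of one .gpr file that BE.weightsFlag() reads.
# The values are invented for illustration only.
gpr <- data.frame(
  "F532 Median" = c(1200, 300, 300,  900),
  "F635 Median" = c( 200, 400, 400, 1500),
  "Flags"       = c(   0,   0, -50,  -50),
  check.names = FALSE
)

# The weighting function as posted: weight 1 if either channel is bright
# OR the spot is not flagged bad.
BE.weightsFlag <- function(x) {
  f <- (x[, "F532 Median"] > 750) | (x[, "F635 Median"] > 750) | x[, "Flags"] >= 0
  as.numeric(f)
}

# Hypothetical stricter variant: bright in at least one channel AND not flagged bad.
strictWeights <- function(x) {
  bright <- (x[, "F532 Median"] > 750) | (x[, "F635 Median"] > 750)
  as.numeric(bright & x[, "Flags"] >= 0)
}

w_or  <- BE.weightsFlag(gpr)
w_and <- strictWeights(gpr)
print(cbind(gpr, w_or, w_and))
```

In this toy table the dim, unflagged spot (row 2) and the bright, flagged spot (row 4) both keep weight 1 under the posted function and drop to 0 under the stricter variant.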


I wonder if doing the above step is the same as "removing" control spots.  I'm not sure how the functions handle Status.  Should I add a step like this to really remove the control spots?

isGene <- RG$genes$Status == "gene"

RG$weights <- RG$weights * isGene
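
To make the two options concrete, here is a minimal base R sketch on a toy weights matrix (5 spots by 2 arrays, values invented). It assumes that Status only labels the spots and does not by itself change the weights. Zeroing the weights keeps all rows in place, while subsetting drops the rows; with a real RGList the analogous subsetting call would be RG[isGene, ].

```r
# Toy stand-in for RG$genes$Status and RG$weights (values are invented)
status  <- c("gene", "empty", "gene", "alienSpikes", "gene")
weights <- matrix(c(1, 1, 0, 1, 1,
                    1, 1, 1, 1, 0),
                  nrow = 5,
                  dimnames = list(NULL, c("slide1", "slide2")))

isGene <- status == "gene"

# Option 1: keep every row but zero the weight of non-gene spots.
# isGene has one entry per row, so it recycles down each column.
w0 <- weights * isGene

# Option 2: drop the non-gene rows altogether.
wGene <- weights[isGene, , drop = FALSE]

print(w0)
print(wGene)
```

Note that option 1 leaves existing zero weights on gene spots untouched, so earlier flagging is preserved.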


Thanks!
Ben






> 
> --Naomi
> 
> 
> At 09:19 AM 2/7/2012, Ben Tupper wrote:
>> Hi,
>> 
>> On Jan 21, 2012, at 2:59 PM, Naomi Altman wrote:
>> 
>> > I agree with Gordon.
>> >
>> > I doubt that the double cloud has anything to do with differential expression.  There is something odd going on technically.  The usual types of normalization are not going to fix the problem.
>> 
>> Thanks for the assistance - we took up the suggestions that Gordon proposed.  We have successfully assigned weight = 0 to the problematic points.  I encouraged us to use a brute force identify-and-kill approach, but Joaquin's more nuanced inter-slide comparison approach prevailed.  The MA plots look great now but the subsequent between-array normalizations seem problematic, or at least the diagnostic plotDensities() graphics point to continuing issues.  This plot shows 4 diagnostic plots for one array ...
>> 
>> http://dl.dropbox.com/u/8433654/slide-52-MA-diagnostics.png
>> 
>> In the left column are shown the results of plotMA(MA,...) with zero.weights set to TRUE/FALSE so that we can show/hide the weight = 0 spots.
>> 
>> In the right column are shown the results of a slightly modified plotDensities(MA,...) where I have added a zero.weights argument to the original plotDensities() function.  The upper plot is identical to the output from the original plotDensities() function, while the lower plot simply removes the weight = 0 spots before computing the density distribution.  Because the MA-to-RG transformation in the original plotDensities() function doesn't take weights into account, it becomes difficult to use the function with our data to visually diagnose the effect of the normalization functions.
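
A minimal base R sketch of the weight-aware density idea described above, on simulated values rather than our arrays. The variable names and the simulated second cloud are assumptions for illustration, and this is not the modified plotDensities() itself. It back-transforms M and A to log2 single-channel intensities and then estimates the density with and without the weight = 0 spots.

```r
set.seed(1)
n <- 1000

# Simulated M and A values for one array (not real data)
A <- runif(n, 6, 14)
M <- rnorm(n, 0, 0.5)
w <- rep(1, n)

# A second "cloud" of 200 spots that has been given weight 0
bad <- 1:200
M[bad] <- M[bad] + 6
w[bad] <- 0

# Back-transform from M and A to log2 single-channel intensities
log2R <- A + M / 2
log2G <- A - M / 2

keep  <- w > 0
dAll  <- density(log2G)        # all spots, weights ignored
dKeep <- density(log2G[keep])  # weight = 0 spots removed first

# Draw both curves only in an interactive session
if (interactive()) {
  plot(dAll, lty = 2, main = "Green channel, log2 intensity")
  lines(dKeep, lty = 1)
  legend("topright", c("all spots", "weight > 0 only"), lty = c(2, 1))
}
```

The same filtering applies to the red channel by replacing log2G with log2R.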
>> 
>> The upper right plot leads us to believe that we have some serious issues.  But the lower right plot tells us that we are ok - obviously we like the lower right one better!
>> 
>> So, are we fooling ourselves by thinking the histogram at lower right is enough to tell us that we are good to go on to the next step?  If we are fooling ourselves, then what would you advise us to do instead?
>> 
>> Thanks so much!
>> Ben Tupper
>> 
>> 
>> 
>> 
>> 
>> >
>> > --Naomi
>> >
>> >
>> > At 12:03 AM 1/20/2012, Gordon K Smyth wrote:
>> >> Dear Joaquin,
>> >>
>> >> What I had in mind was that you would make a vector z which takes values TRUE or FALSE depending on whether each probe on the array belongs to group 1 or group 2 according to your MA plot.  Then
>> >>
>> >>  imageplot(z,layout,low="white",high="blue")
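
A minimal sketch of building such a vector z, using simulated M values with two well-separated clouds. The cutoff of 2.5 is hypothetical and would have to be read off the real MA plot; a single horizontal cutoff is also an assumption, since the upper cloud on a real array may follow a sloped line.

```r
set.seed(2)
n <- 600

# Simulated M and A values for one array (not real data)
A <- runif(n, 7, 13)
M <- rnorm(n, 0, 0.4)

# Shift 150 randomly chosen spots upward to mimic the second cloud
upper <- sample(n, 150)
M[upper] <- M[upper] + 5

# Hypothetical cutoff separating the two clouds on the M axis
cutoff <- 2.5
z <- M > cutoff

print(table(z))

# With limma loaded and the printer layout available, z can then be
# passed to the call quoted above:
#   imageplot(z, layout, low = "white", high = "blue")
```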
>> >>
>> >> There is no way for you to normalize out this problem, and certainly not
>> >> within the limited capabilities of GenePix software.
>> >>
>> >> Best wishes
>> >> Gordon
>> >>
>> >> ---------------------------------------------
>> >> Professor Gordon K Smyth,
>> >> Bioinformatics Division,
>> >> Walter and Eliza Hall Institute of Medical Research,
>> >> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> >> smyth at wehi.edu.au
>> >> http://www.wehi.edu.au
>> >> http://www.statsci.org/smyth
>> >>
>> >>
>> >> On Thu, 19 Jan 2012, Joaquin Martinez wrote:
>> >>
>> >>> Dear Naomi, Gordon and Ben,
>> >>>
>> >>>
>> >>>
>> >>> Thank you for your replies to Ben Tupper's (and my) question.
>> >>>
>> >>>
>> >>>
>> >>> We are using spotted oligonucleotide microarrays containing probes for both
>> >>> host and virus genes. In our experiment we had cultures grown under high
>> >>> and low phosphate conditions, inoculated with 2 different viruses
>> >>> (separately) or kept virus-free, in triplicate. RNA purified from those
>> >>> cultures at different time points was fluorescently labeled (with Cy-dyes)
>> >>> and hybridized onto the microarray slides. You can see a flow chart of our
>> >>> experimental design here:
>> >>>
>> >>> http://dl.dropbox.com/u/8433654/design-concept.pdf
>> >>>
>> >>>
>> >>>
>> >>> One slide contains 2 samples which had different experimental treatments.
>> >>> Each sample was split into 3, labeled (dye swap) and hybridized onto 3
>> >>> different microarray slides in combination with another sample to allow
>> >>> technical replication.
>> >>>
>> >>>
>> >>>
>> >>> I quantified labeling efficiency prior to hybridizing the samples onto the
>> >>> microarray slide; for both dyes I got between 30 and 60 dye molecules per
>> >>> 1000 nt (which is the range indicated by the manufacturer for good
>> >>> labeling). Also we produced FB plots for the green and the red channels,
>> >>> both had similar z-range and saturation range, which we interpreted as a
>> >>> proof of good labeling (?). See example:
>> >>>
>> >>> http://dl.dropbox.com/u/8433654/R-G-imageplot.png
>> >>>
>> >>>
>> >>>
>> >>> Both MA clusters that we observe contain a mixture of both host and virus
>> >>> probes, ruling out that one complete set of probes failed. Naomi mentioned
>> >>> that the nondifferentially expressing genes should cluster around M=0, so
>> >>> does that mean that the top cluster corresponds to differentially expressed
>> >>> genes?
>> >>>
>> >>>
>> >>>
>> >>> We used GenePix Pro to scan and analyze the microarrays. Could we use the
>> >>> normalization function in the software (normalize the data in each image so
>> >>> that the mean of the median of ratios of all features is equal to 1) as an
>> >>> alternative to MA? Or would that simply hide the problem? And then do
>> >>> normalization between arrays using the quantile method?
>> >>>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Joaquin
>> >>>
>> >>>
>> >>>
>> >>>>> From: Naomi Altman <naomi at stat.psu.edu>
>> >>>>> Date: January 18, 2012 9:56:45 AM EST
>> >>>>> To: Gordon K Smyth <smyth at wehi.EDU.AU>, Ben Tupper <btupper at bigelow.org>
>> >>>>> Cc: Bioconductor mailing list <bioconductor at r-project.org>
>> >>>>> Subject: Re: [BioC] Two populations on microarray
>> >>>>>
>> >>>>> Dear Ben,
>> >>>>> A typical MA plot has most of the points scattered around the line M=0.
>> >>>> Even if you have 2 populations of probes, the nondifferentially expressing
>> >>>> genes should be in that central ellipse.  (The lower cluster does look
>> >>>> somewhat like the typical MA plot for raw data.)  I suggest that you do
>> >>>> separate MA plots for each population of probes, to see if one set of
>> >>>> probes failed.  Or, as Gordon suggests, a population for which labelling
>> >>>> failed.
>> >>>>>
>> >>>>> --Naomi
>> >>>>>
>> >>>>>
>> >>>>> At 05:48 PM 1/14/2012, Gordon K Smyth wrote:
>> >>>>>> Dear Ben,
>> >>>>>>
>> >>>>>> Are you saying that you have deliberately designed two different
>> >>>> populations of probes onto your arrays?
>> >>>>>>
>> >>>>>> Your MA-plot suggests that there is a substantial body of spots on the
>> >>>> array for which the green channel has failed, hence the 45-degree line at
>> >>>> the top of the plot.  These dots likely represent spots with a normal red
>> >>>> channel value but close to zero for green.  Normally this would have a
>> >>>> technical rather than biological cause.  An imageplot may help you identify
>> >>>> where the offending spots are on your array.
>> >>>>>>
>> >>>>>> On the other hand, if you have deliberately spotted your arrays with
>> >>>> two quite different populations of probes, then they probably need to be
>> >>>> analysed as separate arrays.
>> >>>>>>
>> >>>>>> Best wishes
>> >>>>>> Gordon
>> >>>>>>
>> >>>>>>> Date: Thu, 12 Jan 2012 14:28:36 -0500
>> >>>>>>> From: Ben Tupper <btupper at bigelow.org>
>> >>>>>>> To: bioconductor at r-project.org
>> >>>>>>> Subject: [BioC] Two populations on microarray
>> >>>>>>>
>> >>>>>>> Hello,
>> >>>>>>>
>> >>>>>>> By virtue of experiment design we have two populations to analyze on
>> >>>> each of a suite of Genepix microarrays.  You can see an example in an MA
>> >>>> plot here (generated using the excellent limma package) :
>> >>>>>>>
>> >>>>>>>       http://dl.dropbox.com/u/8433654/BE%20T46h%20slide%2052.png
>> >>>>>>>
>> >>>>>>> We have been following the steps in the limma user guide, and Ben
>> >>>> Bolstad's helpful notes (http://tinyurl.com/7346mh9).  All of the examples we
>> >>>> see appear to have just one population to contend with, which gives us an
>> >>>> inkling that we are being naive about our analysis.  We suspect that we'll
>> >>>> have to separate the two populations before normalization and analysis.
>> >>>> Are there any guides available for managing two populations like this?
>> >>>>>>>
>> >>>>>>> Thanks!
>> >>>>>>> Ben
>> >>>>>>>
>> >>>>
>> >>
>> >>
>> >
>> >
>> >
>> 
>> Ben Tupper
>> Bigelow Laboratory for Ocean Sciences
>> 180 McKown Point Rd. P.O. Box 475
>> West Boothbay Harbor, Maine   04575-0475
>> http://www.bigelow.org
> 
> 
> 

Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd. P.O. Box 475
West Boothbay Harbor, Maine   04575-0475 
http://www.bigelow.org


