[BioC] Combining data from scans at different intensities

Wed Feb 14 05:38:25 CET 2007

Henrik Bengtsson <hb at ...> writes:

> 
> Hi.
> 
> On 2/13/07, John Fowler <fowlerj at ...> wrote:
> > Hello,
> >
> > I would like to use data extracted from images scanned at 3 different
> > intensities in our GenePix scanner.  There are a couple of papers
> > that I could find (Lyng et al 04, Piepho et al 06) that describe
> > methods to combine these data and thus help deal with problems of
> > saturation and signals across the dynamic range of the scanner.
> >
> > I looked for a way to do this in bioconductor, and found a post from
> > Dr. Henrik Bengtsson, indicating that this was possible using the
> > aroma.light package in bioconductor.  However, he indicated that this
> > should be done with data from scans in which the laser intensity =was
> > not changed=.
> >
> > Unfortunately, my scans used two different laser intensities.
> 
> So, what were your settings for the three scans?  If two scans have the
> same laser setting, how does the third scan differ?  Different PMT
> settings?
> 
> >
> > Does this invalidate using aroma.light for this purpose?  Is there
> > any other Bioconductor package that could deal with my (apparently
> > incorrectly obtained) data?
> 
> What we observed from scanning at different sensitivity (=PMT) levels
> was that the scanner adds an offset to the signals and that this
> offset is independent of the PMT setting.  We also observed that this
> offset is more or less constant across arrays (also roughly between
> channels), indicating that the offset is added either in the PMT
> (photomultiplier tube) or, more likely, in the analogue-to-digital
> electronics just after the PMT.  We observed this in both of the
> scanners investigated, Axon GenePix 4000A and Agilent G2505A.
> 
> The multiscan calibration model is applied to each channel separately.
> Let c={R,G} be the two channels, and let e_c be the offset in channel
> c.  Say you do multiple scans k=1,...,K.  Then y_{c,i}^(k) denotes the
> probe signal in channel c for probe i and scan k.  Let the unknown
> amount of hybridized sequence on this probe be denoted by x_{c,i},
> which is independent of scan k.  To be really precise, x_{c,i} is
> the amount of light emitted from probe i entering the PMT.  We
> proposed the model:
> 
>  y_{c,i}^(k) = a_c^(k) + b_c^(k)*x_{c,i} + eps_{c,i}^(k)
>                 \approx e_c + b_c^(k)*x_{c,i} + eps_{c,i}^(k)  (*)
> 
> where eps_{c,i}^(k) is zero-mean noise.  By scanning multiple times at
> various *PMT settings*, we can identify e_c and all of the b_c^(k).  Even
> better, we get a good estimate of x_{c,i}, the amount of light
> entering the PMT, so at the end of the day we control for effects
> in the PMT and the electronics after it.  We strongly believe this
> is a good model for those effects.
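
To illustrate Model (*), here is a minimal Python sketch (not R, and not the aroma.light implementation). It simulates one channel scanned at three PMT settings with a common offset, fits a line through each pair of scans by two-dimensional PCA, and reads off the offset as the point where that line meets the diagonal y1 = y2. The offset of 20, the gains 1, 2 and 4, the noise level, and all function names are made-up values for illustration.

```python
import math
import random


def simulate_scans(x, offset, gains, noise_sd, rng):
    """Model (*): y^(k) = offset + gain_k * x + noise; returns one list per scan."""
    return [[offset + b * xi + rng.gauss(0.0, noise_sd) for xi in x] for b in gains]


def offset_from_two_scans(y1, y2):
    """Fit a line through the points (y1, y2) by 2-D PCA and return the
    value at which that line crosses the diagonal y1 == y2."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    s11 = sum((a - m1) ** 2 for a in y1) / n
    s22 = sum((b - m2) ** 2 for b in y2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n
    # Angle of the major principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    c, s = math.cos(theta), math.sin(theta)
    # Point (m1 + t*c, m2 + t*s) lies on the diagonal when both coordinates match.
    t = (m2 - m1) / (c - s)
    return m1 + t * c


rng = random.Random(0)
x = [rng.uniform(0.0, 1000.0) for _ in range(2000)]  # light entering the PMT
true_offset = 20.0                                   # e_c (assumed)
gains = [1.0, 2.0, 4.0]                              # b_c^(k) (assumed)
low, medium, high = simulate_scans(x, true_offset, gains, noise_sd=2.0, rng=rng)

est_low_med = offset_from_two_scans(low, medium)
est_med_high = offset_from_two_scans(medium, high)
print(round(est_low_med, 1), round(est_med_high, 1))
```

Under Model (*) both pairs should recover roughly the same offset, close to the simulated value of 20, which is the property that multiscan calibration relies on.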
> 
> Now, if you adjust the laser power, you effectively adjust the amount
> of light being emitted from each probe too.  That is, you can no longer
> assume that x_{c,i} is constant; instead you have x_{c,i}^(m), where
> m=1,...,M indexes the different *laser levels*.  You may propose a similar
> model to (*) for laser-adjusted scans, e.g.
> 
>  x_{c,i}^(m) \approx d_c + g_c^(m)*z_{c,i} + xi_{c,i}^(m)  (**)
> 
> where now z_{c,i} is the amount of label on the hybridized target on
> probe i, and x_{c,i}^(m) is the amount of light emitted by this probe
> at laser level m.  One open question is whether the "laser offset" d_c is
> constant or whether it depends on m too.
> 
> Now, if (**) is true, then combining (*) and (**), which are both
> so-called _affine_ functions, gives another affine function:
> 
>  y_{c,i}^(k,m) = e_c + b_c^(k)*(d_c + g_c^(m)*z_{c,i} + xi_{c,i}^(m))
>                    + eps_{c,i}^(k)
>                = e_c + d_c*b_c^(k) + h_c^(k,m)*z_{c,i} + nu_{c,i}^(k,m)  (***)
> 
> where h_c^(k,m) = b_c^(k)*g_c^(m) is the combined gain and
> nu_{c,i}^(k,m) = b_c^(k)*xi_{c,i}^(m) + eps_{c,i}^(k) is the combined
> noise.  Compare Models (***) and (*).  If d_c = 0, then (*) and (***)
> have the same form, and you can use (*) for your data.  If d_c != 0,
> then d_c*b_c^(k) must be estimated too.
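
The algebra behind (***) can be checked numerically. The sketch below, in Python with hypothetical gains and offsets, composes the PMT map (*) with the laser map (**) and lists the resulting intercept for each scan setting. The names compose_affine and scan_intercepts are illustrative only.

```python
def compose_affine(outer, inner):
    """Compose affine maps given as (intercept, slope): returns outer(inner(z))."""
    a_out, b_out = outer
    a_in, b_in = inner
    return (a_out + b_out * a_in, b_out * b_in)


E_OFFSET = 20.0                                # scanner offset e_c (assumed)
PMT_GAINS = {"PMT 350": 1.0, "PMT 400": 2.0}   # b_c^(k) (assumed)
LASER_GAINS = {"80%": 1.0, "90%": 1.3}         # g_c^(m) (assumed)


def scan_intercepts(d):
    """Intercept e_c + d_c*b_c^(k) of Model (***) for every (PMT, laser) pair."""
    return {
        (k, m): compose_affine((E_OFFSET, b), (d, g))[0]
        for k, b in PMT_GAINS.items()
        for m, g in LASER_GAINS.items()
    }


no_laser_offset = scan_intercepts(d=0.0)    # every intercept equals e_c
with_laser_offset = scan_intercepts(d=5.0)  # intercept now depends on the PMT gain

for setting, intercept in sorted(with_laser_offset.items()):
    print(setting, intercept)
```

With d_c = 0 every scan shares the intercept e_c, so the common-offset assumption of (*) holds. With d_c != 0 the intercept changes with the PMT gain (but not with the laser gain), which is the extra term d_c*b_c^(k) in (***).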
> 
> The call Y <- calibrateMultiscan(X) in aroma.light applies to Model (*).
> There is no implementation for Model (***) when d_c != 0, but I would
> say give it a try.
> 
> If you want to, I can have a look at your multiscan data for a typical
> array.  If so, we'll have to figure out a way to transfer three GPR
> files.
> 
> Best
> 
> Henrik
> 
> >
> > many thanks!
> > John
> >
> > --
> > John Fowler                             Associate Professor
> > Botany and Plant Pathology (BPP) Dept.
> > 2082 Cordley Hall                        Phone: (541) 737-5307
> > Oregon State University                  FAX: (541) 737-3573
> > Corvallis, OR  97331-2902  USA           Email: fowlerj at ...
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at ...
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> 

Hi Henrik,

Thank you very much for the rapid reply!

My three scans were something like this (I don't have the exact numbers right now):

'low' scan - 80% laser power, PMT at ~350
'medium' scan - 80% laser power, PMT at ~400
'high' scan - 90% laser power, PMT at ~400

In retrospect, I am quietly cursing myself for changing two variables...

Anyway, after noting your post, I went back and checked the papers by Lyng et al
04 and Piepho et al 06 that I had seen previously, and saw that in both cases
they also kept the laser power constant and changed the PMT.

I actually scanned some of these slides just this morning, and so have the
opportunity to go back and re-scan them - sounds like this might be the best
approach?  Unfortunately, there are some older slides in this experiment for
which this is not an option.

Also, I must admit that I don't follow the details of the statistical solutions
you explained.  However, I think I grasp the gist of it.  If I try using
calibrateMultiscan(X) with my data, how would I know whether it was giving me
invalid output?
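
One possible sanity check, sketched in Python under Model (***) with made-up numbers (it is not taken from aroma.light or from the cited papers): estimate where the fitted line for each pair of scans crosses the diagonal. If a single offset e_c is shared by all scans, the low/medium and medium/high pairs should cross at about the same value; a laser offset d_c != 0 pulls the two crossings apart. The scan gains loosely mimic the low, medium and high scans listed above but are assumptions, as are all function names.

```python
import math
import random


def diagonal_crossing(y1, y2):
    """PCA line through the points (y1, y2); value where it meets y1 == y2."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    s11 = sum((a - m1) ** 2 for a in y1) / n
    s22 = sum((b - m2) ** 2 for b in y2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)  # major principal axis
    c, s = math.cos(theta), math.sin(theta)
    return m1 + c * (m2 - m1) / (c - s)


def simulate(z, e, d, scans, noise_sd, rng):
    """Model (***): y = e + b*(d + g*z) + noise for each (b, g) scan setting."""
    return [
        [e + b * (d + g * zi) + rng.gauss(0.0, noise_sd) for zi in z]
        for b, g in scans
    ]


def crossing_gap(d, seed=1):
    """Difference between the medium/high and low/medium diagonal crossings."""
    rng = random.Random(seed)
    z = [rng.uniform(0.0, 1000.0) for _ in range(2000)]
    # (PMT gain b, laser gain g) for the low, medium and high scans (assumed).
    scans = [(1.0, 1.0), (2.0, 1.0), (2.0, 1.3)]
    low, medium, high = simulate(z, 20.0, d, scans, 1.0, rng)
    return diagonal_crossing(medium, high) - diagonal_crossing(low, medium)


gap_without_laser_offset = crossing_gap(d=0.0)
gap_with_laser_offset = crossing_gap(d=15.0)
print(round(gap_without_laser_offset, 1), round(gap_with_laser_offset, 1))
```

In this simulation the gap is close to zero when d_c = 0 and close to 2*d_c = 30 when d_c = 15, so a large disagreement between pairwise crossings would be a hint that Model (*) does not fit. How large a gap is tolerable on real data is not something this sketch can answer.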

Have you looked at either of the papers I referenced above, to see what you
think of them, and whether the approaches used in those papers would work better
for my situation?

Thank you for your responses.  If it seems like it would be worthwhile for me to
get you my .gpr files, and you can take the time to look at them, I think I
should be able to figure out a way to post them someplace where you could
download them.

Again, thanks for your help!
John