[BioC] Defining Weights in marrayNorm.

Tue Aug 5 17:24:39 MEST 2003

Hi Gordon

First of all, thanks for all your help on this matter.  I think we're reaching a point where we are capable of taking this forward, though probably using limma now rather than the marray* classes :-)

I only have one question....

>I don't want to get into an argument on this topic, but there is absolutely 
>no reason to filter out low intensity spots before using loess 
>normalisation. Loess normalisation is intensity-based, it is designed to 
>accept the whole range of intensities.

There is one case in particular I would want to avoid: where, in one channel, the background (BG) is above the signal and in the other channel it is not.  In this case we have very useful data (the gene is switched on in one channel and off in the other), yet we do not have a reliable ratio.  We either have a negative ratio or an infinite ratio (both utterly meaningless), or we set the negative channel intensity to, say, 1, and then the ratio is highly skewed.  In all cases the ratio for that spot is very unreliable and would surely alter, even if only in a small way, the Lowess fit - or am I wrong in thinking that?  I read somewhere that Lowess gives less weight to outliers....
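
A minimal sketch of the two points above, using made-up numbers and base R only. It is not limma's normalisation code: it simply shows what happens to the log-ratio when one background-corrected channel is negative or zero, and then uses base R's lowess(), whose iter argument sets the number of robustifying iterations, to compare how far a single extreme spot pulls the fit with and without those iterations. The intensities and the flat trend are illustrative assumptions.

```r
# Toy background-corrected intensities (made-up numbers, not real data)
R <- c(1200, 800, 950, 1100)   # red channel
G <- c(1000, 900, -40,    0)   # green channel: spot 3 negative, spot 4 zero

# Raw log-ratios: undefined (NaN) for a negative ratio, infinite when G == 0
M_raw <- suppressWarnings(log2(R / G))

# Flooring the offending channel at 1 gives a finite but extreme log-ratio
M_floor <- log2(R / pmax(G, 1))

# A flat trend (true M = 0) with a single extreme spot in the middle
x <- 1:100
y <- rep(0, 100)
y[50] <- 10

fit_plain  <- lowess(x, y, iter = 0)   # ordinary local regression
fit_robust <- lowess(x, y, iter = 3)   # with robustifying iterations (the default)

# Fitted value at the extreme spot's position under each fit
c(plain = fit_plain$y[50], robust = fit_robust$y[50])
```

In this toy case the robust fit is pulled less by the extreme spot than the plain fit, which is the sense in which Lowess "gives less weight to outliers".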

Regards
Mick

>Thank you for your help so far, this mailing list is a real life-saver.
>
>Unfortunately I am away for the next 7 days so won't be able to access
>my messages, but will be looking forward to checking them when I get
>back.
>
>Best Wishes
>
>
>Joe
>
>
>Josef Walker BSc (Hons)
>PhD Student
>Memory Group
>The Edward Jenner Institute for Vaccine Research
>Compton
>Nr Newbury
>Berkshire
>RG20 7NN
>
>Tel: 01635 577905
>Fax: 01635 577901
>E-mail: Josef.walker at jenner.ac.uk
>
>
>-----Original Message-----
>From: Gordon Smyth [mailto:smyth at wehi.edu.au]
>Sent: 05 August 2003 10:43
>To: michael watson (IAH-C)
>Cc: James MacDonald; bioconductor at stat.math.ethz.ch; Josef Walker
>Subject: RE: [BioC] Defining Weights in marrayNorm.
>
>Dear Michael,
>
>I think you are not understanding exactly how the weights work. What you
>want to do really is accomplished using weights and cannot be accomplished
>by any subsetting operation. Subsetting operations have to, by their very
>nature, apply the same to every array, and this isn't what you want.
>
>1. Let me say first of all that we generally do not recommend restricting
>normalisation only to "good" spots. The normalisation routines are written
>so that they are robust, i.e., they are able to ignore groups of poor
>quality or differentially expressed genes if they don't follow the trend of
>the rest of the data. This means that a minority of poor quality spots is
>unlikely to do much harm. Very often there is some information even in the
>poorer quality spots and it is best to leave them in. This also saves lots
>of time. There are exceptions of course ...
>
>2. How are you choosing the "good quality" spots? Programs like genepix
>flag spots which they think are of questionable quality. If you are using
>flags provided by the image analysis program, then you can read in the
>weights as you read in the data. For example, if you have genepix data then
>
>RG <- read.maimages(files, source="genepix", wt.fun=wtflags(0))
>
>will give zero weight to any spot flagged by genepix as being questionable.
>When you normalise the data using
>
>MA <- normalizeWithinArrays(RG)
>
>the normalisation regressions will use only those spots which have weights
>greater than zero. This will vary between arrays and is exactly what you
>want to achieve. All the spots will be normalized, whether "good" or "bad"
>quality, but only the "good" spots will have any influence on the
>normalisation functions. The normalisation of the "good" spots will be
>exactly as if the "bad" spots were not there.
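
The idea in point 2 can be sketched in base R without limma. This is an illustration under stated assumptions, not limma's implementation: the data are simulated, the flagged spots are hypothetical, and the trend is estimated with stats::loess on the positive-weight spots only and then subtracted from every spot.

```r
set.seed(1)
n <- 200
A <- runif(n, 6, 14)                            # average log-intensity per spot
M <- 0.5 - 0.05 * (A - 6) + rnorm(n, sd = 0.2)  # log-ratio with an intensity-dependent bias
w <- rep(1, n)                                  # spot weights: 1 = good

bad <- 1:20                                     # hypothetical flagged spots
M[bad] <- M[bad] + 3                            # give them a gross artefact
w[bad] <- 0                                     # ... and zero weight

spots <- data.frame(A = A, M = M)

# Estimate the trend from the positive-weight spots only
fit <- loess(M ~ A, data = spots[w > 0, ], span = 0.4, degree = 1,
             control = loess.control(surface = "direct"))

# ... but subtract it from every spot, good or bad
trend  <- predict(fit, newdata = spots)
M_norm <- M - trend

# Mean normalised log-ratio for zero-weight (FALSE) and good (TRUE) spots
tapply(M_norm, w > 0, mean)
```

Every spot receives a normalised value, but the flagged spots play no part in estimating the curve, so their artefact survives normalisation instead of distorting it.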
>
>3. If you have constructed the spot flags yourself, then you'll have to
>proceed something like this. Suppose you have two arrays in two genepix
>output files. Suppose the flags for the first array are stored in a vector
>called 'flag1' with 1 for good spots and 0 for bad. Suppose the flags for
>the second array are stored in a vector 'flag2'. You will read in the
>intensity data using
>
>RG <- read.maimages(files, source="genepix")
>
>Then you'll have to assemble the flags into a matrix with rows for genes
>and columns for arrays using 'cbind(flag1, flag2)'. Then you put this into
>the weight component:
>
>RG$weights <- cbind( flag1, flag2 )
>
>Now you can use
>
>MA <- normalizeWithinArrays(RG)
>
>and normalisation will use, for each array, only those spots for which the
>flags are equal to 1.
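
A small sketch of the bookkeeping in point 3, using a plain list in place of the object that read.maimages would return so that it runs without limma or any image files. The flag values and intensities are made up for illustration.

```r
# Hypothetical flags for five spots on two arrays: 1 = good, 0 = bad
flag1 <- c(1, 0, 1, 1, 0)
flag2 <- c(1, 1, 0, 1, 1)

# Plain list standing in for the object that read.maimages would return
RG <- list(R = matrix(1000, nrow = 5, ncol = 2),
           G = matrix(800,  nrow = 5, ncol = 2))

RG$weights <- cbind(flag1, flag2)

# The weights must line up with the intensities:
# one row per spot, one column per array
stopifnot(identical(dim(RG$weights), dim(RG$R)))

# Number of spots that would drive the normalisation on each array
colSums(RG$weights)
```

Checking the dimensions before normalising catches the common mistake of building the flag vectors in a different spot order or length from the intensity data.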
>
>4. If you have somehow constructed the flags externally to R, you will need
>to read them into R. Suppose you have the flags in a tab-delimited text
>file with one row for each gene and columns corresponding to arrays. Then
>you read them in:
>
>w <- as.matrix(read.table("myfile"))
>RG$weights <- w
>
>and then proceed as before.
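
A self-contained sketch of point 4, assuming a flag file with no header row. The file is written to a temporary location with made-up flags so the example can be run as is; with a real file that has column names, header = TRUE would be needed in read.table.

```r
# Write a small, made-up flag file: 5 spots (rows) x 2 arrays (columns), no header
flag_file <- tempfile(fileext = ".txt")
flags <- cbind(c(1, 0, 1, 1, 0), c(1, 1, 0, 1, 1))
write.table(flags, flag_file, sep = "\t", row.names = FALSE, col.names = FALSE)

# Read it back as in the message above
w <- as.matrix(read.table(flag_file, sep = "\t"))

# Sanity checks before assigning w to the weights component
stopifnot(is.numeric(w), all(w %in% c(0, 1)))
dim(w)
```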
>
>Hope this helps
>Gordon
>
>At 06:28 PM 5/08/2003, michael watson (IAH-C) wrote:
> >Hi
> >
 > >I think the problem that both Jo and myself are having is that we want to
 > >know how to subset data, either in limma or the marray* classes, such that
 > >we only use good quality spots in the normalisation process.
 > >
 > >The problem is, the spots that are "good quality" differ from array to
 > >array, so it's not something we can set in the layout object unless we
 > >create a different layout object for each array.  So we started looking at
 > >the concept of using "weights", but really, the problem of not being able
 > >to subset our data successfully still remains.
 > >
 > >So as a more generalised question, how can I use Bioconductor to normalise
 > >microarray data based only on a subset of good quality spots, the location
 > >of which will differ from array to array?
> >
> >Thanks
> >M
> >
> >-----Original Message-----
> >From: Gordon Smyth [mailto:smyth at wehi.edu.au]
> >Sent: 05 August 2003 01:26
> >To: James MacDonald
> >Cc: bioconductor at stat.math.ethz.ch; josef.walker at jenner.ac.uk
> >Subject: Re: [BioC] Defining Weights in marrayNorm.
> >
> >
> >Dear James and Jim,
> >
 > >Actually the maNorm function doesn't make use of weights, even though
 > >weights might be set in the marrayRaw object. If you look at the code for
 > >maNorm you will see that the weights are set to NULL when the call is made
 > >to maNormMain.
 > >
 > >If you want to use weights for normalization you need either to use the
 > >lower level function maNormMain (which appears to use weights) or use the
 > >normalization routines in the limma package instead.
 > >
 > >In limma you use read.maimages to read the data in, perhaps picking up
 > >the quality weights from genepix or quantarray in the process. If you have
 > >made your own weights, you can simply assign them to the weights component,
 > >e.g.,
> >
 > >RG <- read.maimages(files, source="genepix")  # or your image analysis program, e.g. "quantarray"
 > >RG$weights <- your.weights
 > ># info about array layout, e.g.,
 > >RG$printer <- list(ngrid.c=4, ngrid.r=4, nspot.r=20, nspot.c=20)
 > >MA <- normalizeWithinArrays(RG)
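
As a quick consistency check on the layout in the example above, the four numbers should multiply out to the number of spots per array, and any weight matrix should have that many rows. A sketch in base R, with a hypothetical all-ones weight matrix for two arrays:

```r
# Layout as in the message: a 4 x 4 grid of print-tip groups, each 20 x 20 spots
printer <- list(ngrid.r = 4, ngrid.c = 4, nspot.r = 20, nspot.c = 20)

# Number of spots per array implied by the layout
n_spots <- prod(unlist(printer))

# Hypothetical weights for two arrays, every spot marked "good"
your.weights <- matrix(1, nrow = n_spots, ncol = 2)

stopifnot(nrow(your.weights) == n_spots)
n_spots
```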
> >
> >Gordon
> >
> >At 03:26 AM 5/08/2003, James MacDonald wrote:
 > > >From perusing the functions (particularly maNorm), it appears that the
 > > >weights are used by all normalization procedures except for "median". By
 > > >definition, a weight is in the range [0,1], so if you use 0 and 1, it
 > > >will effectively be the same as saying "don't use this" or "use this".
 > > >You can also use some more moderate values rather than completely
 > > >eliminating the 'bad' spots (e.g., simply down-weight spots that look
 > > >sketchy).
> > >
> > >
> > >I think you pass the weights using the additional argument w="maW" in
> > >your call to maNorm.
> > >
> > >HTH,
> > >
> > >Jim
> > >
> > >James W. MacDonald
> > >Affymetrix and cDNA Microarray Core
> > >University of Michigan Cancer Center
> > >1500 E. Medical Center Drive
> > >7410 CCGC
> > >Ann Arbor MI 48109
> > >734-647-5623
> > >
> > > >>> "Josef Walker" <josef.walker at jenner.ac.uk> 08/04/03 12:31PM >>>
> > >Hi all,
> > >
> > >
> > >
 > > >My name is Joe Walker and I am a final year PhD student attempting to
 > > >use Bioconductor to analyse a large amount of cDNA microarray data from
 > > >my thesis experiments.
 > > >
 > > >For the normalisation stage, there is the option to use weights
 > > >previously assigned to the genes.
 > > >
 > > >I wish to normalise my genes based on a quality controlled subset that
 > > >changes for each hybridisation. I think one way to do this is to use the
 > > >weights option during normalisation.
 > > >
 > > >The "slot" for the weights (maW) is assigned/loaded during the
 > > >marrayInput stage using the read.marrayRaw command (along with name.Gf
 > > >etc).
 > > >
 > > >What I am unclear of is:
 > > >
 > > >1) What form do these weights take, i.e., does 1 = use this gene and
 > > >0 = do not use this gene, are they graded, or do they have to be defined
 > > >elsewhere?
 > > >
 > > >2) Do you use these weights by simply using maW = TRUE during the
 > > >normalisation stage?
 > > >
 > > >Am I at least on the right track?
 > > >
 > > >If anyone has advice for me it would be great.
> > >
> > >Thanks in advance,
> > >
> > >Joe