[BioC] Defining Weights in marrayNorm.

Gordon Smyth smyth at wehi.edu.au
Tue Aug 5 20:43:16 MEST 2003


Dear Michael,

I think you have misunderstood how the weights work. What you want to do 
really is accomplished using weights and cannot be accomplished by any 
subsetting operation. Subsetting operations, by their very nature, apply 
in the same way to every array, and this isn't what you want.

1. Let me say first of all that we generally do not recommend restricting 
normalisation only to "good" spots. The normalisation routines are written 
so that they are robust, i.e., they are able to ignore groups of poor 
quality or differentially expressed genes if they don't follow the trend of 
the rest of the data. This means that a minority of poor quality spots is 
unlikely to do much harm. Very often there is some information even in the 
poorer quality spots and it is best to leave them in. This also saves lots 
of time. There are exceptions of course ...

2. How are you choosing the "good quality" spots? Programs like genepix 
flag spots which they think are of questionable quality. If you are using 
flags provided by the image analysis program, then you can read in the 
weights as you read in the data. For example, if you have genepix data then

RG <- read.maimages(files, source="genepix", wt.fun=wtflags(0))

will give zero weight to any spot flagged by genepix as being questionable. 
When you normalise the data using

MA <- normalizeWithinArrays(RG)

the normalisation regressions will use only those spots which have weights 
greater than zero. This will vary between arrays and is exactly what you 
want to achieve. All the spots will be normalized, whether "good" or "bad" 
quality, but only the "good" spots will have any influence on the 
normalisation functions. The normalisation of the "good" spots will be 
exactly as if the "bad" spots were not there.
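
To make the idea concrete, here is a minimal sketch on simulated data for a 
single array. It is only an illustration: it uses base R's loess rather than 
the code that normalizeWithinArrays actually runs, and every name in it (A, 
M, w, good, M.norm) is made up for the example. The trend is estimated from 
the "good" spots only and then subtracted from all spots.

```r
set.seed(1)
n <- 500
A <- runif(n, 6, 14)                      # average log-intensity
M <- 0.3 * (A - 10) + rnorm(n, sd = 0.2)  # log-ratio with an intensity trend

# Hypothetical quality weights: 1 = good spot, 0 = bad spot
w <- rep(1, n)
bad <- sample(n, 50)
w[bad] <- 0
M[bad] <- M[bad] + 3                      # bad spots sit well off the trend

d <- data.frame(A = A, M = M)
good <- w > 0

# Estimate the trend from the good spots only.
# surface = "direct" lets predict() extrapolate if a bad spot lies
# outside the intensity range of the good spots.
fit <- loess(M ~ A, data = d[good, ], span = 0.4, degree = 1,
             control = loess.control(surface = "direct"))

# Every spot is normalised, good or bad, using that same curve
M.norm <- d$M - predict(fit, newdata = d)

# Median normalised log-ratio for bad (FALSE) and good (TRUE) spots
tapply(M.norm, good, median)
```

The good spots end up centred near zero, while the bad spots keep their 
offset because they had no say in where the curve went.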

3. If you have constructed the spot flags yourself, then you'll have to 
proceed something like this. Suppose you have two arrays in two genepix 
output files. Suppose the flags for the first array are stored in a vector 
called 'flag1' with 1 for good spots and 0 for bad. Suppose the flags for 
the second array are stored in a vector 'flag2'. You will read in the 
intensity data using

RG <- read.maimages(files, source="genepix")

Then you'll have to assemble the flags into a matrix with rows for genes 
and columns for arrays using 'cbind(flag1, flag2)'. Then you put this into 
the weight component:

RG$weights <- cbind( flag1, flag2 )

Now you can use

MA <- normalizeWithinArrays(RG)

and normalisation will use, for each array, only those spots for which the 
flags are equal to 1.
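
Before assigning the weights it may be worth checking that the flag matrix 
lines up with the intensity data. The following is only a sketch with toy 
values: the list 'RG' below is a plain stand-in for the object returned by 
read.maimages, and the flag values are invented.

```r
# Toy stand-in for the intensity data: 6 spots on 2 arrays
RG <- list(R = matrix(2^(8:19), nrow = 6, ncol = 2),
           G = matrix(2^(9:20), nrow = 6, ncol = 2))

# Made-up flags, 1 for good spots and 0 for bad
flag1 <- c(1, 1, 0, 1, 1, 0)
flag2 <- c(1, 0, 1, 1, 1, 1)

# Rows are spots, columns are arrays
w <- cbind(flag1, flag2)

# The weights must have the same dimensions as the intensities
stopifnot(identical(dim(w), dim(RG$R)))
stopifnot(all(w %in% c(0, 1)))

RG$weights <- w

# Number of spots that will drive the normalisation on each array
colSums(w)
```

If the two flag vectors had different lengths, or a different length from 
the number of spots, the dimension check would stop with an error rather 
than letting mismatched weights through.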

4. If you have somehow constructed the flags externally to R, you will need 
to read them into R. Suppose you have the flags in a tab-delimited text 
file with one row for each gene and columns corresponding to arrays. Then 
you read them in:

w <- as.matrix(read.table("myfile"))
RG$weights <- w

and then proceed as before.
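
As a self-contained sketch of this step, the code below first writes a small 
made-up flag file to a temporary location and then reads it back. The file 
contents, the column names and the use of a header row are assumptions for 
the example; adjust 'header' to match your own file.

```r
# Write a tiny tab-delimited flag file: 3 spots, 2 arrays (made-up values)
flagfile <- tempfile(fileext = ".txt")
writeLines(c("array1\tarray2",
             "1\t1",
             "0\t1",
             "1\t0"), flagfile)

# header = TRUE because this toy file has a row of column names
w <- as.matrix(read.table(flagfile, header = TRUE, sep = "\t"))

# Check that the flags were read as numbers and contain only 0 and 1
stopifnot(is.numeric(w), all(w %in% c(0, 1)))

dim(w)
```

The resulting matrix has one row per spot and one column per array, which is 
the shape needed for the weights component.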

Hope this helps
Gordon

At 06:28 PM 5/08/2003, michael watson (IAH-C) wrote:
>Hi
>
>I think the problem that both Jo and myself are having is that we want to 
>know how to subset data, either in limma or the marray* classes, such that 
>we only use good quality spots in the normalisation process.
>
>The problem is, the spots that are "good quality" differ from array to 
>array, so it's not something we can set in the layout object unless we 
>create a different layout object for each array.  So we started looking at 
>the concept of using "weights", but really, the problem of not being able 
>to subset our data successfully still remains.
>
>So as a more generalised question, how can I use Bioconductor to normalise 
>microarray data based only on a subset of good quality spots, the location 
>of which will differ from array to array?
>
>Thanks
>M
>
>-----Original Message-----
>From: Gordon Smyth [mailto:smyth at wehi.edu.au]
>Sent: 05 August 2003 01:26
>To: James MacDonald
>Cc: bioconductor at stat.math.ethz.ch; josef.walker at jenner.ac.uk
>Subject: Re: [BioC] Defining Weights in marrayNorm.
>
>
>Dear James and Jim,
>
>Actually the maNorm function doesn't make use of weights, even though
>weights might be set in the marrayRaw object. If you look at the code for
>maNorm you will see that the weights are set to NULL when the call is made
>to maNormMain.
>
>If you want to use weights for normalization you need either to use the
>lower level function maNormMain (which appears to use weights) or use the
>normalization routines in the limma package instead.
>
>In limma you use read.maimages to read the data in, perhaps picking up
>the quality weights from genepix or quantarray in the process. If you have
>made your own weights, you can simply assign them to the weights component,
>e.g.,
>
>RG <- read.maimages(files, source=your image analysis program)
>RG$weights <- your.weights
>RG$printer <- list(ngrid.c=4,ngrid.r=4,nspot.r=20,nspot.c=20)  # info about array layout
>MA <- normalizeWithinArrays(RG)
>
>Gordon
>
>At 03:26 AM 5/08/2003, James MacDonald wrote:
> >From perusing the functions (particularly maNorm), it appears that the
> >weights are used by all normalization procedures except for "median". By
> >definition, a weight is in the range [0,1], so if you use 0 and 1, it
> >will effectively be the same as saying "don't use this" or "use this".
> >You can also use some more moderate values rather than completely
> >eliminating the 'bad' spots (e.g., simply down-weight spots that look
> >sketchy).
> >
> >
> >I think you pass the weights using the additional argument w="maW" in
> >your call to maNorm.
> >
> >HTH,
> >
> >Jim
> >
> >James W. MacDonald
> >Affymetrix and cDNA Microarray Core
> >University of Michigan Cancer Center
> >1500 E. Medical Center Drive
> >7410 CCGC
> >Ann Arbor MI 48109
> >734-647-5623
> >
> > >>> "Josef Walker" <josef.walker at jenner.ac.uk> 08/04/03 12:31PM >>>
> >Hi all,
> >
> >
> >
> >My name is Joe Walker and I am a final year PhD student attempting to
> >use Bioconductor to analyse a large amount of cDNA microarray data from
> >my thesis experiments.
> >
> >
> >
> >For the normalisation stage, there is the option to use weights
> >previously assigned to the genes.
> >
> >I wish to normalise my genes based on a quality controlled subset that
> >changes for each hybridisation. I think one way to do this is to use the
> >weights option during normalisation.
> >
> >The "slot" for the weights (maW) is assigned/loaded during the
> >marrayInput stage using the read.marrayRaw command (along with name.Gf
> >etc).
> >
> >What I am unclear about is:
> >
> >1)       What form do these weights take, i.e., does 1 = use this gene and
> >0 = do not use this gene, are they graded, or do they have to be defined
> >elsewhere?
> >
> >2)       Do you use these weights by simply using maW = TRUE, during the
> >normalisation stage?
> >
> >Am I at least on the right track?
> >
> >If anyone has advice for me it would be great.
> >
> >Thanks in advance,
> >
> >Joe
> >
> >Josef Walker BSc (Hons)
> >
> >PhD Student
> >
> >Memory Group
> >
> >The Edward Jenner Institute for Vaccine Research
> >
> >Compton
> >
> >Nr Newbury
> >
> >Berkshire
> >
> >RG20 7NN
> >
> >
> >
> >Tel: 01635 577905
> >
> >Fax: 01635 577901
> >
> >E-mail: Josef.walker at jenner.ac.uk


