[BioC] Defining Weights in marrayNorm.

Tue Aug 5 13:40:57 MEST 2003

Dear Gordon,

The flags we (Michael Watson and I) use are self-defined and attached as
an extra column, tagged on to the end of the rest of the data in each
individual .gpr/.txt file. 

It would probably also be prudent to explain our definitions of "Good"
and "Bad" more clearly. A spot could be considered good under a number
of different categories, because the data are derived from two separate
images, one for each channel.
Our definition of a GOOD spot is one that passes all of our QC criteria
and has a signal intensity above the thresholds we set for deciding
whether a spot is considered "expressed".

A single spot could be considered good if it passed QC elements for both
channels.
A single spot could be good in one channel and bad in the other channel,
or BAD in both, ending up with an overall assignment of BAD.
However, if the signal in one channel is below the thresholds that
define whether a spot is considered expressed, then the definitions
change: a spot with good signal in one channel and below-threshold
signal in the other channel (which might be considered BAD according to
some of the QC criteria) would be considered GOOD overall.

The decision to use only those genes considered GOOD i.e. expressed in
both channels, is based on the fact that only these genes provide good
reliable signal from both of the channels, and that ratios derived from
these spots would also be reliable. Ratios derived from BAD spots are
definitely unreliable. Ratios derived from those spots with only
background signal in both channels (unexpressed genes), or good in one
channel and unexpressed in the other channel, are unreliable as they do
not contain fluorescence intensity data derived from labelled cDNA in
both channels and so could not account for any gross differences in
signal intensity arising from this source.
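
As a rough illustration of the "GOOD i.e. expressed in both channels"
rule described above, a per-spot 0/1 flag could be built along these
lines. The column names, the intensities and the single threshold are
all made up for the example and are not taken from our actual files.

```r
# Hypothetical per-spot summaries for one array (all names are made up).
spots <- data.frame(
  qc.red    = c(TRUE, TRUE, FALSE, TRUE),   # passes QC in the red channel
  qc.green  = c(TRUE, TRUE, TRUE,  TRUE),   # passes QC in the green channel
  sig.red   = c(1500, 90,   2000,  80),     # red signal intensity
  sig.green = c(1200, 1800, 1700,  60)      # green signal intensity
)
threshold <- 200  # assumed "expressed" cut-off, same for both channels

# A spot is "expressed" only if both channels are above the threshold.
expressed.both <- spots$sig.red > threshold & spots$sig.green > threshold
passes.qc.both <- spots$qc.red & spots$qc.green

# 1 = use this spot when fitting the normalisation, 0 = ignore it.
flag <- as.numeric(expressed.both & passes.qc.both)
```

One such vector per array would then be written out as the extra
"Flags" column.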

So just to recap, am I correct in thinking that if we define the
slot/column (w) that holds our self-constructed "Flags" in our
marrayRaw objects (read into R using read.marrayRaw or read.GenePix
from the .gpr or .txt files derived from the raw images), and then use
the maNormMain function in the marrayNorm library (not maNorm), setting
maW = TRUE, then these weights WILL be used for calculating the
normalised values?

Thank you for your help so far; this mailing list is a real life-saver.

Unfortunately I am away for the next 7 days so won't be able to access
my messages, but will be looking forward to checking them when I get
back.

Best Wishes

Joe

Josef Walker BSc (Hons)
PhD Student
Memory Group
The Edward Jenner Institute for Vaccine Research
Compton
Nr Newbury
Berkshire
RG20 7NN

Tel: 01635 577905
Fax: 01635 577901
E-mail: Josef.walker at jenner.ac.uk

-----Original Message-----
From: Gordon Smyth [mailto:smyth at wehi.edu.au] 
Sent: 05 August 2003 10:43
To: michael watson (IAH-C)
Cc: James MacDonald; bioconductor at stat.math.ethz.ch; Josef Walker
Subject: RE: [BioC] Defining Weights in marrayNorm.

Dear Michael,

I think you are not understanding exactly how the weights work. What you
want to do really is accomplished using weights and cannot be
accomplished by any subsetting operation. Subsetting operations have to,
by their very nature, apply the same to every array, and this isn't what
you want.

1. Let me say first of all that we generally do not recommend
restricting normalisation only to "good" spots. The normalisation
routines are written so that they are robust, i.e., they are able to
ignore groups of poor quality or differentially expressed genes if they
don't follow the trend of the rest of the data. This means that a
minority of poor quality spots is unlikely to do much harm. Very often
there is some information even in the poorer quality spots and it is
best to leave them in. This also saves lots of time. There are
exceptions of course ...

2. How are you choosing the "good quality" spots? Programs like genepix
flag spots which they think are of questionable quality. If you are
using flags provided by the image analysis program, then you can read in
the weights as you read in the data. For example, if you have genepix
data then

RG <- read.maimages(files, source="genepix", wt.fun=wtflags(0))

will give zero weight to any spot flagged by genepix as being
questionable. When you normalise the data using

MA <- normalizeWithinArrays(RG)

the normalisation regressions will use only those spots which have
weights greater than zero. This will vary between arrays and is exactly
what you want to achieve. All the spots will be normalized, whether
"good" or "bad" quality, but only the "good" spots will have any
influence on the normalisation functions. The normalisation of the
"good" spots will be exactly as if the "bad" spots were not there.
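
As a rough illustration of this behaviour (not limma's actual code), the
base R sketch below fits a trend with zero weights on the flagged spots
and then normalises every spot. The simulated data, the variable names
and the straight-line fit standing in for the loess curve are all
assumptions made for the example.

```r
set.seed(1)
n <- 200
A <- runif(n, 6, 14)                      # average log-intensity
M <- 0.3 * (A - 10) + rnorm(n, sd = 0.2)  # log-ratio with an intensity trend

# Flag 20 spots as "bad" (weight 0) and give them distorted log-ratios.
w <- rep(1, n)
bad <- sample(n, 20)
w[bad] <- 0
M[bad] <- M[bad] + 3

# Fit the trend with weights: zero-weight spots do not influence the fit.
fit <- lm(M ~ A, weights = w)

# Every spot is normalised, good or bad, by subtracting the fitted trend.
M.norm <- M - predict(fit, newdata = data.frame(A = A))

# For comparison: fit using the good spots only, as if the bad ones
# had never been on the array.
fit.good <- lm(M ~ A, subset = w > 0)
M.norm.good <- M[w > 0] - predict(fit.good, newdata = data.frame(A = A[w > 0]))
```

The good spots get the same normalised values whether the bad spots are
kept with zero weight or removed before fitting, while the bad spots
still receive normalised values in the weighted version.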

3. If you have constructed the spot flags yourself, then you'll have to
proceed something like this. Suppose you have two arrays in two genepix
output files. Suppose the flags for the first array are stored in a
vector called 'flag1' with 1 for good spots and 0 for bad. Suppose the
flags for the second array are stored in a vector 'flag2'. You will read
in the intensity data using

RG <- read.maimages(files, source="genepix")

Then you'll have to assemble the flags into a matrix with rows for genes
and columns for arrays using 'cbind(flag1, flag2)'. Then you put this
into the weight component:

RG$weights <- cbind( flag1, flag2 )

Now you can use

MA <- normalizeWithinArrays(RG)

and normalisation will use, for each array, only those spots for which
the flags are equal to 1.

4. If you have somehow constructed the flags externally to R, you will
need to read them into R. Suppose you have the flags in a tab-delimited
text file with one row for each gene and columns corresponding to
arrays. Then you read them in:

w <- as.matrix(read.table("myfile"))
RG$weights <- w

and then proceed as before.
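
A minimal base R sketch of the file round trip in point 4, with a sanity
check that the weights line up with the intensities. The flag values,
the temporary file standing in for "myfile" and the stand-in intensity
matrix 'R' are all invented for the example; in a real session the
comparison would be against the intensity matrix in RG.

```r
# Two made-up flag vectors (1 = good, 0 = bad), one per array.
flag1 <- c(1, 0, 1, 1, 0)
flag2 <- c(1, 1, 0, 1, 1)

# Write them to a temporary tab-delimited file with no header and no
# row names, to stand in for "myfile".
flagfile <- tempfile(fileext = ".txt")
write.table(cbind(flag1, flag2), flagfile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the flags back as a matrix: rows = spots, columns = arrays.
w <- as.matrix(read.table(flagfile))

# Stand-in for the red-channel intensity matrix (5 spots, 2 arrays).
R <- matrix(runif(10, 100, 5000), nrow = 5, ncol = 2)

# The weights must match the intensities spot for spot, array for array.
stopifnot(identical(dim(w), dim(R)), all(w %in% c(0, 1)))
```

If the file does have a header row, read.table needs header = TRUE,
otherwise the column names are read as an extra row of data.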

Hope this helps
Gordon

At 06:28 PM 5/08/2003, michael watson (IAH-C) wrote:
>Hi
>
>I think the problem that both Jo and myself are having is that we want
>to know how to subset data, either in limma or the marray* classes, such
>that we only use good quality spots in the normalisation process.
>
>The problem is, the spots that are "good quality" differ from array to
>array, so it's not something we can set in the layout object unless we
>create a different layout object for each array.  So we started looking
>at the concept of using "weights", but really, the problem of not being
>able to subset our data successfully still remains.
>
>So as a more generalised question, how can I use Bioconductor to
>normalise microarray data based only on a subset of good quality spots,
>the location of which will differ from array to array?
>
>Thanks
>M
>
>-----Original Message-----
>From: Gordon Smyth [mailto:smyth at wehi.edu.au]
>Sent: 05 August 2003 01:26
>To: James MacDonald
>Cc: bioconductor at stat.math.ethz.ch; josef.walker at jenner.ac.uk
>Subject: Re: [BioC] Defining Weights in marrayNorm.
>
>
>Dear James and Jim,
>
>Actually the maNorm function doesn't make use of weights, even though
>weights might be set in the marrayRaw object. If you look at the code
>for maNorm you will see that the weights are set to NULL when the call
>is made to maNormMain.
>
>If you want to use weights for normalization you need either to use the
>lower level function maNormMain (which appears to use weights) or use
>the normalization routines in the limma package instead.
>
>In limma you use read.maimages to read the data in, perhaps picking up
>the quality weights from genepix or quantarray in the process. If you
>have made your own weights, you can simply assign them to the weights
>component, e.g.,
>
>RG <- read.maimages(files, source="genepix")  # or your image analysis program
>RG$weights <- your.weights  # matrix: rows = spots, columns = arrays
>RG$printer <- list(ngrid.c=4, ngrid.r=4, nspot.r=20, nspot.c=20)  # array layout
>MA <- normalizeWithinArrays(RG)
>
>Gordon
>
>At 03:26 AM 5/08/2003, James MacDonald wrote:
> >From perusing the functions (particularly maNorm), it appears that
> >the weights are used by all normalization procedures except for
> >"median". By definition, a weight is in the range [0,1], so if you use
> >0 and 1, it will effectively be the same as saying "don't use this" or
> >"use this". You can also use some more moderate values rather than
> >completely eliminating the 'bad' spots (e.g., simply down-weight spots
> >that look sketchy).
> >
> >
> >I think you pass the weights using the additional argument w="maW" in
> >your call to maNorm.
> >
> >HTH,
> >
> >Jim
> >
> >James W. MacDonald
> >Affymetrix and cDNA Microarray Core
> >University of Michigan Cancer Center
> >1500 E. Medical Center Drive
> >7410 CCGC
> >Ann Arbor MI 48109
> >734-647-5623
> >
> > >>> "Josef Walker" <josef.walker at jenner.ac.uk> 08/04/03 12:31PM >>>
> >Hi all,
> >
> >
> >
> >My name is Joe Walker and I am a final year PhD student attempting to
> >use Bioconductor to analyse a large amount of cDNA microarray data
> >from my thesis experiments.
> >
> >
> >
> >For the normalisation stage, there is the option to use weights
> >previously assigned to the genes.
> >
> >I wish to normalise my genes based on a quality controlled subset that
> >changes for each hybridisation. I think one way to do this is to use
> >the weights option during normalisation.
> >
> >The "slot" for the weights (maW) is assigned/loaded during the
> >marrayInput stage using the read.marrayRaw command (along with name.Gf
> >etc).
> >
> >What I am unclear of is:
> >
> >1)       What form do these weights take, i.e. does 1 = use this gene
> >and 0 = do not use this gene, are they graded, or do they have to be
> >defined elsewhere?
> >
> >2)       Do you use these weights by simply using maW = TRUE during
> >the normalisation stage?
> >
> >Am I at least on the right track?
> >
> >If anyone has advice for me it would be great.
> >
> >Thanks in advance,
> >
> >Joe