[BioC] Exclude probes that show sd above 0.1 between replicate values

Thu Mar 1 15:19:32 CET 2007

Hi Jan,

I'm not sure I understand.
I am with you about the variability of replicate spots... as long as  
they can be measured reliably, as you say. The question I guess is  
where do you take these measurements: at the intensity or at the ratio  
level? If you're looking at the variability based on ratios (M  
values), replicate spots with no signal in one channel tend to have  
wildly varying M values (all quite high, in absolute value). Wouldn't
a filter based solely on variation at the M value level discard those
spots? For this kind of spot the M value is irrelevant (I mean, how
much is something divided by *almost* nothing?); we don't really have
a use for the actual number, beyond the fact that it should be
large.
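
To make the point concrete, here is a small sketch in plain Python (not limma). The intensities are invented for illustration, and I'm assuming they are background-corrected and strictly positive so the log is defined. Gene A has signal in both channels; gene B has strong red signal and almost nothing in green.

```python
import math
import statistics


def m_values(red, green):
    """Return M = log2(red / green) for each replicate spot."""
    return [math.log2(r / g) for r, g in zip(red, green)]


# Hypothetical intensities for four replicate spots of each gene.
# Gene A: clear signal in both channels, roughly 2-fold up in red.
gene_a_red = [2000.0, 2100.0, 1900.0, 2050.0]
gene_a_green = [1000.0, 1020.0, 980.0, 1010.0]

# Gene B: same red signal, green channel close to zero ("on/off" gene).
gene_b_red = [2000.0, 2100.0, 1900.0, 2050.0]
gene_b_green = [1.0, 8.0, 0.5, 3.0]

m_a = m_values(gene_a_red, gene_a_green)
m_b = m_values(gene_b_red, gene_b_green)

# Sample standard deviation of M across replicates.
sd_a = statistics.stdev(m_a)
sd_b = statistics.stdev(m_b)

print("gene A: M =", [round(m, 2) for m in m_a], "sd =", round(sd_a, 3))
print("gene B: M =", [round(m, 2) for m in m_b], "sd =", round(sd_b, 3))
```

With these made-up numbers gene A passes an sd cutoff of 0.1 on M, while gene B fails it by a wide margin even though every one of its replicates agrees that M is large and positive.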

As you say, I guess that any analysis depends on what you're after,  
but most "general" approaches I see mentioned don't seem to care about
this particular case where signal is missing in only one channel. In
fact, some people just remove any spot where the signal is not
detectable in both channels, which for my purposes would be a disaster
[1]. I have my own approach to deal with this, and I am reasonably
happy with it, but I am very curious to see how other people approach
this issue.
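
Just to illustrate the kind of thing I mean (this is a hypothetical sketch in plain Python, not my actual approach and not anything limma or Acuity does): apply the sd filter on M only when both channels are detectable, and keep spots that are consistently "on" in one channel and "off" in the other. The function name `classify_spot` and the thresholds `detect` and `max_sd` are made up for the example.

```python
import math
import statistics


def classify_spot(red, green, detect=50.0, max_sd=0.1):
    """Classify one gene from its replicate intensities (two or more).

    detect : intensity at or above which a channel counts as detectable
             (illustrative value, not a recommendation)
    max_sd : maximum allowed sd of M when both channels are detectable
    """
    red_on = [r >= detect for r in red]
    green_on = [g >= detect for g in green]

    if all(red_on) and all(green_on):
        # Both channels measurable: M is meaningful, so filter on its sd.
        m = [math.log2(r / g) for r, g in zip(red, green)]
        if statistics.stdev(m) <= max_sd:
            return "keep"
        return "discard: variable"
    if all(red_on) and not any(green_on):
        # Consistently on in red, off in green: keep regardless of sd(M).
        return "keep: red only"
    if all(green_on) and not any(red_on):
        return "keep: green only"
    if not any(red_on) and not any(green_on):
        return "discard: no signal"
    # Replicates disagree about whether a channel is detectable at all.
    return "discard: inconsistent"


print(classify_spot([2000, 2100, 1900, 2050], [1000, 1020, 980, 1010]))
print(classify_spot([2000, 2100, 1900, 2050], [1, 8, 0.5, 3]))
```

The one-channel spots are then carried forward as a separate class rather than ranked by their M value, since only the direction of the change is informative for them.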

[1] A while ago we had a demo of the software Acuity at our centre.
The guy contacted us beforehand asking if we had some real data we'd
like to use in the demo. He chose some of my data, which I thought was
great, as I had already analysed it using my usual tools. His demo
picked up genes I knew to be upregulated... but my "top genes" that
I've continued to use in my experiments were all missing, as they had
been left behind in one of the filtering steps: either the low
intensity filter (applied on *either* channel) or the standard
deviation filter on log2 ratios (not sure which one, probably
both)... it took me a while to convince him that I really, really
didn't want those spots removed, which surprised me. Are most people
really throwing away this kind of spot?

Jose

Quoting J.Oosting at lumc.nl:

> Hi Jose,
>
> IMHO you should use the variability of replicate spots whenever
> possible. Limma can handle this nicely and for the analysis of
> differential expression I always leave in the replicate spots, and I let
> limma handle them.
>
> For presentation purposes (i.e. heatmaps) it is usually handy to have
> averaged values per gene, and I think that removing genes that cannot be
> measured reliably is a way of improving the visualizations.
>
> Any data-manipulation is context dependent, and especially the effects
> of removing data points should be considered case by case. If you're
> interested in on/off phenomena you should not remove 'empty' spots.
>
> Regards,
>
> Jan
>
>>
>> If you look at the variation on M values alone (it's a
>> MAList), and throw away those with high variation... that
>> sounds like a reasonable thing to do, except that when you
>> have spots with no signal in only one of the channels, the
>> variation is probably quite high too, and you'd remove them.
>> However, they are probably quite an interesting class of
>> spots to keep (genes that become silenced, or activated,
>> after treatment, not merely down/upregulated).
>>
>> I'm mostly studying experiments when I am interested mostly
>> in these cases of activation/silencing, and not so much in
>> up/downregulation alone. I wonder how people account for
>> these situations...
>>
>> Jose
>>
>
>

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK