[BioC] filtering

J.delasHeras at ed.ac.uk J.delasHeras at ed.ac.uk
Sat Jul 14 15:04:09 CEST 2007


Hi Lev,

thanks for clarifying. It turns out that you're doing what I thought  
you were doing :-) But I still don't understand why you would want to  
remove that probe that only has a strong signal in one treatment. I  
mean, that very probe will probably be one of the hits you're looking  
for in the first place.
You seem concerned that having no signal in the other treatments is  
somehow worsening your stats for comparisons between the other 3  
treatments, but I wouldn't be concerned about that. By removing probes  
that are only present in one treatment you're removing one class of  
potentially very interesting probes... I wouldn't do that.
You'd be removing a probe that has strong signal in treatment 4 and  
negligible signal in treatment 1, yet you'd keep one that has the same  
signal in 4 and is just above background in 1... but both probes belong  
to the same class, and you'd have removed the one showing the more  
striking difference of the two.
If the biology of your treatments is such that you expect treatment 4  
to be quite different to the others, such that a very large proportion  
of probes will be expressed in 4 but absent in 1&2&3, then the removal  
of such probes may be useful in some way... I don't know, because in  
that case I'd be more concerned about how I normalise all 4 treatments  
together, considering that one of them should have a completely  
different distribution.
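
To make the pooled-fit concern concrete, here is a minimal sketch in base R for a single simulated probe that sits near background in treatments 1 to 3 and is strong in treatment 4. The simulated values, the seed, and the use of `lm.fit` as a stand-in for the per-probe least-squares fit are assumptions for illustration only; it does not reproduce the `eBayes` moderation, and it assumes roughly equal variance in all four treatments.

```r
# Simulated log2 intensities for ONE hypothetical probe:
# near background in treatments 1-3, strong in treatment 4 (3 replicates each).
set.seed(1)
treatment <- factor(rep(1:4, each = 3), labels = c("T1", "T2", "T3", "T4"))
y <- c(rnorm(9, mean = 6, sd = 0.3), rnorm(3, mean = 12, sd = 0.3))

# Same layout as the quoted script: one coefficient per treatment, no intercept.
design <- model.matrix(~ 0 + treatment)
colnames(design) <- levels(treatment)

# Ordinary least squares for this single probe (assumed stand-in for the per-probe fit).
fit <- lm.fit(design, y)
coefs <- fit$coefficients

# Contrasts against treatment 1: each is a difference of two group means.
contrast_est <- c(
  "T2-T1" = coefs[["T2"]] - coefs[["T1"]],
  "T3-T1" = coefs[["T3"]] - coefs[["T1"]],
  "T4-T1" = coefs[["T4"]] - coefs[["T1"]]
)

# Residual variance pooled over all 12 arrays (12 - 4 = 8 degrees of freedom).
s2 <- sum(fit$residuals^2) / fit$df.residual

# Standard error of a difference between two group means with 3 replicates each.
se <- sqrt(s2 * (1 / 3 + 1 / 3))
t_stats <- contrast_est / se

print(round(contrast_est, 2))
print(round(t_stats, 2))
```

In this sketch the T4-T1 estimate depends only on the means of treatments 1 and 4; treatments 2 and 3 enter only through the pooled variance, where they contribute extra residual degrees of freedom.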

Please, don't take my word for it, I don't know everything and I am  
still learning, every day, as I go along. I find that it's hard to make  
categorical statements about what is best in microarray analysis,  
because what is best depends on the actual experiment, and you need to  
understand the biology behind the data: what the experiments test for,  
what questions you're aiming to answer, and what the actual biological  
system is.
In principle I feel that if I am analysing X number of experiments  
together, making contrasts etc, it is because they're somehow related  
and can be compared. If I do that, then as a principle, I don't want  
to remove a probe that only shows expression in one of the treatments  
because it seems to me that it is one class of probes that I really  
want to know about.
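
The effect of such a filter on the two probes compared earlier can be sketched as follows. This is only an illustration: the probe names, the signal values, the `cutoff` threshold and the "expressed in at least two treatments" rule are all hypothetical, chosen to mimic a presence-based filter rather than taken from any real data set.

```r
# Hypothetical mean log2 signals per treatment for two probes.
means <- rbind(
  probeA = c(T1 = 5.9, T2 = 6.0, T3 = 6.1, T4 = 12.0),  # negligible in 1-3, strong in 4
  probeB = c(T1 = 6.6, T2 = 6.7, T3 = 6.5, T4 = 12.0)   # just above background in 1-3
)

# Hypothetical threshold above which a probe counts as "expressed".
cutoff <- 6.4

# Number of treatments in which each probe is called expressed.
n_expressed <- rowSums(means > cutoff)

# Hypothetical filter: drop probes that are expressed in only one treatment.
keep <- n_expressed >= 2

# Difference between treatment 4 and treatment 1 for each probe.
diff_T4_T1 <- means[, "T4"] - means[, "T1"]

print(data.frame(n_expressed, keep, diff_T4_T1))
```

Under these assumed numbers the filter discards probeA and keeps probeB, even though probeA shows the larger difference between treatments 4 and 1.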

I think I can't say much more on this matter, Lev. You have to make  
your own decisions based on what it is that you are after.

best,

Jose




Quoting Lev Soinov <lev_embl1 at yahoo.co.uk>:

> Hi Jose,
>
>   Let's say we have 4 treatments, 3 replicates each. I am interested  
>  in comparing 1vs2, 1vs3 and 1vs4. I assume that instead of making   
> pairwise comparisons it would be better to use something like the   
> following script.
>
>   temp<-normalizeBetweenArrays(log2(signals), method='quantile')
> design <- model.matrix(~0 +factor(c(1,1,1,2,2,2,3,3,3,4,4,4)))
> colnames(design) <- c("T1","T2","T3","T4")
> contrast.matrix <-
>  makeContrasts(T2-T1, T3-T1, T4-T1, levels=design)
> fit <- lmFit(temp, design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
>
>   So, all treatments are included in lmFit to get more power. Let's   
> suppose a probe has "negligible" signal in 1, 2 and 3, but very   
> strong signal in 4. It would mean that for this probe T2-T1 and   
> T3-T1 could produce erroneous results and also would negatively   
> influence T4-T1 as lmFit would estimate parameters gathering   
> information across all 4 treatments.
>
>   Is it reasonable?
>   Thank you,
>   Lev.
>
>
>
>
>   J.delasHeras at ed.ac.uk wrote:
>   Quoting Lev Soinov :
>
>> Hi Jose,
>>
>> Yes, I totally understand the point about losing some probes that
>> are "not expressed" in one treatment but "expressed" in some other
>> treatments. However, let's say we have 4 treatments, comparing 1vs2,
>> 1vs3 and 1vs4 in LIMMA. If I am not mistaken, it is recommended to
>> process all treatments together to get more power. Suppose a probe
>> has "negligible" signal in 1, 2 and 3, but very strong signal in 4.
>> You would obviously keep it, according to your procedure, and you
>> would be absolutely right if you were interested in 1vs4 only.
>> However, in this particular situation 1vs2 and 1vs3 doesn't make
>> much sense and could produce false positive results. Also, if 1, 2
>> and 3 do not contain "true" signals but only some near-background
>> noise, how would it help to estimate the variance for this probe? I
>> may be wrong here, but it seems to me that information from 1, 2 and
>> 3 would just add more error in lmFit, thus obscuring inferences for
>> 1vs4 as well.
>>
>> Thank you,
>> Lev.
>
> Hi Lev,
>
> I must admit I don't quite follow the way you pick comparisons and probes.
> If I want to analyse comparisons between all 4 treatments, and a probe
> is only present in one of them, say number 4, I'd probably keep it. If
> you want to remove it because it's only expressed in one of them,
> that's your call, but I'd keep it: I am looking for differences
> between treatments, and right there there's a clear one between 4 and
> all the others... why lose it? Just to improve the FDR a little? I
> don't understand that rationale.
> It's not how I would go about things, but it probably depends on the
> actual experiment and what your goal is. If I don't care about things
> that are different between treatment 4 and the other three... then I'd
> just leave treatment 4 out altogether.
>
> I'm sorry if I am not fully understanding you.
>
> Jose
>
> --
> Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
> The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
> Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
> Swann Building, Mayfield Road
> University of Edinburgh
> Edinburgh EH9 3JR
> UK
>
>
>
>



-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK


