[BioC] nsFilter and GSEA

Fri Jan 11 17:08:18 CET 2008

Hi Paolo,

Have you tried plotting the IQR or the variance of the signal intensity
for all probesets to see what the distribution looks like? Normally you
should see two clear groups: one peak with small IQR values over a
narrow range, and another peak at greater IQR that is more spread
out. Do you really have most of your probesets not varying much across
samples?
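
A rough sketch of that check in base R follows. It assumes the expression
values are available as a probesets-by-samples matrix (for an ExpressionSet
that would be exprs(eset)). The matrix here is simulated purely for
illustration, and the names expr_mat and probe_iqr are made up for this
example:

```r
# Simulated log2 expression matrix (hypothetical data, for illustration only):
# 1000 probesets x 8 samples. With real data use: expr_mat <- exprs(eset)
set.seed(1)
n_probes  <- 1000
n_samples <- 8
probe_means <- runif(n_probes, min = 4, max = 12)
# Assumption for the simulation: 90% of probesets barely vary, 10% vary more
probe_sd <- c(rep(0.1, 900), rep(1, 100))
expr_mat <- matrix(rnorm(n_probes * n_samples, mean = probe_means, sd = probe_sd),
                   nrow = n_probes, ncol = n_samples)

# Per-probeset IQR across samples (one value per row)
probe_iqr <- apply(expr_mat, 1, IQR)

# Look at the distribution: histogram, density, and a numeric summary
hist(probe_iqr, breaks = 50,
     xlab = "IQR across samples", main = "Per-probeset IQR")
plot(density(probe_iqr), main = "Density of per-probeset IQR")
print(summary(probe_iqr))
```

If the real histogram shows nearly all probesets piled up well below the
filter cutoff, the filter is behaving as specified and the question becomes
whether that cutoff suits the data.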

It doesn't seem right that so many genes are removed. But I haven't used
the function nsFilter before. 

Alex
--------------------------------------------
Alex C. Lam
Roslin Institute (Edinburgh)
Midlothian
EH25 9PS
United Kingdom
Tel: +44 131 5274471

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Paolo
Innocenti
Sent: 11 January 2008 15:41
To: ML Bioconductor
Subject: Re: [BioC] nsFilter and GSEA

Hi,

thanks for the prompt reply. I am a bit confused now.

Sean Davis wrote:
> Sounds like you probably need to take a closer look at the data.  I 
> have not used the nsFilter function, but it looks like the default 
> variance function is IQR (interquartile range) and the default cutoff 
> for that function value is 0.5.  If nearly 99% of your probes 
> have an IQR<0.5, I would look at the data quality closely to see if 
> there are data quality issues or preprocessing steps that do not make 
> sense (can't tell what was done before RMA).

Boxplot, hist, RLE, NUSE, MAplot, RNAdeg: they all look fine (except for
one chip that is *a bit* strange, but that should just increase the
variance?). Can you suggest other tests (and how to interpret
them)?
And there is no preprocessing except for RMA (maybe this is the wrong
step?):

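library(Biobase)     # read.MIAME(), read.AnnotatedDataFrame()
library(affy)        # ReadAffy(), rma()
library(genefilter)  # nsFilter()

# Read the experiment description and the sample annotation
miame <- read.MIAME("miame")
phenodata <- read.AnnotatedDataFrame("phenodata", sep=" ")
# Read the CEL files for the samples listed in phenodata
mydata <- ReadAffy(sampleNames=sampleNames(phenodata),
	phenoData=phenodata,
	description=miame)
eset <- rma(mydata)            # RMA preprocessing
eset.f <- nsFilter(eset)$eset  # non-specific filtering, default settings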
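
To see how strongly the result depends on the cutoff, the variance step can
be imitated by hand. The sketch below assumes the filter keeps probesets
whose IQR exceeds an absolute cutoff, as described in the quoted reply; how
var.cutoff is actually interpreted should be checked in ?nsFilter for the
installed genefilter version. The IQR values are simulated and kept_at is a
hypothetical helper, not part of genefilter:

```r
# Hypothetical per-probeset IQR values, simulated for illustration only.
# With real data: probe_iqr <- apply(exprs(eset), 1, IQR)
set.seed(2)
probe_iqr <- c(abs(rnorm(900, mean = 0.15, sd = 0.05)),
               abs(rnorm(100, mean = 1.2,  sd = 0.3)))

# Hypothetical helper: number of probesets kept for an absolute IQR cutoff
kept_at <- function(iqr_values, cutoff) sum(iqr_values > cutoff)

# Compare several cutoffs, including the 0.5 mentioned above
cutoffs <- c(0.1, 0.25, 0.5, 1)
kept <- sapply(cutoffs, function(co) kept_at(probe_iqr, co))
print(data.frame(cutoff = cutoffs,
                 kept = kept,
                 fraction = kept / length(probe_iqr)))

# With real data, nsFilter() also returns a per-step report of removals:
#   nsFilter(eset)$filter.log
```

nsFilter applies annotation-based steps as well as the variance step, so the
filter.log element is worth inspecting before concluding that the IQR cutoff
alone removed most of the probesets.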

What if the problem is that the data are TOO good? Does it make sense to guess
that, if the data mirror the biology of the sample exactly, I should expect
heaps of genes with exactly the same expression level, and "a few"
genes with differential expression? (The experimental design was
virgin vs. mated female flies: mating is expected to promote some change in
female physiology, probably affecting more than 60 genes, though.)

  Cheers,
Paolo

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor