[BioC] nsFilter and GSEA
Paolo Innocenti
paolo.innocenti at ebc.uu.se
Fri Jan 11 19:44:09 CET 2008
Dear Robert and BioC Mailing list,
The chips are Affymetrix Drosophila genome 1.0 (annotation drosgenome1).
I am even more confused: to make sure it was not my fault, I copied
the .CEL files into a new directory, started a fresh R session from there
and ran *just* the following code. Same results:
> library(affy)
Loading required package: Biobase
Loading required package: tools
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'openVignette()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation(pkgname)'.
Loading required package: affyio
Loading required package: preprocessCore
> mydata <- ReadAffy()
> eset.rma <- rma(mydata)
Background correcting
Normalizing
Calculating Expression
> eset.mas <- mas5(mydata)
background correction: mas
PM/MM correction : mas
expression values: mas
background correcting...done.
14010 ids to be processed
| |
|####################|
> library(genefilter)
Loading required package: survival
Loading required package: splines
> eset.rma.f <- nsFilter(eset.rma)
> eset.mas.f <- nsFilter(eset.mas)
> eset.rma.f
$eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 171 features, 15 samples
element names: exprs
phenoData
sampleNames: dta_2a.CEL, dta_2b.CEL, ..., virgin_4b.CEL (15 total)
varLabels and varMetadata description:
sample: arbitrary numbering
featureData
featureNames: 147260_at, 142359_at, ..., 145988_at (171 total)
fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: drosgenome1
$filter.log
$filter.log$numDupsRemoved
[1] 3
$filter.log$numLowVar
[1] 13047
$filter.log$feature.exclude
[1] 3
$filter.log$numRemoved.ENTREZID
[1] 786
> eset.mas.f
$eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12122 features, 15 samples
element names: exprs, se.exprs
phenoData
sampleNames: dta_2a.CEL, dta_2b.CEL, ..., virgin_4b.CEL (15 total)
varLabels and varMetadata description:
sample: arbitrary numbering
featureData
featureNames: 153135_at, 154994_at, ..., 152360_at (12122 total)
fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: drosgenome1
$filter.log
$filter.log$numDupsRemoved
[1] 1098
$filter.log$numLowVar
[1] 1
$filter.log$feature.exclude
[1] 3
$filter.log$numRemoved.ENTREZID
[1] 786
> sessionInfo()
R version 2.6.1 (2007-11-26)
i486-pc-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] splines tools stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] drosgenome1_2.0.1 genefilter_1.16.0 survival_2.34
[4] drosgenome1cdf_2.0.0 affy_1.16.0 preprocessCore_1.0.0
[7] affyio_1.6.1 Biobase_1.16.1
loaded via a namespace (and not attached):
[1] annotate_1.16.1 AnnotationDbi_1.0.6 DBI_0.2-4
[4] rcompgen_0.1-17 RSQLite_0.6-4
>
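As a sanity check on the two filter logs above, the counts can be added up to see whether they account for every probe set. This is a minimal sketch in base R, assuming that the 14010 "ids to be processed" printed by mas5() is also the number of features in both unfiltered ExpressionSets; the variable names are made up for illustration.

```r
# Counts copied from the two filter logs printed in this message.
# Assumption: 14010 (printed by mas5) is the starting number of features.
n.total <- 14010

rma.log <- c(numDupsRemoved = 3, numLowVar = 13047,
             feature.exclude = 3, numRemoved.ENTREZID = 786)
mas.log <- c(numDupsRemoved = 1098, numLowVar = 1,
             feature.exclude = 3, numRemoved.ENTREZID = 786)

# Features left once every logged removal is subtracted from the total.
rma.left <- n.total - sum(rma.log)
mas.left <- n.total - sum(mas.log)

print(c(rma = rma.left, mas5 = mas.left))
```

The two results match the 171 and 12122 features reported for eset.rma.f and eset.mas.f, so each log is internally consistent. That only shows the bookkeeping adds up; it does not say why the low-variance step removes so many more probe sets after rma.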
Could it be that the CEL files are damaged?
Thanks,
best wishes,
Paolo
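One thing that may be worth ruling out, offered as an assumption and not as a diagnosis: rma() returns expression values on the log2 scale, while mas5() returns them on the natural (unlogged) scale. If the variance step compared each probe set's IQR against the number 0.5 itself, and not against the 0.5 quantile of all IQRs, the two scales would behave very differently. Nothing in this thread confirms that genefilter 1.16.0 does this. The sketch below uses simulated data only (all names and parameters are hypothetical) to show the two interpretations side by side.

```r
set.seed(1)

# Simulated data only: 1000 probe sets x 15 samples on the log2 scale.
n.probes  <- 1000
n.samples <- 15
probe.mean <- runif(n.probes, min = 4, max = 12)
# The mean vector is recycled down each column, so row i has mean probe.mean[i].
log2.expr <- matrix(rnorm(n.probes * n.samples, mean = probe.mean, sd = 0.15),
                    nrow = n.probes)
raw.expr <- 2^log2.expr  # same data on the unlogged scale

# Per-probe-set interquartile range on each scale.
iqr.log2 <- apply(log2.expr, 1, IQR)
iqr.raw  <- apply(raw.expr, 1, IQR)

cutoff <- 0.5

# Interpretation 1 (assumption): cutoff is an absolute IQR threshold.
kept.abs <- c(log2 = sum(iqr.log2 > cutoff),
              raw  = sum(iqr.raw  > cutoff))

# Interpretation 2: cutoff is a quantile of the IQR distribution.
kept.quant <- c(log2 = sum(iqr.log2 > quantile(iqr.log2, cutoff)),
                raw  = sum(iqr.raw  > quantile(iqr.raw,  cutoff)))

print(rbind(absolute = kept.abs, quantile = kept.quant))
```

With the quantile interpretation about half of the probe sets are kept on either scale. With the absolute interpretation almost none survive on the log2 scale and almost all survive on the raw scale. On the real objects, summary(apply(exprs(eset.rma), 1, IQR)) and the same call on eset.mas would show where 0.5 falls in each IQR distribution.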
Robert Gentleman wrote:
> Hi,
> It looks like something fairly odd is going on, and that we are not
> seeing all of the code that is being run.
>
> What chip are you using? What is very odd is that in your first
> example 1098 "duplicate" probes are found, but in the second run only 3.
> Basically this cannot happen (since the probes are the same) and
> suggests that some piece of code has manipulated the names, and at that
> point I think fairly bad things are going to happen. So this would be
> one place to try and fix things.
>
> Second, nsFilter filters by default at the median, so you should retain
> about 0.5 of your probe sets. But since you lose so many (you didn't
> tell us the chip, so I can't be sure), it looks like all of the values
> are corrupt for that example as well.
>
> So, I think that you are looking in the wrong place. Your problem is
> probably earlier on.
>
> best wishes
> Robert
>
>
> Paolo Innocenti wrote:
>> Hi again,
>>
>> I tried with a different normalisation method, and I was pretty
>> surprised by the results:
>>
>> > eset.mas <- mas5(mydata)
>> background correction: mas
>> PM/MM correction : mas
>> expression values: mas
>> background correcting...done.
>> 14010 ids to be processed
>> | |
>> |####################|
>> > eset.mas.f <- nsFilter(eset.mas)
>> > eset.mas.f$filter.log
>> $numDupsRemoved
>> [1] 1098
>>
>> $numLowVar
>> [1] 1
>>
>> $feature.exclude
>> [1] 3
>>
>> $numRemoved.ENTREZID
>> [1] 786
>>
>> > eset.rma <- rma(mydata)
>> Background correcting
>> Normalizing
>> Calculating Expression
>> > eset.rma.f <- nsFilter(eset.rma)
>> > eset.rma.f$filter.log
>> $numDupsRemoved
>> [1] 3
>>
>> $numLowVar
>> [1] 13047
>>
>> $feature.exclude
>> [1] 3
>>
>> $numRemoved.ENTREZID
>> [1] 786
>>
>> > dim(eset.rma.f$eset)
>> Features Samples
>> 171 15
>> > dim(eset.mas.f$eset)
>> Features Samples
>> 12122 15
>>
>> I don't understand how this is possible. Any suggestion about what to
>> do? Should I lower the cutoff for the rma, or does that processing
>> method not work for my dataset?
>>
>> Paolo
>> PS: I also tried a really low cutoff, but the situation doesn't
>> change, unless I choose a cutoff=0.1:
>>
>> > eset.filter <- nsFilter(eset,var.cutoff=0.2)
>> > eset.filter$filter.log
>> $numDupsRemoved
>> [1] 69
>>
>> $numLowVar
>> [1] 10560
>>
>> $feature.exclude
>> [1] 3
>>
>> $numRemoved.ENTREZID
>> [1] 786
>>