[BioC] ExpressionSet subsetting problem
Martin Morgan
mtmorgan at fhcrc.org
Fri Apr 11 18:18:15 CEST 2008
Hi IAIN --
IAIN GALLAGHER <iaingallagher at btopenworld.com> writes:
> Hi Everyone.
>
> I'm having a problem subsetting an ExpressionSet. After reading my
> cel files in and summarizing with MAS5 I assign a new
> AnnotatedDataFrame to describe the data. This is a tab delimited
> text file in the following format:
[snip]
> pheno <- read.AnnotatedDataFrame('covdesc.txt', sep='\t')
> phenoData(mas_data) <- pheno
The problem is probably here: your new AnnotatedDataFrame has its
samples in a different order from mas_data. Try
validObject(mas_data). Here's a reproducible example:
> data(sample.ExpressionSet)
> obj <- sample.ExpressionSet
> pd <- phenoData(obj)
> newPd <- pd[sample(sampleNames(pd)),]
> phenoData(obj) <- newPd
> validObject(obj)
Error in validObject(obj) :
invalid class "ExpressionSet" object: sampleNames differ between assayData and phenoData
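To see which samples are out of place before assigning anything, the sample names of the two pieces can be compared side by side. This is only a sketch on the sample data used above; the data frame name cmp and the seed are my own choices for illustration, and for your data you would compare sampleNames(mas_data) with sampleNames(pheno) in the same way.

```r
library(Biobase)

data(sample.ExpressionSet)
obj <- sample.ExpressionSet

# Shuffle the phenoData rows, as in the example above (seed chosen arbitrarily).
set.seed(42)
newPd <- phenoData(obj)[sample(sampleNames(obj)), ]

# Side-by-side view: column i of exprs(obj) versus row i of newPd.
cmp <- data.frame(assay = sampleNames(obj),
                  pheno = sampleNames(newPd),
                  stringsAsFactors = FALSE)
cmp$same <- cmp$assay == cmp$pheno

# Do both describe the same set of samples, ignoring order?
setequal(cmp$assay, cmp$pheno)

# Rows where the positions disagree; these are the samples that would be
# mislabelled after a positional subset such as obj[, index].
cmp[!cmp$same, ]
```

If setequal() is TRUE but some rows disagree, the two objects hold the same samples in a different order, which is the situation described above.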
If I had newPd and wanted to make sure the assignment was correct, I
might reorder its rows by the sample names of obj before assigning:
> data(sample.ExpressionSet)
> obj <- sample.ExpressionSet
> phenoData(obj) <- newPd[sampleNames(obj),]
> validObject(obj)
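The same idea can be wrapped in a small function that refuses to proceed when the sample names cannot be matched. This is a sketch, not part of Biobase: assign_pheno is a hypothetical helper name, and it assumes the new AnnotatedDataFrame uses the same sample names as the ExpressionSet (possibly in a different order).

```r
library(Biobase)

# Hypothetical helper (not a Biobase function): reorder `pd` to match the
# samples of `eset`, failing loudly if any sample has no phenoData row.
assign_pheno <- function(eset, pd) {
    idx <- match(sampleNames(eset), sampleNames(pd))
    if (anyNA(idx))
        stop("no phenoData for samples: ",
             paste(sampleNames(eset)[is.na(idx)], collapse = ", "))
    phenoData(eset) <- pd[idx, ]
    validObject(eset)   # signals an error if the object is still inconsistent
    eset
}

data(sample.ExpressionSet)
obj <- sample.ExpressionSet

# Shuffled phenoData, as in the example above (seed chosen arbitrarily).
set.seed(1)
newPd <- phenoData(obj)[sample(sampleNames(obj)), ]

obj2 <- assign_pheno(obj, newPd)

# The phenoData rows now line up with the expression columns again.
identical(sampleNames(phenoData(obj2)), sampleNames(obj2))
```

Matching by name rather than by position means a positional index such as which(phenoData(x)$Site == 'Pancreas') then selects the expression columns that actually belong to those rows.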
This behavior is dangerous, because the assignment succeeds without
checking that the sample names agree. The reason traces back to the
need to sometimes create transiently invalid objects in the process of
transforming from one ExpressionSet to another.
Martin
> This seems to go well.
>
> I now create an index to pull out only those subjects with 'Pancreas' under 'Site'.
>
> panc_index <- which(phenoData(mas_data)$Site == 'Pancreas')
>
> This returns a vector of numbers
>
> 1 3 4 15 23 28 29
>
> Now I subset my data with this
>
> kept_data <- mas_data[,panc_index]
>
> This is where I'm running into problems
>
>> head(exprs(panc_pts))
> F100.CEL F105.CEL F106.CEL F45.CEL F57.CEL F97.CEL
> 1007_s_at 1853.75910 2834.19034 1865.65600 869.44930 1307.60507 2006.37103
> 1053_at 811.05343 517.32617 519.08446 490.94832 582.09189 544.34508
> 117_at 78.34070 26.91147 93.21263 129.14469 241.32762 31.05214
> 121_at 419.79056 494.92934 685.06496 478.36533 661.30741 591.22300
> 1255_g_at 84.53744 18.25635 76.71271 44.79287 69.42122 99.33932
> 1294_at 329.38568 447.23030 529.64516 369.30509 487.00975 339.38840
> F99.CEL
> 1007_s_at 1168.56112
> 1053_at 425.16363
> 117_at 18.87988
> 121_at 511.47964
> 1255_g_at 54.36606
> 1294_at 372.36992
>
> looks ok but whilst subjects 1,3 & 4 are pulled out appropriately (F100, F105 and F106 respectively) the next two subjects are not. F45 is sample number 14 not 15 and F57 is sample number 22 not 23. The last two samples (F97 and F99) are pulled out properly.
>
> Could anyone explain why this is? I'd be most grateful.
>
> Thanks
>
> iain
>
>> sessionInfo()
> R version 2.6.2 (2008-02-08)
> i486-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines tools stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] simpleaffy_2.14.05 gcrma_2.10.0 matchprobes_1.10.0
> [4] genefilter_1.16.0 survival_2.34 hgu133plus2cdf_2.0.0
> [7] affy_1.16.0 preprocessCore_1.0.0 affyio_1.6.1
> [10] Biobase_1.16.2
>
> loaded via a namespace (and not attached):
> [1] annotate_1.16.1 AnnotationDbi_1.0.6 DBI_0.2-4
> [4] rcompgen_0.1-17 RSQLite_0.6-7
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793