[BioC] How to read a subset of the .CEL files
James W. MacDonald
jmacdon at med.umich.edu
Sat Jun 24 22:37:14 CEST 2006
Hi Greg,
Alvord, Greg (DMS) [Contr] wrote:
> Dear List -
>
> I am new to Bioconductor and R, working under Windows with a gig of RAM
> and R version 2.2.1. I have successfully read in six .CEL files and
> created the following AffyBatch object.
>
> > soy.ab
> AffyBatch object
> size of arrays=1164x1164 features (63516 kb)
> cdf=Soybean (61170 affyids)
> number of samples=6
> number of genes=61170
> annotation=soybean
>
> The investigator for whom I'm working is interested in an analysis of
> differential gene expression on a subset of the affyids in this AffyBatch
> object: the 37,744 of the 61,170 affyids (indicated above) that belong to
> the soybean genome itself. I have learned that the relevant species is
> labeled 'Glycine max'. I obtained this information from another source
> and have not (due to my ignorance) been able to identify any slot in the
> soy.ab AffyBatch object that identifies this species. Here is a table of
> the species represented in soy.ab (also obtained from another source).
>
> > cbind(table(Species))
>                                           [,1]
> Alfalfa mosaic virus                         3
> Bean pod mottle virus strain G-7             2
> Bean pod mottle virus strain K-Hancock1      1
> Clover yellow vein virus                     1
> Glycine max                              37744
> Heterodera glycines                       7539
> Phytophthora sojae                       15864
> S. saman                                     4
> Southern bean mosaic virus strain SBMV-S     1
> Soybean mosaic virus                         1
> Soybean mosaic virus strain G5               3
> Soybean mosaic virus strain G7               1
> Soybean mosaic virus strain N                1
> Tobacco ringspot virus                       2
> Tobacco streak virus                         3
>
> I want to select from the soy.ab AffyBatch object only the information
> for the species 'Glycine max'. I have created a data frame containing the
> Affy.IDs for 'Glycine max', e.g.,
>
> > Glycine.max.Species.AffyID.df[c(1:3,37742:37744),]
>           Species                Affy.ID
> 8     Glycine max         AFFX-BioB-3_at
> 9     Glycine max         AFFX-BioB-5_at
> 10    Glycine max         AFFX-BioB-M_at
> 37749 Glycine max soybean_rRNA_838_RC_at
> 37750 Glycine max    soybean_rRNA_918_at
> 37751 Glycine max soybean_rRNA_918_RC_at
>
> > dim(Glycine.max.Species.AffyID.df)
> [1] 37744     2
>
> How do I create an AffyBatch object containing only the Affy.IDs that
> belong to the 'Glycine max' species?
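An AffyBatch object isn't well suited to subsetting this way, because its
rows are individual probe cells rather than probe sets (the mapping from
probes to probe sets lives in the CDF environment). It is better to compute
expression values first, using rma() or your favorite method, and then
subset the resulting expression set by probe set ID.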
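library(affy)
eset <- rma(soy.ab)  # expression values, one row per probe set
ids <- as.character(Glycine.max.Species.AffyID.df[,2])  # factor would index by integer code
subsetted.exprset <- eset[ids,]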
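The as.character() step matters because data.frame() in R 2.2.1 turns strings into factors, and indexing with a factor uses its integer codes rather than its labels, so the wrong rows can be picked without any error. Below is a minimal base-R sketch of that pitfall and the fix, assuming current R. Everything in it is hypothetical toy data: 'expr' stands in for exprs(eset) (rows named by probe set ID), and 'annot' mimics the Species/Affy.ID data frame from the question. The same character-based row selection is what eset[ids,] relies on.

```r
# Hypothetical toy data: 'expr' stands in for exprs(eset) after rma()
# (rows = probe set IDs, columns = arrays); 'annot' mimics the
# Species/Affy.ID data frame from the question.
probe.ids <- c("virus_probe_at", "soybean_rRNA_918_at",
               "AFFX-BioB-3_at", "nematode_probe_at")
expr <- matrix(c(5.1, 7.3, 6.2, 8.0,
                 5.4, 7.1, 6.5, 8.2),
               nrow = 4,
               dimnames = list(probe.ids, c("chip1", "chip2")))
annot <- data.frame(Species = c("Tobacco streak virus", "Glycine max",
                                "Glycine max", "Heterodera glycines"),
                    Affy.ID = probe.ids,
                    stringsAsFactors = TRUE)  # old-R default: strings become factors

# Pitfall: a factor index is used through its integer codes, not its labels,
# so this silently picks the wrong rows.
ids.factor <- annot$Affy.ID[annot$Species == "Glycine max"]
wrong.expr <- expr[ids.factor, , drop = FALSE]

# Fix: convert to character so rows are matched by name, and keep only
# IDs actually present in the matrix (unknown names would be an error).
gm.ids <- as.character(ids.factor)
gm.ids <- gm.ids[gm.ids %in% rownames(expr)]
gm.expr <- expr[gm.ids, , drop = FALSE]

print(gm.expr)
```

Here 'wrong.expr' picks up "virus_probe_at" because the factor codes point at the wrong row positions, while 'gm.expr' holds exactly the two 'Glycine max' probe sets.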
HTH,
Jim
--
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
1500 E Medical Center Drive
Ann Arbor MI 48109
734-647-5623