[BioC] How to read a subset of the .CEL files

James W. MacDonald jmacdon at med.umich.edu
Sat Jun 24 22:37:14 CEST 2006


Hi Greg,

Alvord, Greg (DMS) [Contr] wrote:
> Dear List -
>
> I am new to BioConductor and R, working under Windows with a gig of RAM,
> version R-2.2.1 of R.  I have successfully read in six .CEL files and
> created the following AffyBatch object.
>
> > soy.ab
> AffyBatch object
> size of arrays=1164x1164 features (63516 kb)
> cdf=Soybean (61170 affyids)
> number of samples=6
> number of genes=61170
> annotation=soybean
>
> The investigator for whom I'm working is interested in an analysis of
> differential gene expression on a subset of affyids in this AffyBatch
> object, specifically in 37,744 of the 61,170 affyids (indicated above)
> that relate specifically to the soybean genome.  I have learned that the
> relevant species of interest is labeled 'Glycine max'.  I obtained this
> information from another source and have not (due to my ignorance) been
> able to identify any slot in soy.ab AffyBatch object that identifies
> this species.  Here is a table of the species on the soy.ab AffyBatch
> object (which I obtained from another source).     
>
> > cbind(table(Species))
>                                           [,1]
> Alfalfa mosaic virus                         3
> Bean pod mottle virus strain G-7             2
> Bean pod mottle virus strain K-Hancock1      1
> Clover yellow vein virus                     1
> Glycine max                              37744
> Heterodera glycines                       7539
> Phytophthora sojae                       15864
> S. saman                                     4
> Southern bean mosaic virus strain SBMV-S     1
> Soybean mosaic virus                         1
> Soybean mosaic virus strain G5               3
> Soybean mosaic virus strain G7               1
> Soybean mosaic virus strain N                1
> Tobacco ringspot virus                       2
> Tobacco streak virus                         3
>
> I want to select from the soy.ab AffyBatch object the relevant
> information for the species 'Glycine max' only.  I have created a data
> frame containing those Affy.ID's for species 'Glycine max', e.g.,
>
> > Glycine.max.Species.AffyID.df[c(1:3,37742:37744),]
>           Species                Affy.ID
> 8     Glycine max         AFFX-BioB-3_at
> 9     Glycine max         AFFX-BioB-5_at
> 10    Glycine max         AFFX-BioB-M_at
> 37749 Glycine max soybean_rRNA_838_RC_at
> 37750 Glycine max    soybean_rRNA_918_at
> 37751 Glycine max soybean_rRNA_918_RC_at
>
> > dim(Glycine.max.Species.AffyID.df)
> [1] 37744     2
>
> How do I extract/create an AffyBatch object containing only the
> appropriate Affy.ID's related to the 'Glycine max' species?  

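An AffyBatch object isn't well suited to this kind of subsetting: it 
stores probe-level intensities rather than one row per probeset, so 
there is no row to pick out by Affy.ID. It is better to compute 
expression values first, using rma() or your favorite method, and then 
subset the result by probeset ID. Converting the ID column with 
as.character() guards against it having been stored as a factor, since 
indexing by a factor uses its integer codes rather than its labels.

library(affy)
eset <- rma(soy.ab)
ids <- as.character(Glycine.max.Species.AffyID.df[,2])
subsetted.exprset <- eset[ids,]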
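
A note on subsetting by ID, in case it helps: if the Affy.ID column is a factor (data.frame() may create one), R indexes by the factor's integer codes, not by its labels, and can silently return the wrong rows. Below is a minimal sketch using a made-up toy matrix as a stand-in for the expression values. All IDs, sample names and numbers in it are hypothetical and only illustrate the indexing behaviour; it does not use the affy package.

```r
# Toy stand-in for a matrix of expression values:
# rows are probeset IDs, columns are samples (all values are made up).
expr <- matrix(1:8, nrow = 4,
               dimnames = list(c("id_d", "id_c", "id_b", "id_a"),
                               c("s1", "s2")))

# An ID column stored as a factor, as data.frame() may produce.
ids.factor <- factor(c("id_d", "id_b"))

# Convert to character and keep only IDs actually present in the matrix.
ids <- as.character(ids.factor)
ids <- ids[ids %in% rownames(expr)]

# Indexing by character matches row names.
by.name <- expr[ids, , drop = FALSE]

# Indexing by the factor uses its integer codes as row numbers instead.
by.code <- expr[ids.factor, , drop = FALSE]

rownames(by.name)  # "id_d" "id_b"
rownames(by.code)  # "id_c" "id_d"
```

The %in% check is optional, but it avoids a "subscript out of bounds" error if the ID list contains names that are not on the array.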

HTH,

Jim

-- 
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
1500 E Medical Center Drive
Ann Arbor MI 48109
734-647-5623


