[BioC] How to read a subset of the .CEL files

Roayaei, Jean (DMS) [Contr] roayaeij at css.ncifcrf.gov
Tue Jun 27 18:30:43 CEST 2006


Dear all,

Henrik's explanation is correct. Similar queries made against NCBI
soybean data sets yield the same number of genes.

Jean Roayaei
DMS, NCI-Frederick

-----Original Message-----
From: henrik.bengtsson at gmail.com [mailto:henrik.bengtsson at gmail.com] On
Behalf Of Henrik Bengtsson
Sent: Monday, June 26, 2006 6:31 AM
To: James W. MacDonald
Cc: Alvord, Greg (DMS) [Contr]; Roayaei, Jean (DMS) [Contr];
bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] How to read a subset of the .CEL files

See the affxparser package, e.g. readCelUnits(filenames,
units=c(1,600:612,45)).  At the moment, you have to take it from there
yourself.
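
One way to "take it from there" is to translate probe-set IDs into the unit indices that readCelUnits() expects. What follows is only a minimal sketch, not code from this thread: the helper names ids_to_units() and read_cel_subset() are hypothetical, and it assumes that affxparser's readCdfUnitNames() and readCelUnits() behave as documented, that cdf_file is the path to the chip's CDF file, and that the CDF unit names are the same probe-set IDs you already hold.

```r
# Map probe-set IDs to CDF unit indices (pure base R).
# 'unit_names' is the vector of unit names in CDF order.
ids_to_units <- function(ids, unit_names) {
  units <- match(as.character(ids), unit_names)
  if (anyNA(units)) {
    stop("IDs not found in CDF: ",
         paste(ids[is.na(units)], collapse = ", "))
  }
  units
}

# Hypothetical wrapper: read only the requested units from each CEL file.
# Assumes the affxparser package is installed; nothing is read until
# this function is actually called.
read_cel_subset <- function(cel_files, cdf_file, ids) {
  unit_names <- affxparser::readCdfUnitNames(cdf_file)
  units <- ids_to_units(ids, unit_names)
  affxparser::readCelUnits(cel_files, units = units, cdf = cdf_file)
}
```

The value returned by readCelUnits() holds raw probe-level data per unit, so summarizing it into expression values is still left to the caller.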

Henrik Bengtsson

On 6/24/06, James W. MacDonald <jmacdon at med.umich.edu> wrote:
> Hi Greg,
>
> Alvord, Greg (DMS) [Contr] wrote:
> >
> >
> > Dear List -
> >
> >
> >
> > I am new to Bioconductor and R, working under Windows with 1 GB of RAM
> > and R version 2.2.1.  I have successfully read in six .CEL files and
> > created the following AffyBatch object.
> >
> > > soy.ab
> > AffyBatch object
> > size of arrays=1164x1164 features (63516 kb)
> > cdf=Soybean (61170 affyids)
> > number of samples=6
> > number of genes=61170
> > annotation=soybean
> >
> > The investigator for whom I'm working is interested in an analysis of
> > differential gene expression on a subset of affyids in this AffyBatch
> > object, specifically the 37,744 of the 61,170 affyids (indicated above)
> > that relate to the soybean genome.  I have learned that the relevant
> > species is labeled 'Glycine max'.  I obtained this information from
> > another source and have not (due to my ignorance) been able to identify
> > any slot in the soy.ab AffyBatch object that identifies this species.
> > Here is a table of the species represented on the soy.ab AffyBatch
> > object (which I obtained from another source).
> >
> > > cbind(table(Species))
> >                                           [,1]
> > Alfalfa mosaic virus                         3
> > Bean pod mottle virus strain G-7             2
> > Bean pod mottle virus strain K-Hancock1      1
> > Clover yellow vein virus                     1
> > Glycine max                              37744
> > Heterodera glycines                       7539
> > Phytophthora sojae                       15864
> > S. saman                                     4
> > Southern bean mosaic virus strain SBMV-S     1
> > Soybean mosaic virus                         1
> > Soybean mosaic virus strain G5               3
> > Soybean mosaic virus strain G7               1
> > Soybean mosaic virus strain N                1
> > Tobacco ringspot virus                       2
> > Tobacco streak virus                         3
> >
> > I want to select from the soy.ab AffyBatch object the relevant
> > information for the species 'Glycine max' only.  I have created a data
> > frame containing those Affy.ID's for species 'Glycine max', e.g.,
> >
> > > Glycine.max.Species.AffyID.df[c(1:3,37742:37744),]
> >           Species                Affy.ID
> > 8     Glycine max         AFFX-BioB-3_at
> > 9     Glycine max         AFFX-BioB-5_at
> > 10    Glycine max         AFFX-BioB-M_at
> > 37749 Glycine max soybean_rRNA_838_RC_at
> > 37750 Glycine max    soybean_rRNA_918_at
> > 37751 Glycine max soybean_rRNA_918_RC_at
> >
> > > dim(Glycine.max.Species.AffyID.df)
> > [1] 37744     2
> >
> > How do I extract/create an AffyBatch object containing only the
> > appropriate Affy.ID's related to the 'Glycine max' species?
>
> An AffyBatch object isn't the best for subsetting this way. Better would
> be to compute expression values using rma() or your favorite method, and
> then subset.
>
> eset <- rma(soy.ab)
> subsetted.exprset <- eset[Glycine.max.Species.AffyID.df[,2],]
>
> HTH,
>
> Jim
>
> --
> James W. MacDonald
> University of Michigan
> Affymetrix and cDNA Microarray Core
> 1500 E Medical Center Drive
> Ann Arbor MI 48109
> 734-647-5623
>
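
Jim's suggestion (compute expression values with rma(), then subset by probe-set ID) relies on row-name indexing. The sketch below illustrates that step on a toy matrix standing in for the expression values; the object names expr, id_df and subset_expr and the probe-set names are hypothetical. It also guards against one assumption worth checking in real data: if the Affy.ID column is a factor, R indexes by the factor's integer codes rather than by name, so converting to character first is the safer choice.

```r
# Toy stand-in for the expression values: rows are probe sets,
# columns are samples.
expr <- matrix(1:12, nrow = 4,
               dimnames = list(c("ps_a", "ps_b", "ps_c", "ps_d"),
                               paste0("sample", 1:3)))

# Hypothetical ID table; Affy.ID is deliberately a factor, as data frame
# columns often are.
id_df <- data.frame(Species = "Glycine max",
                    Affy.ID = factor(c("ps_d", "ps_b")))

# Subset by name: convert to character first, because a factor used as an
# index is treated as its integer codes.
keep <- as.character(id_df$Affy.ID)
stopifnot(all(keep %in% rownames(expr)))
subset_expr <- expr[keep, , drop = FALSE]
rownames(subset_expr)
```

With a real expression object the same character vector would take the place of Glycine.max.Species.AffyID.df[,2] in the row index.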


