[BioC] Subsetting Affybatch objects by gene lists.

Mon Mar 15 18:54:00 MET 2004

Hi Stuart
ReadAffy() will automatically read all CEL files in your working directory,
or if you wish to select specific ones, ReadAffy(widget=TRUE) is useful.

I don't think this is the shortest way, but this should filter your genes:

data(affybatch.example)
PACalls <- mas5calls(affybatch.example)
c<-exprs(PACalls)

countA <- function(x,A = 2){
        # Count the no of A and check if equal to A
        length(which(x=="A")) == A
 }

countA will return TRUE or FALSE for each vector of calls it is given.
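
For instance, on a single made-up row of calls (the vector `calls1` is
purely illustrative, not real mas5calls output), the function behaves
roughly like this:

```r
# Same definition as above, repeated so the snippet runs on its own
countA <- function(x, A = 2) {
  length(which(x == "A")) == A
}

# Hypothetical calls for one probe set across three arrays
calls1 <- c("A", "P", "A")

countA(calls1)          # TRUE:  exactly two A's (the default A = 2)
countA(calls1, A = 0)   # FALSE: the row does contain A's
countA(calls1, A = 3)   # FALSE: only two of the three calls are A
```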

To find the rows for which it is TRUE, use apply(c, 1, countA, A=0), and
then to subselect these use:

c[apply(c, 1, countA, A=0),]
c[apply(c, 1, countA, A=1),]
c[apply(c, 1, countA, A=2),]

This will give you lists of genes for which there are no A's, 1 A, 2 A's
etc.  Equally, which(x=="P") could be used to count P calls.  Then filter
your data using

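A<-ncol(exprs(data))
# fil is TRUE for genes called A on every array
fil<-apply(c, 1, countA, A=A)
# drop those genes; assumes the rows of exprs(data) are probe sets
# in the same order as the rows of c
exprs(data)[!fil,]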
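
A possibly shorter route is rowSums() on a logical matrix, and indexing by
probe set name avoids relying on the two matrices sharing a row order. The
sketch below uses made-up data: `calls` and `expr` are hypothetical
stand-ins for exprs(PACalls) and a probe-set-level expression matrix, so
treat it as an illustration rather than a tested recipe for real chips.

```r
# Toy P/M/A call matrix: 3 probe sets x 3 arrays (made-up values)
calls <- matrix(c("P", "A", "A",
                  "A", "A", "A",
                  "M", "P", "P"),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("g1", "g2", "g3"),
                                c("s1", "s2", "s3")))

# Toy expression matrix with the same probe sets in a different row order
expr <- matrix(c(9.2, 9.0, 9.4,
                 7.1, 6.8, 7.0,
                 4.3, 4.1, 4.2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("g3", "g1", "g2"),
                               c("s1", "s2", "s3")))

# Count absent and present calls per probe set without apply()
nAbsent  <- rowSums(calls == "A")
nPresent <- rowSums(calls == "P")

# Names of probe sets with at least one present call
keep <- rownames(calls)[nPresent > 0]

# Keep only those rows of the expression matrix, matching by name
filtered <- expr[rownames(expr) %in% keep, , drop = FALSE]
```

Here g2 is absent on every array, so it is the only row dropped; whether
an M call should count as present is a choice left to the analyst.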

Hope this helps
Aedin

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Horswell,
Stuart
Sent: 15 March 2004 14:22
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Subsetting Affybatch objects by gene lists.

Hi all,

	I'm trying to run an analysis on 24 Affymetrix HGu95v2 chips.

I've set up, via merge.AffyBatch, an affybatch object containing all 24
arrays.

A1 <- read.affybatch("A1.cel")
.
.
.
A24 <- read.affybatch("A24.cel")

A <- merge.AffyBatch(A1, A2)
A <- merge.AffyBatch(A, A3)
.
.
.
A<- merge.AffyBatch(A, A24)

 I then computed MAS5 type Present/Absent calls for each array using
mas5calls.

A.calls <- mas5calls(A)
p.a.A <- exprs(A.calls)

 What I'd like to do now is remove all of those genes without a single
present call across all 24 arrays before normalizing.

I can use the p.a.A file to obtain a list of the gene names/affy id tags
that I want to remove but I can't figure out how to delete the relevant
probe pairs from my affybatch object.

In fact the only things I've been able to find on the mailing list archive
and/or vignettes are how to subset by array or how to remove chunks from the
cdf environment - but this presents me with two problems, first I'm not sure
I can get the pattern matching working well enough to identify which entry
numbers in the cdf file correspond to the gene list I have, and secondly,
people have already commented that this isn't necessarily a sensible
approach for proper analysis anyway. So I'm kind of stumped now!

Any help or advice would be most gratefully received,

     many thanks,

           Stu
