[BioC] Removing probes from AffyBatch
Jenny Drnevich
drnevich at illinois.edu
Wed Sep 24 16:49:22 CEST 2008
Hi Nathan,
No, I never did get around to making a package for the remove
probes/probe sets functions, mostly because I don't know how! I just
used it again myself, and had to update the code slightly. The code
below works with R 2.7.2. As for how many probes you can remove,
there is probably no set answer. There may be an issue with using
different numbers of probes per probe set. I seem to recall some
discussion of this with regard to MBNI's custom re-mapped cdf
files for Affy's arrays, but I'm not certain.
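One way to keep an eye on this is to count how many probes are left in each probe set after masking. Below is a minimal sketch in base R. It uses a mock environment in place of a real cdf environment (which holds one matrix per probe set, one row per probe pair). The names mock_cdf and probes_per_set and the cutoff of 8 are made up for illustration and are not a recommendation:

```r
# Mock stand-in for a cdf environment: one matrix per probe set,
# one row per probe pair (columns pm and mm).
mock_cdf <- new.env()
assign("1007_s_at", cbind(pm = 1:16, mm = 17:32), envir = mock_cdf)
assign("1053_at", cbind(pm = 33:43, mm = 44:54), envir = mock_cdf)

# Hypothetical helper: number of probes currently in each probe set.
probes_per_set <- function(env) {
  sets <- ls(env)
  vapply(sets, function(s) nrow(get(s, envir = env)), integer(1))
}

# Mask the first 10 probes of one set by dropping rows of its matrix.
m <- get("1053_at", envir = mock_cdf)
assign("1053_at", m[-(1:10), , drop = FALSE], envir = mock_cdf)

counts <- probes_per_set(mock_cdf)
# Probe sets left with fewer than 8 probes (arbitrary cutoff).
small <- names(counts)[counts < 8]
print(counts)
print(small)
```

With a real cdf package loaded, the same helper could presumably be pointed at the environment returned by get() on the cdf package name.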
Cheers,
Jenny
### The first part just creates two objects (ResetEnvir and RemoveProbes),
### originally written by Ariel Chernomoretz and modified by Jenny Drnevich
### to remove individual probes and/or entire probesets. Just highlight
### everything from here until you see STOP and paste it to R all at once
ResetEnvir<-function(cleancdf){
  cdfpackagename <- paste(cleancdf,"cdf",sep="")
  probepackagename <- paste(cleancdf,"probe",sep="")
  # detach the cdf package if it is on the search path
  ll<-search()
  cdfpackagepos <- grep(cdfpackagename,ll)
  if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos)
  # detach the probe package if it is on the search path
  ll<-search()
  probepackagepos <- grep(probepackagename,ll)
  if(length(probepackagepos)>0) detach(pos=probepackagepos)
  # re-attach both packages
  require(cdfpackagename,character.only=TRUE)
  require(probepackagename,character.only=TRUE)
  require(affy)
}
RemoveProbes<-function(listOutProbes=NULL,
                       listOutProbeSets=NULL,
                       cleancdf,destructive=TRUE){
  # note: 'destructive' is not used in the body below
  # default probe dataset values
  cdfpackagename <- paste(cleancdf,"cdf",sep="")
  probepackagename <- paste(cleancdf,"probe",sep="")
  require(cdfpackagename,character.only = TRUE)
  require(probepackagename,character.only = TRUE)
  probe.env.orig <- get(probepackagename)
  if(!is.null(listOutProbes)){
    # taking probes out from CDF env
    # each entry is split at "at" into probe set name and row number
    probes<- unlist(lapply(listOutProbes,function(x){
      a<-strsplit(x,"at")
      aux1<-paste(a[[1]][1],"at",sep="")
      aux2<-as.integer(a[[1]][2])
      c(aux1,aux2)
    }))
    n1<-as.character(probes[seq(1,(length(probes)/2))*2-1])
    n2<-as.integer(probes[seq(1,(length(probes)/2))*2])
    probes<-data.frame(I(n1),n2)
    probes[,1]<-as.character(probes[,1])
    probes[,2]<-as.integer(probes[,2])
    pset<-unique(probes[,1])
    for(i in seq(along=pset)){
      # exact match, so one probe set name cannot match inside another
      ii <-which(probes[,1]==pset[i])
      iout<-probes[ii,2]
      a<-get(pset[i],envir=get(cdfpackagename))
      # drop=FALSE keeps a matrix even if only one probe is left
      a<-a[-iout,,drop=FALSE]
      assign(pset[i],a,envir=get(cdfpackagename))
    }
  }
  # taking probesets out from CDF env
  if(!is.null(listOutProbeSets)){
    rm(list=listOutProbeSets,envir=get(cdfpackagename))
  }
  # setting the PROBE env accordingly (idea from gcrma compute.affinities.R)
  tmp <- get("xy2indices",paste("package:",cdfpackagename,sep=""))
  newAB <- new("AffyBatch",cdfName=cleancdf)
  pmIndex <- unlist(indexProbes(newAB,"pm"))
  subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y,cdf=cdfpackagename),
                   pmIndex)
  rm(newAB)
  # probes no longer in the cdf have no match; drop them from the probe table
  iNA <- which(is.na(subIndex))
  if(length(iNA)>0){
    ipos<-grep(probepackagename,search())
    assign(probepackagename,probe.env.orig[-iNA,],pos=ipos)
  }
}
### STOP HERE!!!! PASTE THE ABOVE INTO R AND CHECK TO SEE YOU HAVE THE TWO
### OBJECTS (ResetEnvir and RemoveProbes) IN YOUR WORKSPACE WITH ls()
# All you need now is your affybatch object, and a character vector of
# probe set names and/or another vector of individual probes that you want
# to remove. If your affybatch object is called 'rawdata' and the vector of
# probesets is 'maskedprobes', all you need to do is:
cleancdf <- cleancdfname(rawdata@cdfName,addcdf=FALSE)
# Make sure you are starting with the original cdf with all the probes and
# probesets.
ResetEnvir(cleancdf)
# Double-check to make sure all probesets are present in your affybatch by
# typing in the name of your affybatch and looking at the output.
rawdata
# To remove some probe sets (but not individual probes in this example), use:
RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobes, cleancdf)
# The cdf file will be temporarily modified to mask the indicated probesets
# & probes, which you can check by typing in the name of your affybatch again
# and seeing that the number of probesets has decreased. The masking can be
# undone by using ResetEnvir as above, or by quitting the session. However,
# any ExpressionSet objects created while the cdf is modified will have the
# masked probesets removed permanently because they do not refer to the cdf
# like an affybatch object does.
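The example above only passes listOutProbeSets. For individual probes, RemoveProbes splits each entry of listOutProbes at "at", so each entry appears to be the probe set name followed directly by the row number of the probe within that probe set (for example "1007_s_at3"). Below is a minimal sketch of building and checking such a vector in base R. The probe set names and row numbers are made up, and the names make_probe_ids and maskedindividual are hypothetical:

```r
# Hypothetical helper: paste probe set names and within-set row numbers
# into the "<probeset><row>" form that RemoveProbes parses.
make_probe_ids <- function(probeset, row) {
  paste0(probeset, as.integer(row))
}

maskedindividual <- make_probe_ids(
  probeset = c("1007_s_at", "1007_s_at", "1053_at"),
  row      = c(3, 7, 1)
)

# Mirror the parsing done inside RemoveProbes to check the round trip.
parsed <- lapply(maskedindividual, function(x) {
  a <- strsplit(x, "at")
  c(paste(a[[1]][1], "at", sep = ""), a[[1]][2])
})
parsed_sets <- vapply(parsed, `[`, character(1), 1)
parsed_rows <- as.integer(vapply(parsed, `[`, character(1), 2))
print(data.frame(probeset = parsed_sets, row = parsed_rows))

# With affy and the cdf/probe packages loaded, the call would then be:
# RemoveProbes(listOutProbes=maskedindividual, listOutProbeSets=NULL, cleancdf)
```

Because the split is on every "at", a probe set name that contains "at" anywhere other than its final "_at" would not parse this way, so it is worth checking the parsed names before masking.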
At 04:59 AM 9/24/2008, Nathan Harmston wrote:
>Hi everyone,
>
>I'm trying to remove individual probes from an AffyBatch and have found
>a previous post:
>
>https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html
>
>I was wondering if this ever got put into a package?
>
>And also how many probes can be removed from a probeset before it
>becomes unreliable? I am going to try to use Biostrings to remove
>probes based on their sequences and other criteria.
>
>Many thanks in advance,
>
>Nathan
>
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu