[BioC] Removing probes from AffyBatch

Jenny Drnevich drnevich at illinois.edu
Wed Sep 24 16:49:22 CEST 2008


Hi Nathan,

No, I never did get around to making a package for the functions that
remove probes and probe sets, mostly because I don't know how! I just
used them again myself and had to update the code slightly; the code
below works with R 2.7.2. As for how many probes you can remove,
there probably is no set answer. There may be an issue with using
different numbers of probes per probe set; I seem to recall some
discussion of this regarding MBNI's custom re-mapped CDF files for
Affymetrix arrays.
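Since there is no standard cutoff, it can help to check how many probe pairs each probe set would keep before masking anything. Below is a minimal base-R sketch. It assumes a CDF-style environment that holds one matrix per probe set with one row per probe pair, and probe IDs written as the probe set name followed directly by the row number (for example "1000_at1"). The toy environment, its probe set names and the `minProbes` cutoff of 3 are hypothetical placeholders, not a recommendation:

```r
# Sketch: how many probe pairs would each probe set keep after masking?
# Assumes a CDF-style environment holding one matrix per probe set, with one
# row per probe pair; minProbes is an arbitrary, hypothetical cutoff.
probesLeft <- function(cdfenv, listOutProbes, minProbes = 3) {
  ids  <- unique(listOutProbes)
  # "1007_s_at3" -> probe set "1007_s_at" (the trailing digits are the row number)
  pset <- sub("^(.*at)([0-9]+)$", "\\1", ids)
  sets <- sort(ls(cdfenv))
  total   <- vapply(sets, function(s) nrow(get(s, envir = cdfenv)), integer(1))
  # Number of IDs per probe set; sets with no IDs give NA, treated as 0
  removed <- as.integer(table(pset)[sets])
  removed[is.na(removed)] <- 0L
  kept <- unname(total) - removed
  data.frame(probeset = sets, total = unname(total), kept = kept,
             tooFew = kept < minProbes, stringsAsFactors = FALSE)
}

# Toy CDF-like environment with two made-up probe sets (4 and 3 probe pairs)
cdf <- new.env()
assign("1000_at", matrix(1:8,  ncol = 2, dimnames = list(NULL, c("pm", "mm"))), envir = cdf)
assign("2000_at", matrix(9:14, ncol = 2, dimnames = list(NULL, c("pm", "mm"))), envir = cdf)

res <- probesLeft(cdf, c("1000_at1", "2000_at2", "2000_at3"))
print(res)
```

Here "2000_at" would keep a single probe pair and is flagged, while "1000_at" keeps three. Whether such sets should be dropped entirely is the judgment call discussed above.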

Cheers,
Jenny


### The first part just creates two objects (ResetEnvir and RemoveProbes),
### originally written by Ariel Chernomoretz and modified by Jenny Drnevich to
### remove individual probes and/or entire probe sets. Highlight everything from
### here until you see STOP and paste it into R all at once.

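ResetEnvir<-function(cleancdf){
  # Package names, e.g. "hgu133acdf" and "hgu133aprobe"
  cdfpackagename   <- paste(cleancdf, "cdf", sep="")
  probepackagename <- paste(cleancdf, "probe", sep="")
  # Detach any (possibly modified) copies, matching the search-path entry
  # exactly rather than with grep()
  cdfpackagepos <- match(paste("package:", cdfpackagename, sep=""), search())
  if(!is.na(cdfpackagepos)) detach(pos=cdfpackagepos)
  probepackagepos <- match(paste("package:", probepackagename, sep=""), search())
  if(!is.na(probepackagepos)) detach(pos=probepackagepos)
  # Re-attach the CDF and probe packages (and affy)
  require(cdfpackagename, character.only=TRUE)
  require(probepackagename, character.only=TRUE)
  require(affy)
}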

RemoveProbes<-function(listOutProbes=NULL, listOutProbeSets=NULL,
                       cleancdf, destructive=TRUE){

  # default probe dataset values
  cdfpackagename   <- paste(cleancdf, "cdf", sep="")
  probepackagename <- paste(cleancdf, "probe", sep="")
  require(cdfpackagename, character.only=TRUE)
  require(probepackagename, character.only=TRUE)
  probe.env.orig <- get(probepackagename)


  if(!is.null(listOutProbes)){
   # taking probes out from CDF env
   # Each ID is a probe set name followed directly by the probe's row number
   # in that probe set's CDF matrix, e.g. "1007_s_at3". Split at the last "at"
   # that is followed only by digits, so names containing "at" earlier on are
   # not cut in the wrong place.
   n1 <- sub("^(.*at)([0-9]+)$", "\\1", listOutProbes)
   n2 <- as.integer(sub("^(.*at)([0-9]+)$", "\\2", listOutProbes))
   pset <- unique(n1)
   for(i in seq_along(pset)){
    # exact match: grep() would also pick up e.g. "11000_at" for "1000_at"
    iout <- n2[n1 == pset[i]]
    a <- get(pset[i], envir=get(cdfpackagename))
    a <- a[-iout, , drop=FALSE]   # keep a matrix even if only one probe remains
    assign(pset[i], a, envir=get(cdfpackagename))
   }
  }


  # taking probesets out from CDF env
  if(!is.null(listOutProbeSets)){
   rm(list=listOutProbeSets, envir=get(cdfpackagename))
  }


  # setting the PROBE env accordingly (idea from gcrma compute.affinities.R):
  # keep only the rows of the probe table whose PM probe is still in the CDF
  tmp      <- get("xy2indices", paste("package:", cdfpackagename, sep=""))
  newAB    <- new("AffyBatch", cdfName=cleancdf)
  pmIndex  <- unlist(indexProbes(newAB, "pm"))
  subIndex <- match(tmp(probe.env.orig$x, probe.env.orig$y, cdf=cdfpackagename),
                    pmIndex)
  rm(newAB)
  iNA <- which(is.na(subIndex))


  if(length(iNA)>0){
   ipos <- match(paste("package:", probepackagename, sep=""), search())
   assign(probepackagename, probe.env.orig[-iNA,], pos=ipos)
  }
}

### STOP HERE!!!! PASTE THE ABOVE INTO R AND CHECK TO SEE YOU HAVE THE TWO
### OBJECTS (ResetEnvir and RemoveProbes) IN YOUR WORKSPACE WITH ls()
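Two details matter when the individual probe IDs are parsed. First, splitting on every occurrence of "at" cuts a probe set name that contains "at" earlier on. Second, grep() treats the probe set name as a regular expression, so it can match other, longer names. The base-R sketch below contrasts those approaches with an anchored regular expression and an exact comparison. All probe IDs are made up for illustration; in particular, "Rat_ctrl_at" is a hypothetical name:

```r
ids <- c("Rat_ctrl_at2", "1000_at1", "11000_at4")   # hypothetical probe IDs

# Splitting on every "at" (as strsplit(x, "at") does) cuts "Rat_ctrl_at2" in the wrong place
naive <- sapply(strsplit(ids, "at"), function(a) paste(a[1], "at", sep = ""))

# Anchored regex: keep everything up to the LAST "at" that is followed only by digits
pset  <- sub("^(.*at)([0-9]+)$", "\\1", ids)
index <- as.integer(sub("^(.*at)([0-9]+)$", "\\2", ids))

# grep() does a regular-expression search, so "1000_at" also hits "11000_at";
# an exact comparison selects only the intended entry
hitsGrep  <- grep("1000_at", pset)
hitsExact <- which(pset == "1000_at")

print(naive); print(pset); print(index); print(hitsGrep); print(hitsExact)
```

The naive split turns the first ID into the probe set "Rat". The anchored pattern recovers "Rat_ctrl_at" with row 2, and only the exact comparison avoids selecting "11000_at" when "1000_at" is meant.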

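# All you need now is your AffyBatch object, plus a character vector of probe
# set names and/or another vector of individual probes that you want to remove.
# Each individual probe is named by its probe set name followed directly by the
# probe's row number within that probe set, e.g. "1007_s_at3". If your AffyBatch
# object is called 'rawdata' and the vector of probe sets is 'maskedprobes', all
# you need to do is: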

cleancdf <- cleancdfname(rawdata@cdfName, addcdf=FALSE)

# Make sure you are starting with the original CDF, with all its probes and
# probe sets.

ResetEnvir(cleancdf)

# Double-check that all probe sets are present in your AffyBatch by typing its
# name and looking at the output.

rawdata

# To remove some probe sets (but not individual probes in this example), use:
RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobes, cleancdf)

# The CDF environment will be temporarily modified to mask the indicated
# probe sets and probes, which you can check by typing the name of your
# AffyBatch again and seeing that the number of probe sets has decreased.
# The masking can be undone by running ResetEnvir() as above, or by quitting
# the session. However, any ExpressionSet objects created while the CDF is
# modified will have the masked probe sets removed permanently, because
# they do not refer back to the CDF the way an AffyBatch object does.
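If the probes to mask have been flagged in a table, for example by a sequence-based screen like the one Nathan describes, the two input vectors could be assembled as in the sketch below. The data frame `flagged`, its column names, its rows, and the probe sets in `maskedprobes` are all hypothetical. Because the row numbers refer to the unmodified CDF matrices, it seems safest to run ResetEnvir() before calling RemoveProbes() a second time.

```r
# Hypothetical table of flagged probes: probe set name and 1-based row number
# of the probe within that probe set's CDF matrix
flagged <- data.frame(probeset = c("1007_s_at", "1007_s_at", "1053_at"),
                      row      = c(3L, 7L, 11L),
                      stringsAsFactors = FALSE)

# Format expected by RemoveProbes(): probe set name immediately followed by row number
listOutProbes <- unique(paste(flagged$probeset, flagged$row, sep = ""))

# Sanity check: every ID must end in "at" followed by digits to be parsed
stopifnot(all(grepl("^.*at[0-9]+$", listOutProbes)))

# Whole probe sets to drop (also hypothetical), as a plain character vector
maskedprobes <- c("AFFX-BioB-5_at", "AFFX-BioB-M_at")

print(listOutProbes)
# With the functions above loaded, both could then be passed in one call:
# RemoveProbes(listOutProbes = listOutProbes, listOutProbeSets = maskedprobes, cleancdf)
```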



At 04:59 AM 9/24/2008, Nathan Harmston wrote:
>Hi everyone,
>
>I'm trying to remove individual probes from an AffyBatch and have found
>a previous post:
>
>https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html
>
>I was wondering if this ever got put into a package?
>
>And also how many probes can be removed from a probeset before it
>becomes unreliable? I am going to try to use Biostrings to remove
>probes based on their sequences and other criteria.
>
>Many thanks in advance,
>
>Nathan

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu


