[BioC] Removing probes from AffyBatch

lgautier at altern.org lgautier at altern.org
Sat Oct 4 11:11:22 CEST 2008


> Hi Nathan,
>
> No, I never did get around to making a package for the remove
> probes/probe sets functions, mostly because I don't know how!
> I just used it again myself, and had to update the code slightly. The code
> below works with R 2.7.2.  As for how many probes you can remove,
> there probably is no set answer.

I remember a paper showing that the number of probes in a probe set can be
lowered significantly (see Antipova et al. 2002).

http://genomebiology.com/2002/3/12/research/0073

> There may be an issue with using
> different numbers of probes per probe set - I seem to recall some
> discussion on this in regards to using MBNI's custom re-mapped cdf
> files for Affy's arrays??

It cannot be excluded that there might be a "probes -> probe set summary"
algorithm that is defeated by differing numbers of probes per probe set, but
I cannot think of any at the moment. Someone on the list will correct this
statement if necessary.
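
As a rough illustration of why unequal probe counts need not be a problem,
here is a minimal sketch on toy data using a median polish summary (the kind
of summary I understand RMA to use). The data, the probe set names and the
helpers make_probeset and summarise_mp are made up for the example; nothing
here touches a real AffyBatch.

```r
# Toy log2 PM intensities: rows are probes, columns are arrays (made-up data).
set.seed(1)
make_probeset <- function(n_probes, n_arrays = 4) {
  matrix(rnorm(n_probes * n_arrays, mean = 8), nrow = n_probes,
         dimnames = list(NULL, paste0("array", seq_len(n_arrays))))
}

# One probe set with 11 probes, one trimmed down to 4 probes.
probesets <- list(full_at = make_probeset(11), trimmed_at = make_probeset(4))

# Median polish summary: overall effect plus the column (array) effects.
summarise_mp <- function(m) {
  fit <- medpolish(m, trace.iter = FALSE)
  fit$overall + fit$col
}

# One row per probe set, one column per array, whatever the probe count.
expr <- t(sapply(probesets, summarise_mp))
print(dim(expr))
```

Each probe set is summarised on its own, so the summary only sees the rows
that are left. Fewer probes will of course make the estimate noisier.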



L.




> Cheers,
> Jenny
>
>
> ### The first part just creates two objects (ResetEnvir and RemoveProbes),
> ### originally written by Ariel Chernomoretz and modified by Jenny Drnevich
> ### to remove individual probes and/or entire probesets. Just highlight
> ### everything from here until you see STOP and paste it into R all at once.
>
> ResetEnvir<-function(cleancdf){
>   cdfpackagename   <- paste(cleancdf,"cdf",sep="")
>   probepackagename <- paste(cleancdf,"probe",sep="")
>   ll<-search()
>   cdfpackagepos <- grep(cdfpackagename,ll)
>   if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos)
>   ll<-search()
>   probepackagepos <- grep(probepackagename,ll)
>   if(length(probepackagepos)>0) detach(pos=probepackagepos)
>   require(cdfpackagename,character.only=TRUE)
>   require(probepackagename,character.only=TRUE)
>   require(affy)
> }
>
> RemoveProbes<-function(listOutProbes=NULL,
>                         listOutProbeSets=NULL,
>                         cleancdf,destructive=TRUE){
>
>
>   #default probe dataset values
>   cdfpackagename   <- paste(cleancdf,"cdf",sep="")
>   probepackagename <- paste(cleancdf,"probe",sep="")
>   require(cdfpackagename,character.only = TRUE)
>   require(probepackagename,character.only = TRUE)
>   probe.env.orig <- get(probepackagename)
>
>
>   if(!is.null(listOutProbes)){
>    # taking probes out from CDF env
>    probes<- unlist(lapply(listOutProbes,function(x){
>                             a<-strsplit(x,"at")
>                             aux1<-paste(a[[1]][1],"at",sep="")
>                             aux2<-as.integer(a[[1]][2])
>                             c(aux1,aux2)
>                            }))
>    n1<-as.character(probes[seq(1,(length(probes)/2))*2-1])
>    n2<-as.integer(probes[seq(1,(length(probes)/2))*2])
>    probes<-data.frame(I(n1),n2)
>    probes[,1]<-as.character(probes[,1])
>    probes[,2]<-as.integer(probes[,2])
>    pset<-unique(probes[,1])
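>    for(i in seq(along=pset)){
>     # exact match: grep() would also hit names that merely contain pset[i]
>     ii  <-which(probes[,1]==pset[i])
>     iout<-probes[ii,2]
>     a<-get(pset[i],envir=get(cdfpackagename))
>     # drop=FALSE keeps a matrix even if a single probe remains
>     a<-a[-iout,,drop=FALSE]
>     assign(pset[i],a,envir=get(cdfpackagename))
>    }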
>   }
>
>
>   # taking probesets out from CDF env
>   if(!is.null(listOutProbeSets)){
>    rm(list=listOutProbeSets,envir=get(cdfpackagename))
>   }
>
>
>   # setting the PROBE env accordingly (idea from gcrma compute.affinities.R)
>   tmp <- get("xy2indices",paste("package:",cdfpackagename,sep=""))
>   newAB   <- new("AffyBatch",cdfName=cleancdf)
>   pmIndex <-  unlist(indexProbes(newAB,"pm"))
>   subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y,cdf=cdfpackagename),
>                    pmIndex)
>   rm(newAB)
>   iNA     <- which(is.na(subIndex))
>
>
>   if(length(iNA)>0){
>    ipos<-grep(probepackagename,search())
>    assign(probepackagename,probe.env.orig[-iNA,],pos=ipos)
>   }
> }
>
> ### STOP HERE!!!! PASTE THE ABOVE INTO R AND CHECK THAT YOU HAVE THE TWO
> ### OBJECTS (ResetEnvir and RemoveProbes) IN YOUR WORKSPACE WITH ls()
>
> # All you need now is your affybatch object, and a character vector of
> # probe set names and/or another vector of individual probes that you want
> # to remove. If your affybatch object is called 'rawdata' and the vector of
> # probesets is 'maskedprobes', all you need to do is:
>
> cleancdf <- cleancdfname(rawdata@cdfName,addcdf=FALSE)
>
> # Make sure you are starting with the original cdf with all the probes
> # and probesets.
>
> ResetEnvir(cleancdf)
>
> # Double-check that all probesets are present in your affybatch by typing
> # the name of your affybatch and looking at the output.
>
> rawdata
>
> # To remove some probe sets (but not individual probes in this example), use:
> RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobes, cleancdf)
>
> # The cdf environment will be temporarily modified to mask the indicated
> # probesets & probes, which you can check by typing the name of your
> # affybatch again and seeing that the number of probesets has decreased.
> # The masking can be undone by using ResetEnvir as above, or by quitting
> # the session. However, any ExpressionSet objects created while the cdf is
> # modified will have the masked probesets removed permanently, because
> # they do not refer to the cdf like an affybatch object does.
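
A note on the listOutProbes argument, which the example above leaves as NULL:
reading the RemoveProbes code, each entry appears to be a probe set name
immediately followed by the row number of the probe within that probe set.
The sketch below shows that reading on made-up identifiers, using base R
only. The identifiers and the toy matrix are hypothetical, and the regular
expression is my own variant (it splits on the last "at"), not the exact
parsing used in the function.

```r
# Hypothetical identifiers: "<probe set name><row within the probe set>".
listOutProbes <- c("1007_s_at3", "1007_s_at7", "1053_at1")

# Split on the last "at" that is followed only by digits.
pattern <- "^(.*at)([0-9]+)$"
pset <- sub(pattern, "\\1", listOutProbes)
row  <- as.integer(sub(pattern, "\\2", listOutProbes))

# Rows to drop, grouped by probe set.
to_drop <- split(row, pset)
print(to_drop)

# Toy stand-in for one probe set's entry in a cdf environment:
# one row per probe, columns holding pm and mm indices.
toy <- cbind(pm = 1:8, mm = 9:16)
kept <- toy[-to_drop[["1007_s_at"]], , drop = FALSE]
print(nrow(kept))
```

Using drop = FALSE matters when only one probe is left, since the result
would otherwise collapse from a matrix to a plain vector.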
>
>
>
> At 04:59 AM 9/24/2008, Nathan Harmston wrote:
>>Hi everyone,
>>
>>I'm trying to remove individual probes from an AffyBatch and have found
>>a previous post:
>>
>>https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html
>>
>>I was wondering if this ever got put into a package?
>>
>>And also, how many probes can be removed from a probeset before it
>>becomes unreliable? I am going to try to use Biostrings to remove
>>probes based on their sequences and other criteria.
>>
>>Many thanks in advance,
>>
>>Nathan
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Jenny Drnevich, Ph.D.
>
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
>
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
>
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at illinois.edu
>


