[BioC] Removing probes from AffyBatch
lgautier at altern.org
Sat Oct 4 11:11:22 CEST 2008
> Hi Nathan,
>
> No, I never did get around to making a package for the remove
> probes/probe sets functions, mostly because I don't know how!
> I just used it again myself, and had to update the code slightly. The code
> below works with R 2.7.2. As for how many probes you can remove,
> there probably is no set answer.
I remember a paper showing that the number of probes in a probe set can
be reduced significantly (see Antipova et al. 2002):
http://genomebiology.com/2002/3/12/research/0073
> There may be an issue with using
> different numbers of probes per probe set - I seem to recall some
> discussion on this in regards to using MBNI's custom re-mapped cdf
> files for Affy's arrays??
It cannot be ruled out that some "probes -> probe set summary" algorithm
is defeated by differing numbers of probes per probe set, but I cannot
think of one at the moment. Someone on the list will correct this
statement if necessary.
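As a rough illustration of why unequal probe counts need not be a problem, here is a minimal sketch of a median-polish summary (the summarization step used by RMA) applied to two toy probe sets with 11 and 4 probes. The intensities are simulated, and the helper names make_probeset and summarize_probeset are made up for this example; they are not functions from affy.

```r
# Toy data only: simulated log2 intensities, NOT real microarray data.
set.seed(1)
make_probeset <- function(n_probes, n_arrays = 3) {
  array_effect <- seq(8, by = 1, length.out = n_arrays)  # assumed expression levels
  probe_effect <- rnorm(n_probes, sd = 0.5)               # assumed probe affinities
  m <- outer(probe_effect, array_effect, "+") +
    matrix(rnorm(n_probes * n_arrays, sd = 0.1), n_probes, n_arrays)
  colnames(m) <- paste0("array", seq_len(n_arrays))
  m
}

# Hypothetical helper: one expression value per array via median polish.
summarize_probeset <- function(m) {
  fit <- medpolish(m, trace.iter = FALSE)
  fit$overall + fit$col
}

full    <- make_probeset(11)  # probe set with 11 probes
reduced <- make_probeset(4)   # probe set with only 4 probes

expr_full    <- summarize_probeset(full)
expr_reduced <- summarize_probeset(reduced)
print(rbind(expr_full, expr_reduced))
```

Both calls return one value per array regardless of how many probes went in; whether the estimate from 4 probes is precise enough is a separate question that this sketch does not answer.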
L.
> Cheers,
> Jenny
>
>
> ### The first part just creates two objects (ResetEnvir and RemoveProbes),
> ### originally written by Ariel Chernomoretz and modified by Jenny Drnevich
> ### to remove individual probes and/or entire probesets. Just highlight
> ### everything from here until you see STOP and paste it to R all at once.
>
> ResetEnvir<-function(cleancdf){
> cdfpackagename <- paste(cleancdf,"cdf",sep="")
> probepackagename <- paste(cleancdf,"probe",sep="")
> ll<-search()
> cdfpackagepos <- grep(cdfpackagename,ll)
> if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos)
> ll<-search()
> probepackagepos <- grep(probepackagename,ll)
> if(length(probepackagepos)>0) detach(pos=probepackagepos)
> require(cdfpackagename,character.only=TRUE)
> require(probepackagename,character.only=TRUE)
> require(affy)
> }
>
> RemoveProbes<-function(listOutProbes=NULL,
>                        listOutProbeSets=NULL,
>                        cleancdf,destructive=TRUE){
>
> #default probe dataset values
> cdfpackagename <- paste(cleancdf,"cdf",sep="")
> probepackagename <- paste(cleancdf,"probe",sep="")
> require(cdfpackagename,character.only = TRUE)
> require(probepackagename,character.only = TRUE)
> probe.env.orig <- get(probepackagename)
>
>
> if(!is.null(listOutProbes)){
> # taking probes out from CDF env
> # each entry is a probe set name ending in "at" followed by the row
> # number of the probe within that set, e.g. "1007_s_at3"; anchoring on
> # the trailing digits keeps an "at" elsewhere in the name harmless
> idpattern<-"^(.*at)([0-9]+)$"
> n1<-sub(idpattern,"\\1",listOutProbes)
> n2<-as.integer(sub(idpattern,"\\2",listOutProbes))
> pset<-unique(n1)
> for(i in seq(along=pset)){
> # exact match: grep() would also hit names that merely contain pset[i]
> iout<-n2[n1==pset[i]]
> a<-get(pset[i],envir=get(cdfpackagename))
> a<-a[-iout,,drop=FALSE]
> assign(pset[i],a,envir=get(cdfpackagename))
> }
> }
>
>
> # taking probesets out from CDF env
> if(!is.null(listOutProbeSets)){
> rm(list=listOutProbeSets,envir=get(cdfpackagename))
> }
>
>
> # setting the PROBE env accordingly (idea from gcrma compute.affinities.R)
> tmp <- get("xy2indices",paste("package:",cdfpackagename,sep=""))
> newAB <- new("AffyBatch",cdfName=cleancdf)
> pmIndex <- unlist(indexProbes(newAB,"pm"))
> subIndex<-
> match(tmp(probe.env.orig$x,probe.env.orig$y,cdf=cdfpackagename),pmIndex)
> rm(newAB)
> iNA <- which(is.na(subIndex))
>
>
> if(length(iNA)>0){
> ipos<-grep(probepackagename,search())
> assign(probepackagename,probe.env.orig[-iNA,],pos=ipos)
> }
> }
>
> ### STOP HERE!!!! PASTE THE ABOVE INTO R AND CHECK TO SEE YOU HAVE THE TWO
> ### OBJECTS (ResetEnvir and RemoveProbes) IN YOUR WORKSPACE WITH ls()
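Judging from the code, listOutProbes takes identifiers made of a probe set name followed by the row number of the probe within that set. Here is a minimal sketch of parsing such identifiers and of why exact matching on the probe set name is safer than grep(). The identifiers and the helper name split_probe_id are invented for illustration; only base R is used.

```r
# Hypothetical helper: split "<probe set name ending in at><row number>".
split_probe_id <- function(ids) {
  pattern <- "^(.*at)([0-9]+)$"
  stopifnot(all(grepl(pattern, ids)))
  data.frame(probeset = sub(pattern, "\\1", ids),
             row      = as.integer(sub(pattern, "\\2", ids)),
             stringsAsFactors = FALSE)
}

# Made-up identifiers; "gatA_at" contains "at" twice on purpose.
ids <- c("1007_s_at3", "1007_s_at11", "gatA_at2", "100_at5", "1100_at1")
parsed <- split_probe_id(ids)
print(parsed)

# Exact matching keeps "100_at" and "1100_at" apart; grep() does not.
rows_exact <- parsed$row[parsed$probeset == "100_at"]
rows_grep  <- parsed$row[grep("100_at", parsed$probeset)]
```

Here rows_exact holds only the row listed for "100_at", while rows_grep also picks up the row listed for "1100_at", which would remove the wrong probe from the first set.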
>
> # All you need now is your affybatch object, and a character vector of
> # probe set names and/or another vector of individual probes that you want
> # to remove. If your affybatch object is called 'rawdata' and the vector of
> # probesets is 'maskedprobes', all you need to do is:
>
> cleancdf <- cleancdfname(rawdata@cdfName,addcdf=FALSE)
>
> # Make sure you are starting with the original cdf with all the probes
> # and probesets.
>
> ResetEnvir(cleancdf)
>
> # Double-check to make sure all probesets are present in your affybatch by
> # typing in the name of your affybatch and looking at the output.
>
> rawdata
>
> # To remove some probe sets (but not individual probes in this example), use:
> RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobes, cleancdf)
>
> # The cdf file will be temporarily modified to mask the indicated probesets
> # & probes, which you can check by typing in the name of your affybatch again
> # and seeing that the number of probesets has decreased. The masking can be
> # undone by using ResetEnvir as above, or by quitting the session. However,
> # any ExpressionSet objects created while the cdf is modified will have the
> # masked probesets removed permanently because they do not refer to the cdf
> # like an affybatch object does.
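The masking persists until ResetEnvir is called because environments in R are modified in place. The sketch below mimics this with a toy environment standing in for the cdf environment. The probe set names and index matrices are invented, make_toy_cdf is a made-up helper, and no Bioconductor package is involved.

```r
# Toy stand-in for a cdf environment: one pm/mm index matrix per probe set.
make_toy_cdf <- function() {
  env <- new.env()
  assign("set1_at", cbind(pm = 1:4, mm = 5:8), envir = env)
  assign("set2_at", cbind(pm = 9:11, mm = 12:14), envir = env)
  env
}

toy_cdf <- make_toy_cdf()
n_before <- length(ls(toy_cdf))

# Mask probe rows 2 and 3 of set1_at, then drop set2_at entirely.
m <- get("set1_at", envir = toy_cdf)
assign("set1_at", m[-c(2, 3), , drop = FALSE], envir = toy_cdf)
rm(list = "set2_at", envir = toy_cdf)

n_after   <- length(ls(toy_cdf))
remaining <- get("set1_at", envir = toy_cdf)

# "Resetting" here simply means rebuilding the environment from scratch.
toy_cdf <- make_toy_cdf()
n_reset <- length(ls(toy_cdf))
```

Anything that looks up toy_cdf between the masking and the reset sees the reduced contents, which is the same reason an affybatch printed after RemoveProbes reports fewer probesets.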
>
>
>
> At 04:59 AM 9/24/2008, Nathan Harmston wrote:
>>Hi everyone,
>>
>>I'm trying to remove individual probes from an AffyBatch and have found
>>a previous post:
>>
>>https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html
>>
>>I was wondering if this ever got put into a package?
>>
>>Also, how many probes can be removed from a probeset before it
>>becomes unreliable? I am going to try to use Biostrings to remove
>>probes based on their sequences and other criteria.
>>
>>Many thanks in advance,
>>
>>Nathan
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Jenny Drnevich, Ph.D.
>
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
>
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
>
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at illinois.edu
>