[BioC] raw probe set expression data and masking questions
Jenny Drnevich
drnevich at uiuc.edu
Fri Sep 8 17:01:44 CEST 2006
Hi Donald,
At 07:42 AM 9/8/2006, Schwartz, Donald wrote:
>Anyone know a simple method to harvest the raw probe set level expression
>data (not individual PM probe or PM-MM probe pair level data), which will
>be summarized by average difference, and then export that into excel with
>the Affy ID or "probe set name" as row headers and columns representing
>samples?
I don't understand what you mean by 'raw probe set level expression'? The
only raw values are the individual PM and MM probe values. There are
various algorithms to combine them into a probe set level expression value,
but by definition these are 'processed values', not 'raw' values.
> Second question, I have a small affy custom array. I want to select
> certain probe sets for analysis prior to any low-level processing (i.e.,
> prior to normalization and background correction, etc.). In other words,
> I want to mask certain probe sets before normalization and background
> correction. Following masking, the probe sets will be ignored for the
> entire analysis. The difficulty is that there is no consistent naming
> feature across all probe sets so the only way to identify probe sets to
> mask or to analyze is with a character vector. In addition, I need the
> flexibility to mask different sets of probe sets and then process
> (normalize, bg.correct, etc) the data various ways. I'll use:
> expresso(affybatch, widget=TRUE) to do this post masking.
>
>Thanks for your suggestions.
>Don
Below is some code to remove either individual probes or entire probesets.
It was originally written by Ariel Chernomoretz and posted to the mailing
list, and I have subsequently modified it slightly. I've tried to give
explanations throughout - perhaps at some point I'll have time to make it
into a small package. There's a chance it might not work with a custom affy
array, as it assumes you have a cdf and a probe package for your chip named
in the usual way: 'chipnamecdf' and 'chipnameprobe'. If you only want to
remove entire probe sets, it might work even if you don't have a probe
package for your custom array.
Good luck,
Jenny
### The first part is just creating two ojects (ResetEnvir and
RemoveProbes) originally
### written by Ariel Chernomoretz and modified by Jenny Drnevich to remove
individual
### probes and/or entire probesets. Just highlight everything from here until
### you see STOP and paste it to R all at once
ResetEnvir<-function(cleancdf){
cdfpackagename <- paste(cleancdf,"cdf",sep="")
probepackagename <- paste(cleancdf,"probe",sep="")
ll<-search()
cdfpackagepos <- grep(cdfpackagename,ll)
if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos)
ll<-search()
probepackagepos <- grep(probepackagename,ll)
if(length(probepackagepos)>0) detach(pos=probepackagepos)
require(cdfpackagename,character.only=T)
require(probepackagename,character.only=T)
require(affy)
}
RemoveProbes<-function(listOutProbes=NULL,
listOutProbeSets=NULL,
cleancdf,destructive=TRUE){
#default probe dataset values
cdfpackagename <- paste(cleancdf,"cdf",sep="")
probepackagename <- paste(cleancdf,"probe",sep="")
require(cdfpackagename,character.only = TRUE)
require(probepackagename,character.only = TRUE)
probe.env.orig <- get(probepackagename)
if(!is.null(listOutProbes)){
# taking probes out from CDF env
probes<- unlist(lapply(listOutProbes,function(x){
a<-strsplit(x,"at")
aux1<-paste(a[[1]][1],"at",sep="")
aux2<-as.integer(a[[1]][2])
c(aux1,aux2)
}))
n1<-as.character(probes[seq(1,(length(probes)/2))*2-1])
n2<-as.integer(probes[seq(1,(length(probes)/2))*2])
probes<-data.frame(I(n1),n2)
probes[,1]<-as.character(probes[,1])
probes[,2]<-as.integer(probes[,2])
pset<-unique(probes[,1])
for(i in seq(along=pset)){
ii <-grep(pset[i],probes[,1])
iout<-probes[ii,2]
a<-get(pset[i],env=get(cdfpackagename))
a<-a[-iout,]
assign(pset[i],a,env=get(cdfpackagename))
}
}
# taking probesets out from CDF env
if(!is.null(listOutProbeSets)){
rm(list=listOutProbeSets,envir=get(cdfpackagename))
}
# setting the PROBE env accordingly (idea from gcrma compute.affinities.R)
tmp <- get("xy2i",paste("package:",cdfpackagename,sep=""))
newAB <- new("AffyBatch",cdfName=cleancdf)
pmIndex <- unlist(indexProbes(newAB,"pm"))
subIndex<- match(tmp(probe.env.orig$x,probe.env.orig$y),pmIndex)
rm(newAB)
iNA <- which(is.na(subIndex))
if(length(iNA)>0){
ipos<-grep(probepackagename,search())
assign(probepackagename,probe.env.orig[-iNA,],pos=ipos)
}
}
### STOP HERE!!!! PASTE THE ABOVE INTO R AND CHECK TO SEE YOU HAVE THE TWO
OBJECTS
### (ResetEnvir and RemoveProbes) IN YOUR WORKSPACE WITH ls()
# All you need now is your affybatch object, and a character vector of
probe set names
# and/or another vector of individual probes that you want to remove. If
your affybatch
# object is called 'rawdata' and the vector of probesets is 'maskedprobes',
follow
# these steps:
cleancdf <- cleancdfname(rawdata at cdfName,addcdf=FALSE)
# Make sure you are starting with the original cdf with all the probes and
probesets.
ResetEnvir(cleancdf)
# Double-check to make sure all probesets are present in your affybatch by
typing in
# the name of your affybatch and looking at the output.
rawdata
# To remove some probe sets (but not individual probes in this example), use:
RemoveProbes(listOutProbes=NULL, listOutProbeSets=maskedprobes, cleancdf)
# The cdf file will be temporarily modified to mask the indicated probesets
& probes,
# which you can check by typing in the name of your affybatch again and
seeing that
# the number of probesets have decreased. The masking can be undone by
using ResetEnvir
# as above, or by quitting the session. However, any Expression Set objects
created
# when the cdf is modified will have the masked probesets removed
permanently because
# they do not refer to the cdf like an affybatch object does.
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu
More information about the Bioconductor
mailing list