[BioC] GeneFeatureSet

Mon Jan 7 16:14:44 CET 2013

Hi Shahenda,

On 1/6/2013 1:50 AM, Shahenda El-Naggar wrote:
> Hi
> I am trying to combine data that was generated from different microarray platforms (MoGene_10_st and mouse4302)
> when trying to filter genes using P/A calls, I ran into problems with MoGene since it does not have MM probes to run functions like mas5call
> instead I used Oligo package. The problem is when I run the function paCalls and then subset into normalized data I end up with control probes. To avoid that I am trying to remove control probes first then apply paCalls function, however, so far I could not find any function that allow me to do that with GeneFeatureSet which is initially generated through reading CEL files by oligo library
> this is my code

Removing the controls is fairly simple. I have a function in the devel 
version of affycoretools that you can use. Rather than installing the 
devel version (for which you should also install the devel version of 
R), you can just copy paste the following into your R session, and then use.

  getMainProbes <- function (input) {
     if (is(input, "ExpressionSet")) {
         pdinfo <- annotation(input)
         if (length(grep("^pd", pdinfo)) != 1)
             stop(paste("The file", pdinfo, "does not appear to have 
been processed using",
                 "the oligo package.\nIn this case the argument to this 
function should",
                 "be the name of the correct pd.info package (e.g., 
pd.hugene.1.0.st.v1.\n"),
                 call. = FALSE)
     }
     else {
         if (is.character(input))
             pdinfo <- input
         else if (!is.character(input))
             stop(paste("The input argument for this function should 
either be an ExpressionSet",
                 "that was generated using oligo, or the name of the 
pd.info package",
                 "that corresponds to your data.\n"), call. = FALSE)
     }
     require(pdinfo, character.only = TRUE)
     con <- db(get(pdinfo))
     types <- dbGetQuery(con, paste("select distinct meta_fsetid, type 
from featureSet inner join core_mps",
         "on featureSet.fsetid=core_mps.fsetid;"))
     ind <- types$type %in% 1
     dbDisconnect(con)
     if (is(input, "ExpressionSet"))
         return(input[ind, ])
     else return(ind)
}

Note that this is intended for use on summarized data. I don't see where 
you are summarizing the mogene expression data below, and quite frankly 
have no idea what you are trying to accomplish with that code. In 
general, I would do something like

eset <- rma(t)
eset <- getMainProbes(eset)

and then eset would contain only probes with a 'main' designation.

Best,

Jim

>
> library(oligo)
> library(pd.mogene.1.0.st.v1)
> geneCELs<- list.celfiles("C:\\Documents and Settings\\All Users\\Desktop\\Mouse data\\GSE37832_RAW\\MENSC", full.names = TRUE)
> t<-read.celfiles(geneCELs)
> probein<- getProbeInfo(t,  probeType = "pm", target = "probeset", sortBy = "none")
> p<-paCalls(t, method="DABG")
> indsum<- apply(p [,1:6], 1, function(x) sum(x[1:3]<  0.05)>  2 || sum(x[4:6]<  0.05)>  2)
> pfinal<- p[insum,]
> probesetids<- subset(probein, probein$fid %in% as.character(rownames(pfinal)))
> dataset.1.mensc<-subset(mendsc.eset, rownames(mensc.eset) %in% as.character(probesetids$man_fsetid))  #mensc.est is normalized using expresso function
> write.csv(dataset.1.mensc, "dataset.1.mensc.csv")
> dim(dataset.1.mensc)
>
>
> Shahenda
>
> 	[[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099