[BioC] GeneFeatureSet
James W. MacDonald
jmacdon at uw.edu
Mon Jan 7 16:14:44 CET 2013
Hi Shahenda,
On 1/6/2013 1:50 AM, Shahenda El-Naggar wrote:
> Hi
> I am trying to combine data that was generated from different microarray platforms (MoGene_10_st and mouse4302)
> when trying to filter genes using P/A calls, I ran into problems with MoGene since it does not have MM probes to run functions like mas5call
> instead I used Oligo package. The problem is when I run the function paCalls and then subset into normalized data I end up with control probes. To avoid that I am trying to remove control probes first then apply paCalls function, however, so far I could not find any function that allow me to do that with GeneFeatureSet which is initially generated through reading CEL files by oligo library
> this is my code
Removing the controls is fairly simple. I have a function in the devel
version of affycoretools that you can use. Rather than installing the
devel version (for which you should also install the devel version of
R), you can just copy paste the following into your R session, and then use.
getMainProbes <- function (input) {
if (is(input, "ExpressionSet")) {
pdinfo <- annotation(input)
if (length(grep("^pd", pdinfo)) != 1)
stop(paste("The file", pdinfo, "does not appear to have
been processed using",
"the oligo package.\nIn this case the argument to this
function should",
"be the name of the correct pd.info package (e.g.,
pd.hugene.1.0.st.v1.\n"),
call. = FALSE)
}
else {
if (is.character(input))
pdinfo <- input
else if (!is.character(input))
stop(paste("The input argument for this function should
either be an ExpressionSet",
"that was generated using oligo, or the name of the
pd.info package",
"that corresponds to your data.\n"), call. = FALSE)
}
require(pdinfo, character.only = TRUE)
con <- db(get(pdinfo))
types <- dbGetQuery(con, paste("select distinct meta_fsetid, type
from featureSet inner join core_mps",
"on featureSet.fsetid=core_mps.fsetid;"))
ind <- types$type %in% 1
dbDisconnect(con)
if (is(input, "ExpressionSet"))
return(input[ind, ])
else return(ind)
}
Note that this is intended for use on summarized data. I don't see where
you are summarizing the mogene expression data below, and quite frankly
have no idea what you are trying to accomplish with that code. In
general, I would do something like
eset <- rma(t)
eset <- getMainProbes(eset)
and then eset would contain only probes with a 'main' designation.
Best,
Jim
>
> library(oligo)
> library(pd.mogene.1.0.st.v1)
> geneCELs<- list.celfiles("C:\\Documents and Settings\\All Users\\Desktop\\Mouse data\\GSE37832_RAW\\MENSC", full.names = TRUE)
> t<-read.celfiles(geneCELs)
> probein<- getProbeInfo(t, probeType = "pm", target = "probeset", sortBy = "none")
> p<-paCalls(t, method="DABG")
> indsum<- apply(p [,1:6], 1, function(x) sum(x[1:3]< 0.05)> 2 || sum(x[4:6]< 0.05)> 2)
> pfinal<- p[insum,]
> probesetids<- subset(probein, probein$fid %in% as.character(rownames(pfinal)))
> dataset.1.mensc<-subset(mendsc.eset, rownames(mensc.eset) %in% as.character(probesetids$man_fsetid)) #mensc.est is normalized using expresso function
> write.csv(dataset.1.mensc, "dataset.1.mensc.csv")
> dim(dataset.1.mensc)
>
>
> Shahenda
>
> [[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list