[BioC] annotating microarray data with mogene10stv1
Jakub Stanislaw Nowak
jakub.nowak at ed.ac.uk
Tue Jul 22 23:10:19 CEST 2014
Hi Xiayu and Jim
Now it is working nicely.
Many thanks guys,
Jakub
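
Note for the archive: a minimal sketch of the approach that came out of the replies quoted below, combining getMainProbes(), featureNames() and select(). It assumes the affycoretools, Biobase and AnnotationDbi packages are installed. annotate_main_probes() is a hypothetical helper name used only for illustration, not a function from any package, and keytype = "PROBEID" is spelled out only for clarity.

```r
# Sketch only: annotate_main_probes() is a hypothetical helper,
# not part of oligo, affycoretools or AnnotationDbi.
annotate_main_probes <- function(eset, annotation_db,
                                 columns = c("SYMBOL", "GENENAME", "ENTREZID")) {
  # Drop the control probesets; the result is still an ExpressionSet
  main_eset <- affycoretools::getMainProbes(eset)

  # select() needs a character vector of keys, so extract the probeset IDs
  probe_ids <- Biobase::featureNames(main_eset)

  # Look up the requested annotation columns for those probeset IDs
  annot <- AnnotationDbi::select(annotation_db,
                                 keys = probe_ids,
                                 columns = columns,
                                 keytype = "PROBEID")

  # Return both the filtered ExpressionSet and the annotation table
  list(eset = main_eset, annotation = annot)
}
```

With library(mogene10sttranscriptcluster.db) loaded and eset being the output of rma(), a call such as annotate_main_probes(eset, mogene10sttranscriptcluster.db) should return the filtered ExpressionSet together with a data.frame of annotations. select() can return more than one row per probeset when the mapping is one-to-many, so the table may need collapsing before it is attached to the data.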
On 22 Jul 2014, at 21:04, Rao,Xiayu <XRao at mdanderson.org> wrote:
> Hi, Jakub
>
> When you run ID <- getMainProbes(eset), ID is an ExpressionSet rather than a character vector. To extract the probeset IDs as a character vector, use featureNames(ID):
>
> select(mogene10sttranscriptcluster.db, featureNames(ID), c("SYMBOL","GENENAME","ENTREZID"))
>
> Best,
> Xiayu
>
>
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Jakub Stanislaw Nowak
> Sent: Tuesday, July 22, 2014 2:42 PM
> To: James W. MacDonald
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] annotating microarray data with mogene10stv1
>
> Hi Jim,
>
> Thanks for your suggestion. Somehow I overlooked the select() function. Now I think I am getting closer.
> I have a problem applying select() to my probes. I think it may be because ID is an ExpressionSet rather than a character vector.
>
> So first, as explained before, I generated ID, containing the main probes from my dataset:
>
> > > ID <- getMainProbes(eset)
> > > ID
> > ExpressionSet (storageMode: lockedEnvironment)
> > assayData: 28858 features, 6 samples
> > element names: exprs
> > protocolData
> > rowNames: mock1 mock2 ... siLin28a2 (6 total)
> > varLabels: exprs dates
> > varMetadata: labelDescription channel
> > phenoData
> > rowNames: mock1 mock2 ... siLin28a2 (6 total)
> > varLabels: index
> > varMetadata: labelDescription channel
> > featureData: none
> > experimentData: use 'experimentData(object)'
> > Annotation: pd.mogene.1.0.st.v1
>
> Then I tried to annotate using select(), and I got this error:
>
> > > tmp <- select(mogene10sttranscriptcluster.db, ID, c("SYMBOL","GENENAME","ENTREZID"))
> > Error in .testForValidKeys(x, keys, keytype) :
> > 'keys' must be a character vector
>
>
> However, if I use an ID generated with featureNames(), select() works, but I think this approach does not remove the control probes that you described before.
>
> Is there a way to convert a value of type ExpressionSet to a character vector? Or, alternatively, what should I do to make it work?
>
> Many thanks,
>
> Jakub
>
> On 22 Jul 2014, at 17:21, James W. MacDonald <jmacdon at uw.edu> wrote:
>
> > Hi Jakub,
> >
> > Please don't take questions off-list (use 'Reply-all' when responding).
> >
> > On 7/22/2014 12:06 PM, Jakub Stanislaw Nowak wrote:
> >> Hi Jim,
> >>
> >> I have a couple of follow-up questions, as I got stuck trying to use the getMainProbes function.
> >> As I am still a beginner with R, my questions might sound quite naive.
> >>
> >> 1. The first question is about loading data using the oligo package. Which approach would you use, or do they both give the same output?
> >>
> >>>> celFiles<-list.celfiles()
> >>>> mydata <- read.celfiles(celFiles)
> >>> Platform design info loaded.
> >>> Reading in : GSM910962.CEL
> >>> Reading in : GSM910963.CEL
> >>> Reading in : GSM910964.CEL
> >>> Reading in : GSM910965.CEL
> >>> Reading in : GSM910966.CEL
> >>> Reading in : GSM910967.CEL
> >>
> >> or
> >>
> >>>> adf<-read.AnnotatedDataFrame("target.txt",row.names=1, header=TRUE, as.is=TRUE)
> >>>> mydata2 <- read.celfiles(filenames=pData(adf)$FileName,phenoData=adf)
> >>> Platform design info loaded.
> >>> Reading in : GSM910962.CEL
> >>> Reading in : GSM910963.CEL
> >>> Reading in : GSM910964.CEL
> >>> Reading in : GSM910965.CEL
> >>> Reading in : GSM910966.CEL
> >>> Reading in : GSM910967.CEL
> >>> Warning message:
> >>> In read.celfiles(filenames = pData(adf)$FileName, phenoData = adf) :
> >>> 'channel' automatically added to varMetadata in phenoData.
> >
> > There should be no difference between the two, other than the obvious difference in the phenoData slot.
> >
> >>
> >> 2. How would you use the function getMainProbes?
> >>
> >> I tried this and ended up getting an error:
> >>
> >>>> eset <- rma(mydata)
> >>> Background correcting
> >>> Normalizing
> >>> Calculating Expression
> >>
> >>>> ID <- getMainProbes(eset)
> >>>> ID
> >>> ExpressionSet (storageMode: lockedEnvironment)
> >>> assayData: 28858 features, 6 samples
> >>> element names: exprs
> >>> protocolData
> >>> rowNames: mock1 mock2 ... siLin28a2 (6 total)
> >>> varLabels: exprs dates
> >>> varMetadata: labelDescription channel
> >>> phenoData
> >>> rowNames: mock1 mock2 ... siLin28a2 (6 total)
> >>> varLabels: index
> >>> varMetadata: labelDescription channel
> >>> featureData: none
> >>> experimentData: use 'experimentData(object)'
> >>> Annotation: pd.mogene.1.0.st.v1
> >
> > You didn't get an error. You were returned an ExpressionSet containing only the 28,858 main probes (you started with 35K or so, IIRC).
> >
> >>
> >>>> symbol <- getSYMBOL(ID, "pd.mogene.1.0.st.v1")
> >>> Error in unlist(lookUp(x, data, "SYMBOL")) :
> >>> error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in mget(x, envir = getAnnMap(what, chip = data, load = load), ifnotfound = NA) :
> >>> error in evaluating the argument 'envir' in selecting a method for function 'mget': Error in (function (classes, fdef, mtable) :
> >>> unable to find an inherited method for function ‘columns’ for signature ‘"AffyGenePDInfo"’
> >>
> >> I think getMainProbes and featureNames return different types of output, so maybe my reasoning is wrong when I try to obtain symbols.
> >> Also, which annotation package would you use: pd.mogene.1.0.st.v1 or mogene10sttranscriptcluster.db?
> >
> > I gave you a suggestion previously that you shouldn't be using getSYMBOL(), or lookUp() or any of the old-style annotation functions. That suggestion still holds! Use select() instead!
> >
> > Also, pd.mogene.1.0.st.v1 isn't an annotation package. It is similar in spirit to the cdf packages that you use with the affy package, and is used to map probes to probesets, among other things.
> >
> > The annotation package for this array, when summarized at the 'core' level (which is the default for oligo::rma()), is the mogene10sttranscriptcluster.db package. Refer to my previous email to see how to use this package to annotate your data.
> >
> > Best,
> >
> > Jim
> >
> >
> >>
> >> I will be grateful if you can give me some suggestions.
> >>
> >> Thanks,
> >>
> >> Jakub
> >>
> >>
> >>
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
>