[BioC] mogene10stprobeset.db error
Vincent Carey
stvjc at channing.harvard.edu
Sun Jul 4 09:17:52 CEST 2010
Others will have to comment on the details of this mapping. You are
finding a "hit" to an eight-digit token of weakly specified provenance
(generated with aroma, no indication of version, etc.) and asserting
that an "ID is clearly correct", without telling us the resource where
you found the "hit". We can use biomaRt to follow up a bit:
> library(biomaRt)
> ss = useDataset("mmusculus_gene_ensembl", mart=useMart("ensembl"))
> fff = getBM(mart=ss, filters="affy_mogene_1_0_st_v1", values="10471503", attributes=c("ensembl_gene_id",
+ "ensembl_transcript_id", "chromosome_name", "mgi_symbol"))
> fff
     ensembl_gene_id ensembl_transcript_id chromosome_name mgi_symbol
1 ENSMUSG00000088569    ENSMUST00000157944               2         NA
2 ENSMUSG00000065226    ENSMUST00000083292               9         NA
3 ENSMUSG00000088929    ENSMUST00000158304               9         NA
4 ENSMUSG00000065282    ENSMUST00000083348               9         NA
suggesting that current ensembl annotation maps "the ID" to
transcripts on chr 2 and chr 9. Perhaps biomaRt will yield more clues
for you.
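
Separately, on the lookup error in the quoted message: here is a minimal sketch of making the lookup tolerant of unknown IDs, so that the unmapped ones can be listed instead of stopping at the first failure. It has not been run against the real annotation package. A plain environment (toy_map, with made-up keys and symbols) stands in for mogene10stprobesetSYMBOL, on the assumption that mget on the real map accepts ifnotfound=NA the way base R's mget does.

```r
# Toy stand-in for an annotation map such as mogene10stprobesetSYMBOL.
# The keys and symbols stored here are made-up placeholders, not real annotation.
toy_map <- new.env()
assign("id_known_1", "SymbolA", envir = toy_map)
assign("id_known_2", "SymbolB", envir = toy_map)

# Two keys the map knows, plus the ID from the error message, which it does not.
ids <- c("id_known_1", "10471503", "id_known_2")

# ifnotfound = NA returns NA for unknown keys instead of raising an error.
u <- mget(ids, envir = toy_map, ifnotfound = NA)

# Collect the IDs for which nothing was found.
not_found <- vapply(u, function(s) all(is.na(s)), logical(1))
missing_ids <- names(u)[not_found]
print(missing_ids)
```

With the annotation package loaded, the analogous call would presumably be mget(rownames(x), mogene10stprobesetSYMBOL, ifnotfound=NA), and the list of missing IDs could then be checked against other resources such as biomaRt.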
> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-06-30 r52417)
Platform: x86_64-apple-darwin10.3.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] grid splines stats graphics grDevices datasets tools
[8] utils methods base
other attached packages:
[1] biomaRt_2.5.1 mogene10stprobeset.db_5.0.2
[3] mogene10sttranscriptcluster.db_5.0.1 org.Mm.eg.db_2.4.1 etc...
On Sat, Jul 3, 2010 at 4:02 PM, Maxim <deeepersound at googlemail.com> wrote:
> Hi,
>
> I try to analyze MoGene-1_0-st gene arrays. I used the aroma package
> to do this and came up with an expression matrix, but have no clue,
> how to assign real gene names to the respective "IDs" (column "item
> numbers" after aroma normalization and summarization).
>
> As a workaround I simply tried to load the mogene10stprobeset.db library and did
>
> u<-mget(row.names(x),mogene10stprobesetSYMBOL)
>
> with x being the expression matrix and rownames(x) are the IDs. But
> the majority of IDs are unknown:
>
> Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
> value for "10471503" not found
>
> But why? This ID is clearly correct:
> 10471503 chr2:32530629-32530765 chr2 NC_000068.6 + 32530629 32530765 25 --- ENSMUST00000082819
> // ENSEMBL // ncrna:snoRNA chromosome:NCBIM37:2:32530629:32530765:1
> gene:ENSMUSG00000064753 // chr2 // 100 // 100 // 25 // 25 // 0 ///
> ENSMUST00000083292 // ENSEMBL // ncrna:snoRNA
> chromosome:NCBIM37:9:15119289:15119425:1 gene:ENSMUSG00000065226 //
> chr2 // 72 // 100 // 18 // 25 // 0 main
>
> What is my problem, obviously I miss something?
>
> Maxim