[BioC] biomaRt manual
Weiwei Shi
helprhelp at gmail.com
Thu Mar 29 13:45:25 CEST 2007
Here is another question:
> length(unique(ids2))
[1] 12558
> length(ids2)
[1] 12558
> head(ids2)
[1] "31307_at" "31308_at" "31309_r_at" "31310_at" "31311_at"
[6] "31312_at"
> t1 <- getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a", values=(ids2), mart=human)
> dim(t1)
[1] 26360 2
> t1[1:20,]
affy_hg_u95a entrezgene
1 32864_at 6736
2 32864_at 6736
3 41214_at 6192
4 41214_at 6192
5 31534_at 7544
6 31534_at 7544
7 36367_at 83259
8 36367_at 83259
9 36367_at 83259
10 36367_at 83259
11 1199_at NA
12 35929_s_at 64591
13 35929_s_at 64591
14 35929_s_at NA
Please look at rows 12-14.
Why are there so many duplicated rows? And why are rows 12-14
inconsistent with each other (two map 35929_s_at to 64591, one to NA)?
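One way to tidy such a result before using it is sketched below in base R. It assumes the goal is one row per distinct probe set/Entrez Gene pair: `unique()` drops exact duplicate rows, and `is.na()` then drops rows with no Entrez Gene ID. The data frame is a hand-built copy of rows 12-14 of the output above. The sketch does not explain why Ensembl returns the duplicates in the first place.

```r
# Hand-built copy of rows 12-14 of the getBM() output shown above
t1 <- data.frame(
  affy_hg_u95a = c("35929_s_at", "35929_s_at", "35929_s_at"),
  entrezgene   = c(64591L, 64591L, NA),
  stringsAsFactors = FALSE
)

# Keep only distinct probe set / Entrez Gene pairs
t1u <- unique(t1)

# Drop pairs that have no Entrez Gene ID, then renumber the rows
t1u <- t1u[!is.na(t1u$entrezgene), ]
rownames(t1u) <- NULL

print(t1u)
```

Probe sets whose only match is NA (such as `1199_at` above) disappear entirely after the second step. `setdiff(ids2, t1u$affy_hg_u95a)` would list them if they are needed.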
Thanks to all the "hardworking" people for the prompt replies to my
previous question. I am in China now, so it should be about 6 am in the US.
Cheers,
Weiwei
On 3/29/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Thursday 29 March 2007 07:28, James W. MacDonald wrote:
> > Hi Weiwei,
> >
> > Weiwei Shi wrote:
> > > Sorry :) while I was composing the following email, I did not realize
> > > there were already a couple of replies. I read the manual carefully, but I
> > > still have some questions, like this one:
> > >
> > > For example,
> > >
> > >>getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a",
> > >> values=head(ids2), mart=human)
> > >
> > > affy_hg_u95a entrezgene
> > > 1 31308_at NA
> > > 2 31310_at 2741
> > > 3 31312_at 9312
> > >
> > >>head(ids2)
> > >
> > > [1] "31307_at" "31308_at" "31309_r_at" "31310_at" "31311_at"
> > > [6] "31312_at"
> > >
> > >>getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a",
> > >> values="31307_at", mart=human)
> > >
> > > NULL
> > >
> > > I am confused by "NULL" and "NA". What is the difference between
> > > them?
> >
> > Steffen Durinck will know better, but I believe NULL means that Ensembl
> > doesn't think that probeset maps to anything (i.e., there is nothing
> > available), and NA means that there is no Entrez Gene ID for that probeset.
> >
> > For instance, if you pull the Entrez Gene ID for 31307_at from the
> > hgu95aENTREZID environment, it lists 9594, but if you search Entrez Gene
> > for that ID it says it has been discontinued.
> >
> > > Another question: how can I make >8000 queries faster? I have already
> > > read some of the previous posts on this.
>
> Make sure that you really need to make 8000 queries. It is much faster to
> make one or a few large queries than to make many small ones.
>
> Sean
>
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
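Sean's advice to send one or a few large queries instead of thousands of small ones can be sketched in R as below. The helper `query_in_chunks()` and the default chunk size of 5000 are hypothetical choices for illustration. `fetch` stands for any function that sends a single query for a vector of IDs, for example a wrapper around one `getBM()` call. A chunk for which nothing maps may come back as NULL (as with `31307_at` above), and `rbind()` simply skips such elements.

```r
# Hypothetical helper: send one query per chunk of IDs rather than one per ID.
# `fetch` takes a character vector of IDs and returns a data frame, or NULL
# when nothing maps (e.g. a wrapper around a single getBM() call).
query_in_chunks <- function(ids, fetch, chunk_size = 5000) {
  # Consecutive groups of at most `chunk_size` IDs
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  results <- lapply(chunks, fetch)
  # rbind() ignores NULL elements, so unmapped chunks drop out
  out <- do.call(rbind, results)
  if (!is.null(out)) rownames(out) <- NULL
  out
}

# Example wrapper for biomaRt (assumes `human` is an existing Mart object):
# fetch_entrez <- function(x) getBM(attributes = c("affy_hg_u95a", "entrezgene"),
#                                   filters = "affy_hg_u95a", values = x, mart = human)
# t1 <- query_in_chunks(ids2, fetch_entrez)
```

With 12558 IDs and the default chunk size this sends three queries instead of 12558.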
More information about the Bioconductor mailing list