[BioC] biomaRt manual
James W. MacDonald
jmacdon at med.umich.edu
Thu Mar 29 14:02:00 CEST 2007
Hi Weiwei,
Weiwei Shi wrote:
> Here is another question:
>
>> length(unique(ids2))
>
> [1] 12558
>
>> length(ids2)
>
> [1] 12558
>
>> head(ids2)
>
> [1] "31307_at" "31308_at" "31309_r_at" "31310_at" "31311_at"
> [6] "31312_at"
>
>> t1 <- getBM(attributes=c("affy_hg_u95a", "entrezgene"),
>> filters="affy_hg_u95a", values=(ids2), mart=human)
>> dim(t1)
>
> [1] 26360 2
>
>> t1[1:20,]
>
> affy_hg_u95a entrezgene
> 1 32864_at 6736
> 2 32864_at 6736
> 3 41214_at 6192
> 4 41214_at 6192
> 5 31534_at 7544
> 6 31534_at 7544
> 7 36367_at 83259
> 8 36367_at 83259
> 9 36367_at 83259
> 10 36367_at 83259
> 11 1199_at NA
> 12 35929_s_at 64591
> 13 35929_s_at 64591
> 14 35929_s_at NA
>
> Please look at rows 12-14.
> Why are there so many duplicates? Why is there an inconsistency
> between rows 12-14?
Again, Steffen Durinck would know better why there are duplicates. I
think he told me once but my memory doesn't work like it used to ;-D
Anyway, if you use output = "list", you will get a list with unique ids:
getBM(attributes=c("affy_hg_u95a", "entrezgene"),
      filters="affy_hg_u95a", values=(ids), mart=mart,
      output="list")
$affy_hg_u95a
$affy_hg_u95a$`31307_at`
[1] NA
$affy_hg_u95a$`31308_at`
[1] "31308_at"
$affy_hg_u95a$`31309_r_at`
[1] NA
$affy_hg_u95a$`31310_at`
[1] "31310_at"
$affy_hg_u95a$`31311_at`
[1] NA
$affy_hg_u95a$`31312_at`
[1] "31312_at"
$entrezgene
$entrezgene$`31307_at`
[1] NA
$entrezgene$`31308_at`
[1] NA
$entrezgene$`31309_r_at`
[1] NA
$entrezgene$`31310_at`
[1] 2741
$entrezgene$`31311_at`
[1] NA
$entrezgene$`31312_at`
[1] 9312
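If you would rather stay with the data.frame, something along these lines in base R should also get you one entry per probe set. This is only a sketch: the rows are typed in by hand from the t1 output you posted (no query is made), the vector ids is a made-up example, and whether an NA row should be dropped when the same probe set also has a real ID is my assumption about what you want, not something biomaRt decides for you.

```r
# A few rows copied by hand from the t1 output quoted above.
t1 <- data.frame(
  affy_hg_u95a = c("32864_at", "32864_at", "1199_at",
                   "35929_s_at", "35929_s_at", "35929_s_at"),
  entrezgene   = c(6736, 6736, NA, 64591, 64591, NA),
  stringsAsFactors = FALSE
)

# 1. Drop rows that are exact repeats of an earlier row.
t1u <- unique(t1)

# 2. Drop an NA row when the same probe set also has a real Entrez Gene ID
#    (assumption: the NA carries no extra information in that case).
has_id <- t1u$affy_hg_u95a[!is.na(t1u$entrezgene)]
t1c <- t1u[!(is.na(t1u$entrezgene) & t1u$affy_hg_u95a %in% has_id), ]

# 3. One list element per queried probe set, in query order.
#    "ids" is a hypothetical query vector; "31307_at" is the probe set
#    that came back empty in the earlier message, so it is absent from t1.
ids <- c("32864_at", "1199_at", "35929_s_at", "31307_at")
by_probe <- split(t1c$entrezgene, t1c$affy_hg_u95a)[ids]
names(by_probe) <- ids
# Probe sets with no row at all come out of the lookup as NULL; mark them NA.
by_probe[sapply(by_probe, is.null)] <- NA

print(by_probe)
```

Step 3 is what lets you line the result back up against the ids you sent, since probe sets that return nothing simply do not appear in the data.frame.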
Best,
Jim
>
> Thanks for the prompt replies from all you "hardworking"
> people. I am now in China and it should be about 6am in the US.
>
> Cheers,
>
> Weiwei
>
>
>
> On 3/29/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>> On Thursday 29 March 2007 07:28, James W. MacDonald wrote:
>> > Hi Weiwei,
>> >
>> > Weiwei Shi wrote:
>> > > Sorry :) when I am composing the following email, I did not realize
>> > > there are a couple of replies now. I read the manual carefully but I
>> > > am still having some questions like this:
>> > >
>> > > For example,
>> > >
>> > >>getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a",
>> > >> values=head(ids2), mart=human)
>> > >
>> > > affy_hg_u95a entrezgene
>> > > 1 31308_at NA
>> > > 2 31310_at 2741
>> > > 3 31312_at 9312
>> > >
>> > >>head(ids2)
>> > >
>> > > [1] "31307_at" "31308_at" "31309_r_at" "31310_at" "31311_at"
>> > > [6] "31312_at"
>> > >
>> > >>getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a",
>> > >> values="31307_at", mart=human)
>> > >
>> > > NULL
>> > >
>> > > I am confused by "NULL" and "NA". I am wondering about the difference
>> > > between them.
>> >
>> > Steffen Durinck will know better, but I believe NULL means that Ensembl
>> > doesn't think that probeset maps to anything (i.e., there is nothing
>> > available), and NA means that there is no Entrez Gene ID for that
>> > probeset.
>> >
>> > For instance, if you pull the Entrez Gene ID for 31307_at from the
>> > hgu95aENTREZID environment, it lists 9594, but if you search Entrez Gene
>> > for that ID it says it has been discontinued.
>> >
>> > > Another question is how to make >8000 queries faster though I read
>> > > some from previous posts.
>>
>> Make sure that you really need to make 8000 queries. It is much faster to
>> make one or a few large queries than to make many small ones.
>>
>> Sean
>>
>
>
--
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
1500 E Medical Center Drive
Ann Arbor MI 48109
734-647-5623
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.