[BioC] biomaRt manual

James W. MacDonald jmacdon at med.umich.edu
Thu Mar 29 14:02:00 CEST 2007


Hi Weiwei,

Weiwei Shi wrote:
> Here is another question:
> 
>> length(unique(ids2))
> 
> [1] 12558
> 
>> length(ids2)
> 
> [1] 12558
> 
>> head(ids2)
> 
> [1] "31307_at"   "31308_at"   "31309_r_at" "31310_at"   "31311_at"
> [6] "31312_at"
> 
>> t1 <- getBM(attributes=c("affy_hg_u95a", "entrezgene"), 
>> filters="affy_hg_u95a", values=(ids2), mart=human)
>> dim(t1)
> 
> [1] 26360     2
> 
>> t1[1:20,]
> 
>   affy_hg_u95a entrezgene
> 1      32864_at       6736
> 2      32864_at       6736
> 3      41214_at       6192
> 4      41214_at       6192
> 5      31534_at       7544
> 6      31534_at       7544
> 7      36367_at      83259
> 8      36367_at      83259
> 9      36367_at      83259
> 10     36367_at      83259
> 11      1199_at         NA
> 12   35929_s_at      64591
> 13   35929_s_at      64591
> 14   35929_s_at         NA
> 
> Please look at lines 12-14.
> Why are there so many duplicates? Why is there an inconsistency
> between lines 12-14?

Again, Steffen Durinck would know better why there are duplicates. I 
think he told me once but my memory doesn't work like it used to ;-D
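
Whatever the cause, the repeated rows can be collapsed after the fact. Below is a minimal sketch in base R, assuming a data frame shaped like the one quoted above. The rows are copied by hand from that output rather than fetched from Ensembl, and the names t1_example, t1_unique and by_probe are made up for illustration.

```r
# A few rows copied from the quoted getBM() output (not a live query)
t1_example <- data.frame(
  affy_hg_u95a = c("32864_at", "32864_at", "36367_at", "36367_at",
                   "1199_at", "35929_s_at", "35929_s_at", "35929_s_at"),
  entrezgene   = c(6736, 6736, 83259, 83259, NA, 64591, 64591, NA),
  stringsAsFactors = FALSE
)

# Drop rows that are exact duplicates of an earlier row
t1_unique <- unique(t1_example)

# One element per probeset, holding its distinct non-NA Entrez Gene IDs
by_probe <- lapply(split(t1_unique$entrezgene, t1_unique$affy_hg_u95a),
                   function(x) unique(x[!is.na(x)]))
```

Note that unique() only removes exact duplicates, so a probeset such as 35929_s_at that comes back once with an ID and once with NA still occupies two rows in t1_unique. The split() step is one way to reduce that to a single entry per probeset.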

Anyway, if you use output = "list", you will get a list with unique ids:

  getBM(attributes=c("affy_hg_u95a", "entrezgene"),
        filters="affy_hg_u95a", values=(ids), mart=mart,
        output="list")
$affy_hg_u95a
$affy_hg_u95a$`31307_at`
[1] NA

$affy_hg_u95a$`31308_at`
[1] "31308_at"

$affy_hg_u95a$`31309_r_at`
[1] NA

$affy_hg_u95a$`31310_at`
[1] "31310_at"

$affy_hg_u95a$`31311_at`
[1] NA

$affy_hg_u95a$`31312_at`
[1] "31312_at"


$entrezgene
$entrezgene$`31307_at`
[1] NA

$entrezgene$`31308_at`
[1] NA

$entrezgene$`31309_r_at`
[1] NA

$entrezgene$`31310_at`
[1] 2741

$entrezgene$`31311_at`
[1] NA

$entrezgene$`31312_at`
[1] 9312
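
The list form also makes it easy to separate probesets that came back with nothing at all from probesets that came back without an Entrez Gene ID. The sketch below rebuilds the printed list by hand (it is not a live query) and assumes the structure is exactly as shown above. The names res, found and mapped are hypothetical.

```r
# The list as printed above, typed in by hand
res <- list(
  affy_hg_u95a = list(`31307_at` = NA, `31308_at` = "31308_at",
                      `31309_r_at` = NA, `31310_at` = "31310_at",
                      `31311_at` = NA, `31312_at` = "31312_at"),
  entrezgene   = list(`31307_at` = NA, `31308_at` = NA,
                      `31309_r_at` = NA, `31310_at` = 2741,
                      `31311_at` = NA, `31312_at` = 9312)
)

# Probesets for which anything was returned
probes <- unlist(res$affy_hg_u95a)
found  <- names(probes)[!is.na(probes)]

# Probesets that also have an Entrez Gene ID
entrez <- unlist(res$entrezgene)
mapped <- entrez[!is.na(entrez)]

# Returned, but without an Entrez Gene ID
setdiff(found, names(mapped))
```

With these six ids, the last line picks out 31308_at, which is the one row that shows NA in the data frame output for the same ids.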

Best,

Jim


> 
> Thanks for the prompt replies from all the "hardworking" people.
> I am in China now, and it should be about 6am in the US.
> 
> Cheers,
> 
> Weiwei
> 
> 
> 
> On 3/29/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> 
>> On Thursday 29 March 2007 07:28, James W. MacDonald wrote:
>> > Hi Weiwei,
>> >
>> > Weiwei Shi wrote:
>> > > Sorry :) when I was composing the following email, I did not realize
>> > > there were already a couple of replies. I read the manual carefully
>> > > but I still have some questions like this:
>> > >
>> > > For example,
>> > >
>> > >>getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a",
>> > >> values=head(ids2), mart=human)
>> > >
>> > >   affy_hg_u95a entrezgene
>> > > 1     31308_at         NA
>> > > 2     31310_at       2741
>> > > 3     31312_at       9312
>> > >
>> > >>head(ids2)
>> > >
>> > > [1] "31307_at"   "31308_at"   "31309_r_at" "31310_at"   "31311_at"
>> > > [6] "31312_at"
>> > >
>> > >>getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a",
>> > >> values="31307_at", mart=human)
>> > >
>> > > NULL
>> > >
>> > > I am confused by "NULL" and "NA". I am wondering about the difference
>> > > between them.
>> >
>> > Steffen Durinck will know better, but I believe NULL means that Ensembl
>> > doesn't think that probeset maps to anything (e.g., there is nothing
>> > available), and NA means that there is no Entrez Gene ID for that probeset.
>> >
>> > For instance, if you pull the Entrez Gene ID for 31307_at from the
>> > hgu95aENTREZID environment, it lists 9594, but if you search Entrez Gene
>> > for that ID it says it has been discontinued.
>> >
>> > > Another question is how to make >8000 queries faster though I read
>> > > some from previous posts.
>>
>> Make sure that you really need to make 8000 queries.  It is much faster to
>> make one or a few large queries than to make many small ones.
>>
>> Sean
>>
> 
> 


-- 
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
1500 E Medical Center Drive
Ann Arbor MI 48109
734-647-5623





