[BioC] AnnBuilder results question.

Robert Gentleman rgentlem at fhcrc.org
Fri Oct 21 07:22:44 CEST 2005


Hi,

Johan Lindberg wrote:
> Hi all. I have a question about something that puzzles me. 
>  
> I have a set of genes in an index vector that I am interested in.
> ######################################
> 
>>de.idx.Avs
> 
>   RG89-F4  RG249-E4  RG317-E7   RG18-B1  RG130-F1   RG88-E1  RG301-B5 
>       856      2528      2666      3413      3638      4226      6687 
>  RG279-A2  RG121-A5  RG145-A8  RG205-A2   RG18-F2  RG90-B11  RG170-F8 
>      7313      7679      7729      7845      8822      8971      9130 
> RG248-E11   RG18-A2   RG42-A2   RG94-A5  RG306-A5 RG299-F12   RG89-F3 
>      9960     10173     10221     10327     10751     11416     11670 
>  RG289-B9   RG7-A12   RG31-A3 RG243-E12  RG265-E3  RG305-A6   RG64-B9 
>     12073     12183     12225     12656     13374     13455     13645 
>  RG200-F3   RG95-C1   RG95-G7 RG211-C10  RG283-G4  RG88-D10  RG122-D1 
>     13914     17761     17766     17999     18140     19103     19845 
> RG202-H10  RG206-D7 RG124-C10 RG252-C10   RG18-C1   RG22-G1   RG22-G4 
>     20012     20017     20527     20783     20989     20998     21000 
> RG238-G10  RG147-H5  RG283-H2  RG185-H2   RG95-C2 RG215-C11  RG231-G2 
>     21436     21924     22194     22678     23169     23415     23442 
>   RG97-C2 RG145-C11   RG16-D8  RG184-D2 RG202-H11  RG120-C8  RG220-C8 
>     23853     23955     24365     24697     25420     25925     26125 
>  RG261-D3  RG305-H3   RG7-C12   RG95-G3   RG95-G6  RG151-C3  RG231-G3 
>     28237     28326     28407     28578     28580     28689     28850 
>  RG293-C9   RG18-D3  RG230-H9   RG18-C3  RG170-G6 
>     29657     30453     30882     31805     32112
> ######################################
>  
> The names in the vector are the unique identifiers on the chip and the
> number is the location on the chip.
>  
> If I use my home-brewed package for this chip to retrieve gene names and
> accession numbers, I use:
>  
  Was this built with AnnBuilder?

> ######################################
> Vec.Acc <- unlist(mget(names(de.idx.Avs),Hum30kbatch1to5ACCNUM))
> Vec.GeneN <- unlist(mget(names(de.idx.Avs),Hum30kbatch1to5GENENAME))
> ######################################
>  
> But if I look at the length of those vectors:
> ######################################
> 
>>length(Vec.Acc)
> 
> [1] 68
> 
>>length(Vec.GeneN)
> 
> [1] 69
> ######################################
> They are not of the same length. I think this is due to some error in the
> annotation, because if I look at just the 4th item of the names(de.idx.Avs)
> vector, it is:
> ######################################
> 
>>names(de.idx.Avs)[4]
> 
> [1] "RG18-B1"
> 
>>names(unlist(mget(names(de.idx.Avs[4]),Hum30kbatch1to5ACCNUM)))
> 
> [1] "RG18-B1"
> 
>>names(unlist(mget(names(de.idx.Avs[4]),Hum30kbatch1to5GENENAME)))
> 
> [1] "RG18-B11" "RG18-B12"
> ######################################
>  
> Then two items are returned from the Hum30kbatch1to5GENENAME environment but
> only one from the Hum30kbatch1to5ACCNUM environment. I would guess it has
> something to do with the mget function's ability to discriminate between
> "RG18-B1" and "RG18-B12" or "RG18-B11", but since it works for
> Hum30kbatch1to5ACCNUM I do not know.

   I am not sure what the issue is. Some mappings are one to many, and it
appears that this one (GENENAME) is such a case (there are lots of
others), so you must deal with this. Your annotation package says that
the id RG18-B1 (I think that is the fourth entry in your vector) is
mapped to two gene names, but to only one ACCNUM. It has nothing to do
with mget or any other function, as far as I can see.
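
A minimal sketch of how a one-to-many entry makes the two lengths disagree, and of one way to deal with it. The environments toyGENENAME and toyACCNUM are hypothetical stand-ins for the real annotation environments, which are not available here; the two gene-name strings for RG18-B1 are copied from the output quoted in this thread, and everything else (the accession strings, the second gene name) is an invented placeholder.

```r
# Toy stand-ins for the two annotation environments (hypothetical).
toyGENENAME <- new.env(hash = TRUE)
toyACCNUM   <- new.env(hash = TRUE)

# RG18-B1 carries two gene-name strings, as in the quoted output.
assign("RG18-B1",
       c("myeloid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila)",
         " translocated to, 10"),
       envir = toyGENENAME)
assign("RG89-F4", "placeholder gene name", envir = toyGENENAME)
assign("RG18-B1", "ACC0001", envir = toyACCNUM)   # invented accession
assign("RG89-F4", "ACC0002", envir = toyACCNUM)   # invented accession

ids <- c("RG89-F4", "RG18-B1")
gn  <- mget(ids, envir = toyGENENAME)
acc <- mget(ids, envir = toyACCNUM)

# Number of values stored per probe id; anything above 1 is one-to-many.
n.gn  <- sapply(gn, length)
multi <- names(n.gn)[n.gn > 1]

# unlist() flattens every value, so the lengths no longer match.
len.gn  <- length(unlist(gn))    # 3 values for 2 ids
len.acc <- length(unlist(acc))   # 2 values for 2 ids

# One way to deal with it: collapse each entry to a single string, which
# gives exactly one element per probe id.
gn.one <- sapply(gn, paste, collapse = ";")
```

Whether collapsing, taking the first value, or keeping the list as it is makes sense depends on what the extra values mean, which only the annotation source can tell you.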

>  
> What also puzzles me is that
>  
> ######################################
> 
>>unlist(mget("RG18-B1",Hum30kbatch1to5GENENAME))
> 
>                                                                     RG18-B11
> 
> "myeloid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila)"
> 
>                                                                     RG18-B12
> 
>                                                       " translocated to, 10"
> 
  So this says that in your GENENAME environment (or hash table) the
symbol RG18-B1 is mapped to two different names. Do you know if the
mappings (RG18-B11 and RG18-B12) are right? If so, you need to decide
which of the two sets of names is right: these ones, or the ones below
(or is it the name that is right, and the symbol wrong?).
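
One possible reading of the labels RG18-B11 and RG18-B12 in the output above, which is not confirmed anywhere in this thread, is that they are not keys in the environment at all. When a list element holds more than one value, unlist() builds names for the flattened values by appending 1, 2, ... to the element's name. The sketch below uses a made-up list x with placeholder strings to show that behaviour; if this is what happened, the generated labels merely coincide with two other, real probe ids.

```r
# A list shaped like the result of mget(): one element per probe id.
# The values are placeholders, not real annotation.
x <- list("RG18-B1" = c("first name", "second name"),
          "RG89-F4" = "only name")

flat <- unlist(x)

# The two values under "RG18-B1" are labelled "RG18-B11" and "RG18-B12";
# the single value under "RG89-F4" keeps its name unchanged.
flat.names <- names(flat)

# The original list still has one element per id, under the original names.
list.names <- names(x)
```

Inspecting the list returned by mget() directly, before calling unlist(), avoids the ambiguity.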

> ######################################
> 
>>unlist(mget("RG18-B11",Hum30kbatch1to5GENENAME))
> 
>                                   RG18-B11 
> "multiple coagulation factor deficiency 2" 

This is really an odd way to go about it;
  use
   Hum30kbatch1to5GENENAME$"RG18-B11"
  instead. It is slightly easier to follow, and mget is "multi-get",
intended for use when you want more than one thing. The same goes for
the example below.
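
For comparison, a small sketch of the single-key and multi-key lookups side by side. The environment env is a hypothetical stand-in for the annotation environment; the two gene names are taken from the output quoted in this thread.

```r
# Hypothetical stand-in for an annotation environment.
env <- new.env(hash = TRUE)
assign("RG18-B11", "multiple coagulation factor deficiency 2", envir = env)
assign("RG18-B12", "transportin 1", envir = env)

# Single key: `$` or get() returns the stored value itself.
one.a <- env$"RG18-B11"
one.b <- get("RG18-B11", envir = env)

# Several keys: mget() returns a named list, one element per key.
many <- mget(c("RG18-B11", "RG18-B12"), envir = env)
```

Here one.a and one.b are plain character vectors, while many is a list whose names are the keys that were asked for.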

> ######################################
> 
>>unlist(mget("RG18-B12",Hum30kbatch1to5GENENAME))
> 
>        RG18-B12 
> "transportin 1"
> ######################################
>  
> Different GENENAMEs are returned for "RG18-B11" depending on whether I use
> "RG18-B1" or "RG18-B11".

   Do you know which one is correct? It does seem that something is
confused (but since you built them yourself and we don't have them, it
will be kind of hard to debug). There are lots of possible places where
problems could have arisen, and it would be nice to know where.
Unfortunately, much of the work needed ends up on you.
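
As a starting point for that kind of debugging, here is a sketch that audits a whole environment rather than one probe at a time. toyGN and toyACC are hypothetical stand-ins for the GENENAME and ACCNUM environments, filled with placeholder values; with the real package loaded, the same calls would be pointed at the real environments instead.

```r
# Hypothetical stand-ins for the two annotation environments.
toyGN  <- new.env(hash = TRUE)
toyACC <- new.env(hash = TRUE)
assign("RG18-B1",  c("name part one", "name part two"), envir = toyGN)
assign("RG18-B11", "another gene name", envir = toyGN)
assign("RG18-B12", "yet another gene name", envir = toyGN)
assign("RG18-B1",  "ACC0001", envir = toyACC)   # invented accession
assign("RG18-B11", "ACC0002", envir = toyACC)   # invented accession

# How many values does each key hold?
all.gn <- mget(ls(toyGN), envir = toyGN)
counts <- sapply(all.gn, length)

# Keys with more than one value are the ones worth checking by hand.
suspects <- names(counts)[counts > 1]

# Keys present in one environment but missing from the other.
only.in.gn <- setdiff(ls(toyGN), ls(toyACC))
```

A table of counts, e.g. table(counts), gives a quick overview of how common multi-valued entries are across the chip.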

Best wishes,

   Robert

>  
> Any advice, anyone?
>  
> Best regards
>  
> // Johan L
>  
>  
>  
>  
> 
>>sessionInfo()
> 
> R version 2.1.1, 2005-06-20, i386-pc-mingw32 
>  
> attached base packages:
> [1] "splines"   "tools"     "methods"   "stats"     "graphics" 
> [6] "grDevices" "utils"     "datasets"  "base"     
>  
> other attached packages:
>          marray        hgu95av2         GOstats        multtest 
>         "1.6.3"         "1.8.4"         "1.1.1"         "1.7.3" 
>      genefilter        survival          xtable            RBGL 
>         "1.6.3"          "2.18"         "1.2-5"        "1.3.13" 
>           graph           Ruuid         cluster Hum30kbatch1to5 
>         "1.5.9"         "1.5.3"        "1.10.0"         "1.1.0" 
>     hgu133plus2         annaffy            KEGG              GO 
>         "1.7.0"        "1.0.18"         "1.8.1"         "1.8.2" 
>           gcrma     matchprobes            affy         maanova 
>         "1.1.4"        "1.0.22"         "1.6.7"        "0.98-3" 
>             kth           aroma            R.io      R.graphics 
>         "0.4.5"          "0.85"          "0.62"          "0.62" 
>        R.colors         R.basic         R.utils            R.oo 
>           "0.4"          "0.62"          "0.62"          "0.62" 
>           limma      reposTools        annotate         Biobase 
>         "2.0.3"        "1.5.19"        "1.5.16"        "1.5.12"
> 
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


