[BioC] AnnBuilder results question.
Robert Gentleman
rgentlem at fhcrc.org
Fri Oct 21 07:22:44 CEST 2005
Hi,
Johan Lindberg wrote:
> Hi all. I have a question about something that puzzles me.
>
> I have a set of genes in an index vector that I am interested in.
> ######################################
>
>>de.idx.Avs
>
> RG89-F4 RG249-E4 RG317-E7 RG18-B1 RG130-F1 RG88-E1 RG301-B5
> 856 2528 2666 3413 3638 4226 6687
> RG279-A2 RG121-A5 RG145-A8 RG205-A2 RG18-F2 RG90-B11 RG170-F8
> 7313 7679 7729 7845 8822 8971 9130
> RG248-E11 RG18-A2 RG42-A2 RG94-A5 RG306-A5 RG299-F12 RG89-F3
> 9960 10173 10221 10327 10751 11416 11670
> RG289-B9 RG7-A12 RG31-A3 RG243-E12 RG265-E3 RG305-A6 RG64-B9
> 12073 12183 12225 12656 13374 13455 13645
> RG200-F3 RG95-C1 RG95-G7 RG211-C10 RG283-G4 RG88-D10 RG122-D1
> 13914 17761 17766 17999 18140 19103 19845
> RG202-H10 RG206-D7 RG124-C10 RG252-C10 RG18-C1 RG22-G1 RG22-G4
> 20012 20017 20527 20783 20989 20998 21000
> RG238-G10 RG147-H5 RG283-H2 RG185-H2 RG95-C2 RG215-C11 RG231-G2
> 21436 21924 22194 22678 23169 23415 23442
> RG97-C2 RG145-C11 RG16-D8 RG184-D2 RG202-H11 RG120-C8 RG220-C8
> 23853 23955 24365 24697 25420 25925 26125
> RG261-D3 RG305-H3 RG7-C12 RG95-G3 RG95-G6 RG151-C3 RG231-G3
> 28237 28326 28407 28578 28580 28689 28850
> RG293-C9 RG18-D3 RG230-H9 RG18-C3 RG170-G6
> 29657 30453 30882 31805 32112
> ######################################
>
> The names in the vector are the unique identifiers on the chip and the
> number is the location on the chip.
>
> To retrieve gene names and accession numbers with my home-built package for
> this chip, I use:
>
Was this built with AnnBuilder?
> ######################################
> Vec.Acc <- unlist(mget(names(de.idx.Avs),Hum30kbatch1to5ACCNUM))
> Vec.GeneN <- unlist(mget(names(de.idx.Avs),Hum30kbatch1to5GENENAME))
> ######################################
>
> But if I look at the length of those vectors:
> ######################################
>
>>length(Vec.Acc)
>
> [1] 68
>
>>length(Vec.GeneN)
>
> [1] 69
> ######################################
> They are not the same length. I think this is due to some error in the
> annotation, because the 4th item in the names(de.idx.Avs) vector is:
> ######################################
>
>>names(de.idx.Avs)[4]
>
> [1] "RG18-B1"
>
>>names(unlist(mget(names(de.idx.Avs[4]),Hum30kbatch1to5ACCNUM)))
>
> [1] "RG18-B1"
>
>>names(unlist(mget(names(de.idx.Avs[4]),Hum30kbatch1to5GENENAME)))
>
> [1] "RG18-B11" "RG18-B12"
> ######################################
>
> So two items are returned from the Hum30kbatch1to5GENENAME environment but
> only one from the Hum30kbatch1to5ACCNUM environment. My guess was that
> mget() cannot distinguish "RG18-B1" from "RG18-B11" or "RG18-B12", but
> since it works for Hum30kbatch1to5ACCNUM I am not sure.
I am not sure what the issue is. Some mappings are one-to-many, and it
appears that this one (GENENAME) is such a case (there are many others),
so your code must handle it. Your annotation package says that the id
RG18-B1 (I think that is the fourth entry in your vector) is mapped to two
gene names but only one ACCNUM. As far as I can see, it has nothing to do
with mget or any other function.
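The one-to-many case can be reproduced with a couple of toy environments. This is a minimal sketch, assuming the package maps behave like ordinary R environments keyed by probe id (which is how mget() is used on them above); the values are made up, and collapsing multiple names with paste() is only one possible way to get one entry per probe.

```r
# Hypothetical stand-ins for the package's ACCNUM and GENENAME environments;
# probe ids are taken from the post, all values are made up.
accnum <- new.env()
genename <- new.env()
assign("RG89-F4", "ACC-1", envir = accnum)
assign("RG18-B1", "ACC-2", envir = accnum)
assign("RG89-F4", "gene A", envir = genename)
# One-to-many: this probe id carries two gene-name entries
assign("RG18-B1", c("gene B, part 1", "gene B, part 2"), envir = genename)

ids <- c("RG89-F4", "RG18-B1")

# mget() returns a named list with one element per id
acc.list  <- mget(ids, accnum)
name.list <- mget(ids, genename)

# unlist() flattens every element, so multi-valued ids add extra entries
length(unlist(acc.list))   # 2
length(unlist(name.list))  # 3

# Elements longer than 1 identify the one-to-many ids
n.per.id <- sapply(name.list, length)
n.per.id[n.per.id > 1]

# One option: collapse multiple values so there is one entry per id
name.vec <- sapply(name.list, paste, collapse = " | ")
length(name.vec)           # 2, in the same order as ids
```

Keeping the list returned by mget() (rather than unlisting it) is the other natural choice, since it preserves exactly one element per probe id.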
>
> What also puzzles me is that
>
> ######################################
>
>>unlist(mget("RG18-B1",Hum30kbatch1to5GENENAME))
>
> RG18-B11
>
> "myeloid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila)"
>
> RG18-B12
>
> " translocated to, 10"
>
So this says that in your GENENAME environment (or hash table) the
symbol RG18-B1 is mapped to two different names. Do you know whether these
mappings (RG18-B11 and RG18-B12) are right? If so, you need to decide
which of the two sets of names is right: these ones or the ones below (or
whether the name is right and the symbol wrong).
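Part of the puzzle may lie in how the result is printed rather than in the package. When unlist() flattens a list element that holds more than one value, it builds names by appending 1, 2, ... to the element name, so "RG18-B11" and "RG18-B12" in the output above may simply be RG18-B1's two values, not the probe ids RG18-B11 and RG18-B12. If so, that would also explain why looking up "RG18-B11" directly returns a different gene name: that lookup hits a separate probe. A small sketch with placeholder values (not the real annotation):

```r
# A list shaped like mget("RG18-B1", Hum30kbatch1to5GENENAME): one element
# holding two values (placeholder strings, not the real gene names)
res <- list("RG18-B1" = c("first name", "second name"))

# unlist() invents names for multi-valued elements by appending 1, 2, ...
flat <- unlist(res)
names(flat)                      # "RG18-B11" "RG18-B12"

# Ways to avoid the made-up names:
unlist(res, use.names = FALSE)   # values only
res[["RG18-B1"]]                 # keep the list and index it by probe id
```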
> ######################################
>
>>unlist(mget("RG18-B11",Hum30kbatch1to5GENENAME))
>
> RG18-B11
> "multiple coagulation factor deficiency 2"
This is a rather odd way to go about it; use
Hum30kbatch1to5GENENAME$"RG18-B11"
instead. It is slightly easier to follow, and mget is "multi-get",
intended for use when you want more than one thing. The same goes for the
example below.
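To illustrate the difference between single and multiple lookups, here is a sketch on a made-up environment standing in for Hum30kbatch1to5GENENAME. It assumes a current R release, where `$` works on environments; get() is shown as well since it is the general single-key lookup for any environment.

```r
# Hypothetical stand-in for Hum30kbatch1to5GENENAME (values made up)
genename <- new.env()
assign("RG18-B11", "name for RG18-B11", envir = genename)
assign("RG18-B12", "name for RG18-B12", envir = genename)

# One key: get() returns the stored value itself, no list and no unlist()
get("RG18-B11", envir = genename)
# `$` on an environment does the same in current R releases; the key is
# quoted because "-" is not allowed in a bare name
genename$"RG18-B11"

# Several keys: mget() ("multi-get") returns a named list
mget(c("RG18-B11", "RG18-B12"), genename)
```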
> ######################################
>
>>unlist(mget("RG18-B12",Hum30kbatch1to5GENENAME))
>
> RG18-B12
> "transportin 1"
> ######################################
>
> Different GENENAMEs are returned for "RG18-B11" depending on whether I
> look up "RG18-B1" or "RG18-B11".
Do you know which one is correct? It does seem that something is
confused, but since you built these packages yourself and we don't have
them, it will be hard for us to debug. There are many places where
problems could have arisen, and it would be nice to know where.
Unfortunately, much of the work needed falls to you.
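One place to start is to list, for each map in the package, which probe ids carry more than one value, and then check those against the source files the package was built from. The sketch below assumes the maps are ordinary environments; multi.mapped() is a hypothetical helper written for this example, and the toy maps stand in for the real ones.

```r
# For each map, list the probe ids that carry more than one value.
# `maps` is a named list of environments; multi.mapped() is a hypothetical
# helper, not part of AnnBuilder or annotate.
multi.mapped <- function(ids, maps) {
  lapply(maps, function(env) {
    vals <- mget(ids, env, ifnotfound = NA)  # NA for ids missing from a map
    n <- sapply(vals, length)
    vals[n > 1]
  })
}

# Toy maps standing in for the package's ACCNUM and GENENAME environments
accnum <- new.env()
genename <- new.env()
assign("RG89-F4", "ACC-1", envir = accnum)
assign("RG18-B1", "ACC-2", envir = accnum)
assign("RG89-F4", "gene A", envir = genename)
assign("RG18-B1", c("gene B, part 1", "gene B, part 2"), envir = genename)

audit <- multi.mapped(c("RG89-F4", "RG18-B1"),
                      list(ACCNUM = accnum, GENENAME = genename))
names(audit$GENENAME)   # "RG18-B1"
length(audit$ACCNUM)    # 0
```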
Best wishes,
Robert
>
> Any advice, anyone?
>
> Best regards
>
> // Johan L
>
>
>
>
>
>>sessionInfo()
>
> R version 2.1.1, 2005-06-20, i386-pc-mingw32
>
> attached base packages:
> [1] "splines" "tools" "methods" "stats" "graphics"
> [6] "grDevices" "utils" "datasets" "base"
>
> other attached packages:
> marray hgu95av2 GOstats multtest
> "1.6.3" "1.8.4" "1.1.1" "1.7.3"
> genefilter survival xtable RBGL
> "1.6.3" "2.18" "1.2-5" "1.3.13"
> graph Ruuid cluster Hum30kbatch1to5
> "1.5.9" "1.5.3" "1.10.0" "1.1.0"
> hgu133plus2 annaffy KEGG GO
> "1.7.0" "1.0.18" "1.8.1" "1.8.2"
> gcrma matchprobes affy maanova
> "1.1.4" "1.0.22" "1.6.7" "0.98-3"
> kth aroma R.io R.graphics
> "0.4.5" "0.85" "0.62" "0.62"
> R.colors R.basic R.utils R.oo
> "0.4" "0.62" "0.62" "0.62"
> limma reposTools annotate Biobase
> "2.0.3" "1.5.19" "1.5.16" "1.5.12"
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org