[BioC] revmap question
Robert Gentleman
rgentlem at fhcrc.org
Thu Oct 9 18:04:59 CEST 2008
lgautier at altern.org wrote:
>> James W. MacDonald wrote:
>>> Hi Raffaele,
>>>
>>> rcaloger wrote:
>>>> Hi,
>>>> I find the possibility of reversing a mapping with revmap in the
>>>> XXXX.db annotation databases very interesting.
>>>>
>>>> However, I have two problems:
>>>> 1) if I use:
>>>> egs <- c("1", "100", "1000")
>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>
>>>> I do not get only the probesets associated with the three EGs:
>>>> 1 1001 1002 1003 10001
>>>> "229819_at" "1556117_at" "204639_at" "216705_s_at" "203440_at"
>>>> 10002 10003
>>>> "203441_s_at" "237305_at"
>>> Well, not really. This appears to be so because you are unlisting a
>>> named list. Since the names have to be unique,
>> Well, that's where I don't follow the logic behind unlist(), and I've always
>> found this "feature" pretty strange. unlist() doesn't even do a good job of
>> keeping the names unique:
>> > unlist(list(AA=letters[1:3], AA2="bb"))
>> AA1 AA2 AA3 AA2
>> "a" "b" "c" "bb"
>> So mangling the names doesn't solve anything but just adds confusion.
>>
>> IMO it would be better if unlist() kept the original names, even if that
>> means that they are not unique in the returned vector. At least I could do
>> something with it programmatically, and easily. With the mangled names, it's
>> much harder (there are a couple of serious pitfalls).
>>
>
> The problem might originate in what one could perceive as a flaw in lists
> (or any named vectors, for that matter): they allow non-unique names.
>
> Mangled names are surely a headache, as is R's behavior of returning only
> the first element with a given name when it was not known that several
> elements shared that name.
I disagree - I think that requiring unique row names in R is/was a
mistake. Restrictions are often expensive, as they limit what can be
done. Yes, there are issues in dealing with non-unique row names, but
those can be dealt with by careful programming. Such methods would work
in all cases of duplicate row names, whereas with name-mangling schemes one
needs to know which scheme was used to be able to disentangle the names,
and that means every solution is different - not exactly the kind of
situation I would personally engineer in.
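
A minimal sketch of that kind of careful programming, applied to the named-list
case discussed in this thread and using base R only. It assumes a plain list
standing in for the mget() result (values copied from the output quoted in this
thread, with NA for the unmapped key as returned with ifnotfound = NA); the
names probes, found and flat are illustrative only.

```r
# Stand-in for mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA);
# the values are copied from the output quoted in this thread.
probes <- list(
  "1"    = "229819_at",
  "100"  = c("1556117_at", "204639_at", "216705_s_at"),
  "1000" = c("203440_at", "203441_s_at", "237305_at"),
  "6333" = NA_character_
)

# Drop keys that had no mapping (their value is just NA).
found <- probes[!vapply(probes, function(x) all(is.na(x)), logical(1))]

# Flatten without name mangling, then repeat each original key
# once per value so that duplicate names are kept as they are.
flat <- unlist(found, use.names = FALSE)
names(flat) <- rep(names(found), lengths(found))

# Select every element for a key by comparing names; indexing with
# flat["100"] would return only the first match.
flat[names(flat) == "100"]
```

Because the original keys are preserved, the same selection works however many
values each key has, and no mangling scheme has to be reverse-engineered.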
best wishes
Robert
>
>
> L.
>
>> H.
>>
>>
>>> R adds an additional
>>> integer to the end of duplicate names:
>>>
>>> > egs <- c("1", "100", "1000")
>>> > mget(egs, revmap(hgu133plus2ENTREZID))
>>> $`1`
>>> [1] "229819_at"
>>>
>>> $`100`
>>> [1] "1556117_at" "204639_at" "216705_s_at"
>>>
>>> $`1000`
>>> [1] "203440_at" "203441_s_at" "237305_at"
>>>
>>>> Is there any way to avoid this problem?
>>>>
>>>> 2) if the egs vector contains an EG (6333) that is not present in
>>>> the annotation database, I get the following error:
>>>> egs <- c("1", "100", "1000", "6333")
>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>
>>>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>>>> value for "6333" not found
>>>>
>>>> Is there any way to make a query that simply skips the
>>>> unmapped keys?
>>> Yes. The help for mget is a bit confusing on this point, but you need to
>>> use the argument ifnotfound = NA.
>>>
>>> > egs <- c("1", "100", "1000", "6333")
>>> > mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
>>> $`1`
>>> [1] "229819_at"
>>>
>>> $`100`
>>> [1] "1556117_at" "204639_at" "216705_s_at"
>>>
>>> $`1000`
>>> [1] "203440_at" "203441_s_at" "237305_at"
>>>
>>> $`6333`
>>> [1] NA
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>>
>>>> Many thanks
>>>> Raffaele
>>>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org