[Bioc-devel] mapping vector of gene ids to gene symbols

Wed Jun 18 16:50:50 CEST 2014

hi, thanks for the compliments on the package, i'm happy to hear you 
liked it! i must acknowledge that part of the design of the package is 
the result of conversations i had with Martin, Marc and especially 
Valerie during the review process.

i only got to know about VRanges once the package was nearly finished in 
its current form, and i plan to try adapting it to that data 
structure for the reasons you mention. i haven't explored 
'ReportingTools' but i'll take a look at it.

i'll also be very glad if you use it for teaching at Brixen, but
beware that this is the first version to enter a release, so it 
may have bugs that haven't been discovered yet (a list of known 
shortcomings is at the end of the vignette). do not hesitate to report 
any problem or feature request that may help when using it during the 
course and i'll try to fix or add it asap. i'm coming to BioC in Boston, 
where we can discuss further directions for a better integration of 
VariantFiltering with the rest of the BioC infrastructure.

cheers,
robert.

On 06/18/2014 03:43 PM, Michael Lawrence wrote:
> Wow, VariantFiltering is awesome. It's impressive how it integrates all
> of the annotation resources so seamlessly. And the shiny app looks very
> useful. This should be a model package for where we want to bring
> Bioconductor.
>
> You should look into using VRanges instead of GRanges for the return
> value of filteredVariants(). It can record the provenance of the filters
> if you are using the FilterRules framework. And ReportingTools may be
> useful for generating semi-interactive HTML reports.
>
> I'm going to use this in the course next week at Brixen. My tutorial is
> already based on the CEU trio, so the vignette integrates perfectly.
>
> Nice work,
> Michael
>
>
> On Wed, Jun 18, 2014 at 6:20 AM, Robert Castelo
> <robert.castelo at upf.edu> wrote:
>
>     hi Michael,
>
>     this sounds as if you had a list of variants where you have
>     annotated their Entrez Gene IDs, which sometimes are NA because
>     those variants do not overlap any gene and sometimes are repeated
>     Entrez Gene IDs when two or more of those variants overlap the same
>     gene :)
>
>     at least that is the situation i had when programming the
>     VariantFiltering package. i also could not find a one-liner solution
>     but you might want to look at what i ended up doing there, in case
>     it is also useful for you.
>
>     you'll find it in the method "annotateVariants" that dispatches on
>     "OrgDb" objects (i.e., gene-centric annotation packages), within
>     VariantFiltering/R/__annotationEngine.R
>
>     if you take a look at it, do not hesitate to comment if you have any
>     suggestion to improve it. i also look forward to the
>     annotation gurus' feedback on this question :)
>
>     cheers,
>
>     robert.
>
>
>     On 06/18/2014 03:03 PM, Michael Lawrence wrote:
>
>         Let's say I have a vector of gene IDs where some are NA, and
>         some are repeated, and I want to map them to gene symbols,
>         where I get NAs for the NA IDs or IDs without a symbol. What
>         is the best way to do this?
>
>         I tried select() but it gave me a table with unique entries;
>         not very convenient. It also does not handle NAs. And totally
>         breaks with duplicates using the GENEID key type (kind of
>         works with ENTREZID):
>
>         select(Homo.sapiens, GENEID, "SYMBOL", "GENEID")
>         Error in `[[<-`(`*tmp*`, name, value = list(GENEID = c("245938", "245939", :
>             269 elements in value to replace 1312 elements
>
>         Also tried the venerable mget(GENEID, org.Hs.egSYMBOL,
>         ifnotfound=NA), but this returns a list and fails with NAs.
>
>         What would be nice is something like:
>
>         map(Homo.sapiens, GENEID, "SYMBOL", "GENEID", OneToOneOrNone)
>
>         where OneToOneOrNone is an assertion that I expect the
>         mappings to be one-to-one, so it will unlist() or whatever and
>         throw an error if the assertion fails. It should return NA for
>         anything not found, and for any NA GENEID. Does something like
>         this already exist?
>
>         Michael
>
>
>
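
For the mapping question quoted above (a vector of gene IDs with NAs and repeats, returning NA for anything not found), here is a minimal base-R sketch. It is not taken from VariantFiltering or AnnotationDbi: the helper name mapOneToOneOrNone and the toy lookup table are assumptions made for illustration. In practice the table could come from a call such as select(Homo.sapiens, unique(na.omit(ids)), "SYMBOL", "GENEID").

```r
# Hypothetical helper, not part of any Bioconductor package.
# 'tab' is a two-column lookup table (key column 'from', value column 'to'),
# e.g. the data.frame returned by select() on the unique, non-NA ids.
mapOneToOneOrNone <- function(ids, tab, from = "GENEID", to = "SYMBOL") {
  # drop rows with a missing key and collapse exact duplicate rows
  tab <- unique(tab[!is.na(tab[[from]]), c(from, to)])
  # the "one-to-one or none" assertion: no key may map to two values
  if (anyDuplicated(tab[[from]]))
    stop("mapping is not one-to-one for some keys")
  # match() preserves the length and order of 'ids';
  # NA ids and ids absent from the table both yield NA
  tab[[to]][match(ids, tab[[from]])]
}

# toy data: ids with an NA, a repeat and an unknown id
ids <- c("1", NA, "10", "1", "999999999")
tab <- data.frame(GENEID = c("1", "10"), SYMBOL = c("A1BG", "NAT2"),
                  stringsAsFactors = FALSE)
symbols <- mapOneToOneOrNone(ids, tab)
```

The result has the same length and order as ids, so it can be attached directly as a column to the object the ids came from.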

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550