[R-pkg-devel] help/advice on debugging

Ivan Krylov
Sun Jul 10 17:54:16 CEST 2022


On Sat, 9 Jul 2022 16:29:57 -0400
Ben Bolker <bbolker using gmail.com> wrote:

>    The problem is in vignette rebuilding, errors of this form in both
> of the package vignettes:
> 
>    Can't join on `x$entrezgene_id` x `y$entrezgene_id` because of
>    incompatible types.
>    ℹ `x$entrezgene_id` is of type <double>.
>    ℹ `y$entrezgene_id` is of type <character>.

I'd hazard a guess that both vignettes crash in a call to
entrez_to_symbol (either directly or via rename_genes). Specifically,
its first argument (`x`) is converted to numeric, and then the
following happens:

    df <- data.frame(entrezgene_id = x)
    df <- dplyr::left_join(df, gene_info, by = "entrezgene_id")

gene_info is obtained earlier in the same function, as follows:

  gene_info <- get_biomart_mapping(species, symbol_name, dir_save,
                                   verbose) %>%
    dplyr::group_by(entrezgene_id) %>%
    dplyr::summarise(dplyr::across(dplyr::everything(), dplyr::first))
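To confirm this locally, one could compare the key types on both sides
of the join and list the IDs that don't parse as numbers. Below is a
minimal base R sketch. The small gene_info table is a hypothetical
stand-in for the real get_biomart_mapping() result (the `symbol` column
name and the "not-a-number" value are made up), and base merge() with
all.x = TRUE stands in for dplyr::left_join so that it runs without
dplyr:

```r
# Hypothetical stand-in for the output of get_biomart_mapping():
# one non-numeric ID forces the whole column to character.
gene_info <- data.frame(
  entrezgene_id = c("7157", "672", "not-a-number"),
  symbol = c("TP53", "BRCA1", "XYZ"),
  stringsAsFactors = FALSE
)

# entrez_to_symbol() converts its input to numeric before building df
df <- data.frame(entrezgene_id = as.numeric(c("7157", "672")))

typeof(df$entrezgene_id)         # "double"
typeof(gene_info$entrezgene_id)  # "character": the mismatch dplyr rejects

# List the IDs that would not survive as.numeric()
ids <- gene_info$entrezgene_id
bad_ids <- ids[is.na(suppressWarnings(as.numeric(ids))) & !is.na(ids)]
print(bad_ids)

# One possible workaround (not necessarily the right fix): coerce the
# lookup key to numeric, turning the offending IDs into NA, then join.
gene_info$entrezgene_id <- suppressWarnings(as.numeric(gene_info$entrezgene_id))
res <- merge(df, gene_info, by = "entrezgene_id", all.x = TRUE)
print(res)
```

Silently turning the unparseable IDs into NA would hide whatever changed
upstream, so it's worth looking at bad_ids before settling on a fix.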

get_biomart_mapping accesses the Internet using biomaRt::getBM if it
can, but otherwise falls back to a copy of the information for the
human genome cached inside the package.

There doesn't seem to be any mention of special cases for
"entrezgene_id" in the code of the biomaRt package. biomaRt::getBM
POSTs XML queries to ensembl.org/biomart/martservice?... and parses the
resulting tab-separated values using read.table.
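Because read.table guesses each column's type from its contents (via
type.convert), a single non-numeric value is enough to turn the whole
entrezgene_id column into character. A minimal illustration with made-up
tab-separated text (the rows and the "abc" value are hypothetical, not
actual Ensembl output):

```r
# Hypothetical martservice-style TSV, once with numeric IDs only and
# once with a single non-numeric value in entrezgene_id.
tsv_ok  <- "entrezgene_id\tsymbol\n7157\tTP53\n672\tBRCA1\n"
tsv_bad <- "entrezgene_id\tsymbol\n7157\tTP53\n672\tBRCA1\nabc\tXYZ\n"

ok  <- read.table(text = tsv_ok,  sep = "\t", header = TRUE,
                  stringsAsFactors = FALSE)
bad <- read.table(text = tsv_bad, sep = "\t", header = TRUE,
                  stringsAsFactors = FALSE)

class(ok$entrezgene_id)   # "integer"
class(bad$entrezgene_id)  # "character"
```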

My guess is that ensembl.org started returning something that isn't a
number in the entrezgene_id column, so read.table no longer parses that
column as numeric, and you were the first one to rebuild the vignette
and notice.

-- 
Best regards,
Ivan
