[R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ?

Kurt Hornik Kurt.Hornik at wu.ac.at
Tue Aug 20 15:47:22 CEST 2024


>>>>> Kurt Hornik writes:

The variant attached drops the URL and does unique().
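
For readers without the attachment, a minimal sketch of that canonicalize-then-unique step might look as follows. This is not the attached orcid.R: the helper name canonicalize_orcid() and the toy data frame are made up for illustration, and 'a' is assumed to have the columns given, family and oid built by the code quoted further down.

```r
## Hypothetical helper, not part of tools or utils: strip an optional
## https://orcid.org/ prefix and keep the bare identifier.
canonicalize_orcid <- function(x) {
    x <- trimws(x)
    x <- sub("^(https?://)?(www\\.)?orcid\\.org/", "", x, ignore.case = TRUE)
    ## An ORCID iD is four groups of four digits; the last character
    ## may be the check character "X".  Anything else becomes NA.
    ok <- grepl("^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$", x)
    x[!ok] <- NA_character_
    x
}

## Toy stand-in for the data frame 'a' (made-up names and iDs).
a <- data.frame(given  = c("Ada", "Ada", "Bob"),
                family = c("Example", "Example", "Sample"),
                oid    = c("https://orcid.org/0000-0000-0000-0001",
                           "0000-0000-0000-0001",
                           "0000-0000-0000-001X"),
                stringsAsFactors = FALSE)

## Canonicalize first, so URL and bare forms of the same iD compare equal,
## then drop unusable entries and exact repeats.
a$oid <- canonicalize_orcid(a$oid)
a <- unique(a[!is.na(a$oid), ])
```

The sketch only checks the shape of the identifier, not its checksum, so it may still let through mistyped iDs.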

Hmm, the ones in

  head(with(a, sort_by(a, ~ family + given)), 100)

without a family look suspicious ...
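
One way to isolate those entries, sketched on toy data. The data frame below is a made-up stand-in for 'a'; the only assumption carried over from the quoted code is that family is built with paste(e$family, collapse = " "), which yields an empty string when a person has no family element.

```r
## Toy stand-in for 'a'; the names and iDs are made up.
a <- data.frame(given  = c("Ada Example", "Bob"),
                family = c("", "Sample"),
                oid    = c("0000-0000-0000-0001", "0000-0000-0000-0002"),
                stringsAsFactors = FALSE)

## A missing family name comes out of paste(NULL, collapse = " ") as "".
stopifnot(identical(paste(NULL, collapse = " "), ""))

## Rows where the whole name apparently went into 'given'.
no_family <- a[!nzchar(a$family), ]
no_family
```

Whether such rows are genuine single-name persons or mis-split Authors@R entries presumably still needs a look by hand.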

Best
-k


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: orcid.R
URL: <https://stat.ethz.ch/pipermail/r-package-devel/attachments/20240820/76546959/attachment.ksh>

-------------- next part --------------


>>>>> Dirk Eddelbuettel writes:
>> On 20 August 2024 at 07:57, Dirk Eddelbuettel wrote:
>> | 
>> | Hi Kurt,
>> | 
>> | On 20 August 2024 at 14:29, Kurt Hornik wrote:
>> | | I think for now you could use something like what I attach below.
>> | | 
>> | | Not ideal: I had not too long ago started adding orcidtools.R to tools,
>> | | which e.g. has .persons_from_metadata(), but that works on the unpacked
>> | | sources and not the CRAN package db.  Need to think about that ...
>> | 
>> | We need something like that too as I fat-fingered the string 'ORCID'. See
>> | fortunes::fortune("Dirk can type").
>> | 
>> | Will try the function below later. Many thanks for sending it along.

>> Very nice. Resisted my common impulse to make it a data.table for easy
>> sorting via keys etc.  After running your code the line

>> head(with(a, sort_by(a, ~ family + given)), 100)

>> shows that we need a bit more QA: person entries are not properly split
>> between 'family' and 'given', some ORCIDs are given as URLs, and we have
>> repeats.  Excluding those is next.

> Right.  One should canonicalize the ORCID (having the URLs is from being
> nice) and then do unique() ...

> Best
> -k



>> Dirk
 
>> | Dirk
>> | 
>> | | 
>> | | Best
>> | | -k
>> | | 
>> | | ********************************************************************
>> | | x <- tools::CRAN_package_db()
>> | | a <- lapply(x[["Authors@R"]],
>> | |             function(a) {
>> | |                 if(!is.na(a)) {
>> | |                     a <- tryCatch(utils:::.read_authors_at_R_field(a), 
>> | |                                   error = identity)
>> | |                     if (inherits(a, "person")) 
>> | |                         return(a)
>> | |                 }
>> | |                 NULL
>> | |             })
>> | | a <- do.call(c, a)
>> | | a <- lapply(a,
>> | |             function(e) {
>> | |                 if(is.null(o <- e$comment["ORCID"]) || is.na(o))
>> | |                     return(NULL)
>> | |                 cbind(given = paste(e$given, collapse = " "),
>> | |                       family = paste(e$family, collapse = " "),
>> | |                       oid = unname(o))
>> | |             })
>> | | a <- as.data.frame(do.call(rbind, a))
>> | | ********************************************************************
>> | | 
>> | | > Salut Thierry,
>> | | 
>> | | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote:
>> | | > | Happy to help. I'm working on a new version of the checklist package. I could
>> | | > | export the function if that makes it easier for you.
>> | | 
>> | | > Would be happy to help / iterate. Can you take a stab at making the
>> | | > per-column split more robust so that we can bulk-process all non-NA entries
>> | | > of the returned db?
>> | | 
>> | | > Best, Dirk
>> | | 
>> | | > -- 
>> | | > dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>> | 
>> | -- 
>> | dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org

>> -- 
>> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org

