[R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ?

Thierry Onkelinx th|erry@onke||nx @end|ng |rom |nbo@be
Tue Aug 20 16:25:25 CEST 2024


Dear Ben,

This is as simple as making the given and family fields mandatory.
checklist::check_description() ensures that both given and family are set
unless the role is "cph" or "fnd", which allows organisations to be listed
with only the given field.
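
As a rough illustration of that rule (not the actual checklist
implementation), the check could be sketched in base R as below. The helper
name has_full_name() is hypothetical, and treating a person as an
organisation as soon as any of their roles is "cph" or "fnd" is an assumption
about how the exemption is applied.

```r
# Hypothetical helper, NOT part of checklist: TRUE for every person entry
# that has a given name and either a family name or an exempt role.
has_full_name <- function(p) {
  vapply(
    seq_along(p),
    function(i) {
      x <- p[i]
      # Assumption: one "cph" or "fnd" role is enough for the exemption
      exempt <- any(x$role %in% c("cph", "fnd"))
      has_given <- length(x$given) > 0
      has_family <- length(x$family) > 0
      has_given && (has_family || exempt)
    },
    logical(1)
  )
}

authors <- c(
  # Full name squeezed into 'given', as in the idmodelr DESCRIPTION
  utils::person(given = "Sam Abbott", role = c("aut", "cre")),
  # Properly split entry
  utils::person(given = "Sam", family = "Abbott", role = "aut"),
  # Made-up organisation, listed with only the given field
  utils::person(given = "Some Institute", role = c("cph", "fnd"))
)
ok <- has_full_name(authors)
```

Here only the first entry fails the check, because it has no family name and
no exempt role.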

The 0.4.1 branch of checklist
<https://github.com/inbo/checklist/tree/0.4.1> now
exports the author2df() function, which can handle objects of class
person, list, logical (NA) and NULL. Feedback is welcome.

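library(checklist)
# Fetch the CRAN package metadata (needs internet access)
df <- tools::CRAN_package_db()
vapply(
  df$`Authors@R`[df$Package %in% c("git2rdata", "A3", "digest", "abe")],
  function(x) {
    # Evaluate the Authors@R string; an NA entry evaluates to a logical NA
    parse(text = x) |>
      eval() |>
      list()
  },
  vector(mode = "list", 1)
) |>
  unname() |>
  author2df()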
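
The "canonicalize the ORCID, then do unique()" step suggested in the quoted
thread below could look roughly like this. It is a minimal sketch: the helper
canonical_orcid() is hypothetical, the example rows are illustrative (only
the two ORCID iDs come from the quoted DESCRIPTION), and the assumed iD
layout is four hyphen-separated blocks of four characters where the last
character may be X.

```r
# Hypothetical helper: reduce an ORCID iD given as a URL or a bare string
# to its bare form; anything that does not match the layout becomes NA.
canonical_orcid <- function(x) {
  x <- trimws(x)
  # Drop optional angle brackets and the orcid.org URL prefix
  x <- sub("^<?https?://orcid\\.org/", "", x, ignore.case = TRUE)
  x <- sub(">$", "", x)
  x <- toupper(x)
  ok <- grepl("^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$", x)
  x[!ok] <- NA_character_
  x
}

# Illustrative data in the shape produced by the quoted code
a <- data.frame(
  given = c("Sam", "Sam", "Akira"),
  family = c("Abbott", "Abbott", "Endo"),
  oid = c(
    "0000-0001-8057-8037",
    "https://orcid.org/0000-0001-8057-8037",
    "<https://orcid.org/0000-0001-6377-7296>"
  ),
  stringsAsFactors = FALSE
)

a$oid <- canonical_orcid(a$oid)
# Drop rows without a usable iD, then remove exact repeats
a <- unique(a[!is.na(a$oid), ])
```

Rows only collapse when given, family and oid all match, so the same person
listed with differently split names would still show up more than once.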

Best regards,

ir. Thierry Onkelinx
Statistician

Government of Flanders
Research Institute for Nature and Forest
Team Biometrics & Quality Assurance


On Tue, 20 Aug 2024 at 15:59, Ben Bolker <bbolker using gmail.com> wrote:

>   Looking into one particular example,
>
> https://github.com/seabbs/idmodelr/blob/master/DESCRIPTION
>
> this appears to be the authors' fault:
>
> Authors@R: c(
>      person(given = "Sam Abbott",
>             role = c("aut", "cre"),
>             email = "contact using samabbott.co.uk",
>             comment = c(ORCID = "0000-0001-8057-8037")),
>      person(given = "Akira Endo",
>             role = c("aut"),
>             email = "akira.endo using lshtm.ac.uk",
>             comment = c(ORCID = "0000-0001-6377-7296")))
>
>    Maybe CRAN should start checking for missing 'family' fields in
> Authors@R ... ???
>
>    cheers
>     Ben Bolker
>
> On 2024-08-20 9:47 a.m., Kurt Hornik wrote:
> >>>>>> Kurt Hornik writes:
> >
> > The variant attached drops the URL and does unique().
> >
> > Hmm, the ones in
> >
> >    head(with(a, sort_by(a, ~ family + given)), 100)
> >
> > without a family look suspicious ...
> >
> > Best
> > -k
> >
> >
> >
> >
> >>>>>> Dirk Eddelbuettel writes:
> >>> On 20 August 2024 at 07:57, Dirk Eddelbuettel wrote:
> >>> |
> >>> | Hi Kurt,
> >>> |
> >>> | On 20 August 2024 at 14:29, Kurt Hornik wrote:
> >>> | | I think for now you could use something like what I attach below.
> >>> | |
> >>> | | Not ideal: I had not too long ago started adding orcidtools.R to tools,
> >>> | | which e.g. has .persons_from_metadata(), but that works on the unpacked
> >>> | | sources and not the CRAN package db.  Need to think about that ...
> >>> |
> >>> | We need something like that too as I fat-fingered the string 'ORCID'. See
> >>> | fortunes::fortune("Dirk can type").
> >>> |
> >>> | Will try the function below later. Many thanks for sending it along.
> >
> >>> Very nice. Resisted my common impulse to make it a data.table for easy
> >>> sorting via keys etc.  After running your code the line
> >
> >>> head(with(a, sort_by(a, ~ family + given)), 100)
> >
> >>> shows that we need a bit more QA, as person entries are not properly split
> >>> between 'family' and 'given', some use the URL, and we have repeats.
> >>> Excluding those is next.
> >
> >> Right.  One should canonicalize the ORCID (having the URLs is from being
> >> nice) and then do unique() ...
> >
> >> Best
> >> -k
> >
> >
> >
> >>> Dirk
> >
> >>> | Dirk
> >>> |
> >>> | |
> >>> | | Best
> >>> | | -k
> >>> | |
> >>> | |
> >>> | | ********************************************************************
> >>> | | x <- tools::CRAN_package_db()
> >>> | | a <- lapply(x[["Authors@R"]],
> >>> | |             function(a) {
> >>> | |                 if(!is.na(a)) {
> >>> | |                     a <- tryCatch(utils:::.read_authors_at_R_field(a),
> >>> | |                                   error = identity)
> >>> | |                     if (inherits(a, "person"))
> >>> | |                         return(a)
> >>> | |                 }
> >>> | |                 NULL
> >>> | |             })
> >>> | | a <- do.call(c, a)
> >>> | | a <- lapply(a,
> >>> | |             function(e) {
> >>> | |                 if(is.null(o <- e$comment["ORCID"]) || is.na(o))
> >>> | |                     return(NULL)
> >>> | |                 cbind(given = paste(e$given, collapse = " "),
> >>> | |                       family = paste(e$family, collapse = " "),
> >>> | |                       oid = unname(o))
> >>> | |             })
> >>> | | a <- as.data.frame(do.call(rbind, a))
> >>> | |
> >>> | | ********************************************************************
> >>> | |
> >>> | | > Salut Thierry,
> >>> | |
> >>> | | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote:
> >>> | | > | Happy to help. I'm working on a new version of the checklist package. I could
> >>> | | > | export the function if that makes it easier for you.
> >>> | |
> >>> | | > Would be happy to help / iterate. Can you take a stab at making the
> >>> | | > per-column split more robust so that we can bulk-process all non-NA entries
> >>> | | > of the returned db?
> >>> | |
> >>> | | > Best, Dirk
> >>> | |
> >>> | | > --
> >>> | | > dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org
> >>> |
> >>> | --
> >>> | dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org
> >
> >>> --
> >>> dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org
> >>>
> >>> ______________________________________________
> >>> R-package-devel using r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
>  > E-mail is sent at my convenience; I don't expect replies outside of
> working hours.
>
>
