[Rd] [RFC] readtable enhancement

Kurt Van Dijck dev.kurt using vandijck-laurijssen.be
Thu Mar 28 06:33:18 CET 2019


In the meantime, I submitted a bug. Thanks for the assistance on that.

>    and I'm not convinced that
>    coercion failures should fallback gracefully to the default.

The graceful fallback:
- makes the code more complex
+ keeps colConvert implementations limited
+ requires the user to only implement what changed from the default
+ seemed to me the smallest overall effort

In my opinion, graceful fallback makes the thing better,
but even without it, the colConvert parameter remains useful: it would still
fill a gap.
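
To make the fallback idea concrete, here is a minimal sketch in plain R. It
is not the code from the patch: convert_column() and dmy_hm() are
hypothetical names used only for illustration, and the convention "a
converter returns NULL to decline a column" is an assumption based on the
description in this thread.

```r
# Hypothetical helper, not the code from the patch: apply a user-supplied
# converter to one character column and fall back to the built-in default
# when the converter returns NULL.
convert_column <- function(x, colConvert = NULL) {
  if (is.function(colConvert)) {
    res <- colConvert(x)
    if (!is.null(res)) return(res)
  }
  type.convert(x, as.is = TRUE)  # base R's default type detection
}

# Hypothetical converter: handles only DD/MM/YYYY HH:MM and declines
# (returns NULL) as soon as a non-missing value does not parse.
dmy_hm <- function(x) {
  res <- as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M", tz = "UTC"))
  if (anyNA(res[!is.na(x)])) NULL else res
}

stamps <- convert_column(c("28/03/2019 06:33", "27/03/2019 14:28"), dmy_hm)
counts <- convert_column(c("1", "2", "3"), dmy_hm)
```

Here stamps ends up as POSIXct, while counts is left to the default
detection, so the converter only has to know about the one format it cares
about.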

>    The implementation needs work though,

Other than to remove the graceful fallback?

Kind regards,

On Wed, 27 Mar 2019 14:28:25 -0700, Michael Lawrence wrote:
>    This has some nice properties:
>    1) It self-documents the input expectations in a similar manner to
>    colClasses.
>    2) The implementation could eventually "push down" the coercion, e.g.,
>    calling it on each chunk of an iterative read operation.
>    The implementation needs work though, and I'm not convinced that
>    coercion failures should fallback gracefully to the default.
>    Feature requests fall under a "bug" in bugzilla terminology, so please
>    submit this there. I think I've made you an account.
>    Thanks,
>    Michael
>    On Wed, Mar 27, 2019 at 1:19 PM Kurt Van Dijck
>    <dev.kurt using vandijck-laurijssen.be> wrote:
>      Thank you for your answers.
>      I would rather not file a new bug, since what I coded isn't really a
>      bug.
>      The problem I (and my colleagues) have today is very stupid:
>      We read .csv files with a lot of columns, most of which contain
>      date-time stamps, formatted as DD/MM/YYYY HH:MM.
>      This is not exotic, but the base library's read.table (and derivatives)
>      only accepts date-times in a limited number of formats (which I
>      understand very well).
>      We could specify the format for each column individually, but that
>      syntax is rather complicated and difficult to maintain.
>      My solution to this specific problem became a trivial, yet generic,
>      extension to read.table.
>      Rather than relying on the built-in type detection, I added a parameter
>      that takes a function, which is called for each column whose type is
>      to be probed, so I can overrule the built-in, limited default.
>      If the function returns nothing, the built-in default is still used.
>      This way, I could construct a type-probing function that is
>      straightforward, not hard to code, and makes reading my .csv files
>      acceptable in terms of code (read.table parameters).
>      I'm sure I'm not the only one dealing with such needs, especially since
>      date-time formats exist in enormous numbers, but I want to stress here
>      that my approach is not specific to my problem.
>      For those asking to 'show me the code', I refer to my 2nd patch,
>      where the tests have been extended with my specific problem.
>      What are your opinions about this?
>      Kind regards,
>      Kurt
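
For readers without the patch at hand, one common way to handle such columns
with an unmodified read.table is to read them as character and convert them
afterwards. The sketch below uses only base R; the sample data and the column
names (id, start, end) are made up for illustration and may differ from the
per-column approach alluded to in the quoted message.

```r
# Made-up sample data with DD/MM/YYYY HH:MM time stamps.
csv <- "id,start,end
1,28/03/2019 06:33,28/03/2019 07:00
2,27/03/2019 14:28,27/03/2019 15:10"

# Read the time stamps as plain character columns first ...
df <- read.csv(text = csv,
               colClasses = c("integer", "character", "character"))

# ... then convert every date-time column in a separate step.
time_cols <- c("start", "end")
df[time_cols] <- lapply(df[time_cols], function(x)
  as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M", tz = "UTC")))
```

This works, but the list of date-time columns has to be maintained by hand
next to the read call, which is the maintenance burden the proposal tries to
avoid.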
