[R] Archive format
Joe Gain
joe.gain at uni-konstanz.de
Thu Mar 30 10:14:51 CEST 2017
On 29.03.2017 17:36, Jeff Newmiller wrote:
> The relevance to R (and therefore R-help) of this question is marginal at best. R might not be the language of choice when you go retrieve the data.
>
> Also, this question seems dangerously close to a troll, because the obvious answer is that the data should be in an open format but if you are not currently working with data in an open format then you increase the cost of archiving and risk losing information up front by extracting it from a proprietary format, and balancing those concerns is more political than technical.
>
> Note that there exist open binary formats, and the goals of your archiving task and nature of the data would have to be considered in deciding which of the many to use. My own experience has been that plain text survives time best, but YMMV.
>
Well, I didn't mean to troll the list. We have a small section on R, and
in response to a question that we got from a user, we thought it would
be a good idea to check with some actual R-users.
I think the responses are pretty much in line with what we expected.
Unsurprisingly, there's no simple solution. A text format is advantageous
because of the many options a user has for working with text data. Your
point about the format of the source data is valid: it can be a clear
constraint (other constraints are, for example, of a legal nature). I'm
not trying to advocate for open formats per se, just trying to gather
information so as to be able to make a recommendation.
I think we need to restructure the information on our web platform to
clearly differentiate between the data and the source code, scripts, etc.
that are used to process the data ("algorithms").
There is a big problem with data that has been archived but nobody knows
what it is/was for. Archiving, sharing and reproducibility are important
subjects, and we are interested in the experience of statisticians in
dealing with these problems.
Thanks for the replies!
Joe
--
B 1003
Kommunikations-, Informations-, Medienzentrum (KIM)
Universitaet Konstanz
t: ++49-7531-883234
e: joe.gain at uni-konstanz.de