[R] Data cleaning & Data preparation, what do R users want?

Bert Gunter bgunter.4567 at gmail.com
Wed Nov 29 17:48:04 CET 2017

I don't think my view is of interest to many, so offlist.

I reject this:

"I would consider data analysis work to be three stages: data preparation,
statistical analysis, and producing the report."

For example, there is no such thing as "outliers" -- data to be removed as
part of cleaning/preparation -- without a statistical model to be an
"outlier" **from**, which is part of the statistical analysis. And the
structure of the data (data preparation) may need to change depending on
the course of the analysis (including graphics, also part of the analysis).
So I think your view reflects a naïve view of the nature of data analysis,
which is an iterative and holistic process. I suspect your training is as a
computer scientist and you have not done much one-on-one consulting with
researchers, though you should certainly feel free to reject this canard.
Building software for large-scale automated analysis of data requires a
very different analytical paradigm from the statistical consulting model,
which is largely my background.

No reply necessary. Just my opinion, which you are of course free to trash.


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Nov 29, 2017 at 8:37 AM, Robert Wilkins <iwritecode2 at gmail.com> wrote:

> R has a very wide audience, clinical research, astronomy, psychology, and
> so on and so on.
> I would consider data analysis work to be three stages: data preparation,
> statistical analysis, and producing the report.
> This regards the process of getting the data ready for analysis and
> reporting, sometimes called "data cleaning" or "data munging" or "data
> wrangling".
> So as regards tools for data preparation, speaking to the highly diverse
> audience mentioned, here is my question:
> What do you want?
> Or are you already quite happy with the range of tools that is currently
> before you?
> [BTW,  I posed the same question last week to the r-devel list, and was
> advised that r-help might be a more suitable audience by one of the
> moderators.]
> Robert Wilkins
