[R] Data cleaning & Data preparation, what do R users want?

Robert Wilkins iwritecode2 at gmail.com
Wed Nov 29 18:08:24 CET 2017


OK, well, what about a set of functions in an R package that
automatically, with very little syntax, pull in data from a variety of
formats (CSV, SQLite, and so on) and convert it to an R data frame? You
seem to be pointing to something like that.
Something like that, in some form or another, probably already exists,
though it might be either imperfect (not as user-friendly as possible) or
not well publicised, or both.
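
To make the idea concrete, here is a rough sketch of what such a function
might look like. The name read_any() and its table argument are made up for
illustration and do not come from any existing package. The SQLite branch
assumes the DBI and RSQLite packages are installed; everything else is base R.

```r
# Hypothetical helper (not from any existing package): pick a reader
# based on the file extension and always return a data frame.
read_any <- function(path, table = NULL) {
  ext <- tolower(tools::file_ext(path))

  if (ext == "csv") {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  if (ext %in% c("tsv", "txt")) {
    # Assumption: .tsv and .txt files are tab-delimited with a header row
    return(read.delim(path, stringsAsFactors = FALSE))
  }
  if (ext %in% c("sqlite", "db")) {
    # Assumption: DBI and RSQLite are available for SQLite files
    if (!requireNamespace("DBI", quietly = TRUE) ||
        !requireNamespace("RSQLite", quietly = TRUE)) {
      stop("Reading SQLite files needs the DBI and RSQLite packages")
    }
    if (is.null(table)) {
      stop("Please give the name of the table to read from the SQLite file")
    }
    con <- DBI::dbConnect(RSQLite::SQLite(), path)
    on.exit(DBI::dbDisconnect(con), add = TRUE)
    return(DBI::dbReadTable(con, table))
  }
  stop("Unsupported file extension: ", ext)
}

# Usage: write a small CSV to a temporary file and read it back
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, county = c("A", "B", "C")),
          tmp, row.names = FALSE)
df <- read_any(tmp)
str(df)
```

The point of the sketch is only the "very little syntax" part: one call,
whatever the format. A real tool would need many more formats and options.
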
Or another tangent: your co-workers are not going to stop using Excel,
whether you like it or not, and many end-users are stuck in the exact same
position as you (co-workers who deliver the data in Excel). I will guess
that data stored in Excel tends to be dirty in somewhat predictable ways.
(And again, those other end-users' co-workers are not going to change their
behaviour.) And so: a data munging tool that makes it as easy as possible
to clean up the data in Excel spreadsheets and export it to R data
frames. One prerequisite: an understanding of what tends to go wrong with
data in Excel (the data in Excel tends to be dirty, but dirty in what
ways?).
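
As a starting point for discussion, here is a small base R sketch of that
kind of clean-up step. The kinds of dirt it handles are my guesses, not an
established list: stray whitespace, placeholder strings standing in for
missing values, numbers stored as text, and rows or columns that are entirely
empty. The function name clean_sheet() and the na_strings default are
hypothetical, and the example data frame is invented to mimic a sheet that
has already been imported as text.

```r
# Hypothetical helper: tidy a data frame that came from a spreadsheet.
clean_sheet <- function(df, na_strings = c("", "NA", "N/A", "n/a", "-")) {
  # Trim header text, then make the names syntactically valid and unique
  names(df) <- make.names(trimws(names(df)), unique = TRUE)

  df[] <- lapply(df, function(col) {
    if (is.factor(col)) col <- as.character(col)
    if (is.character(col)) {
      col <- trimws(col)
      # Assumed placeholders for "missing" become real NA values
      col[col %in% na_strings] <- NA
      # Re-guess the type now that the text is clean
      col <- type.convert(col, as.is = TRUE)
    }
    col
  })

  # Drop rows and columns that contain nothing but missing values
  keep_rows <- rowSums(!is.na(df)) > 0
  keep_cols <- colSums(!is.na(df)) > 0
  df[keep_rows, keep_cols, drop = FALSE]
}

# Usage with an invented example of spreadsheet-style dirt
raw <- data.frame(
  "Patient ID" = c(" A01", "A02 ", "", "A04"),
  "Age"        = c("34", "N/A", "", " 51 "),
  "Notes"      = c("", "", "", ""),
  check.names = FALSE, stringsAsFactors = FALSE
)
clean <- clean_sheet(raw)
str(clean)
```

One caveat with this approach: type.convert() will turn identifiers such as
"001" into the number 1, so columns like that would need to be excluded from
the type guessing.
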

Thank you for your response, Christopher. What state are you in?

On Wed, Nov 29, 2017 at 11:52 AM, Christopher W. Ryan <cryan at binghamton.edu>
wrote:

> Great question. What do I want? I want my co-workers to stop using Excel
> spreadsheets for data entry, storage, and sharing! I want them to
> understand the value of data discipline. But alas . . . .
> I work in a county health department in the US. Between dplyr, stringr,
> grep, grepl, and the base R read.*() functions, I'm doing OK.
> I need to learn more about APIs, so I can see if I can make R directly
> grab data from, e.g. our state health department sources. My biggest
> hassle is having to download a data file, save it somewhere, and then
> open R and read it in. I'd like to be able to do it all in R. Would make
> the generation of recurring reports easier.
> --Chris Ryan
> Robert Wilkins wrote:
> > R has a very wide audience, clinical research, astronomy, psychology, and
> > so on and so on.
> > I would consider data analysis work to be three stages: data preparation,
> > statistical analysis, and producing the report.
> > This regards the process of getting the data ready for analysis and
> > reporting, sometimes called "data cleaning" or "data munging" or "data
> > wrangling".
> >
> > So as regards tools for data preparation, speaking to the highly diverse
> > audience mentioned, here is my question:
> >
> > What do you want?
> > Or are you already quite happy with the range of tools that is currently
> > before you?
> >
> > [BTW, I posed the same question last week to the r-devel list, and was
> > advised that r-help might be a more suitable audience by one of the
> > moderators.]
> >
> > Robert Wilkins
