[R] Tools For Preparing Data For Analysis

Dale Steele Dale_Steele at brown.EDU
Fri Jun 8 19:49:16 CEST 2007


For Windows users, EpiData Entry <http://www.epidata.dk/> is an
excellent (free) tool for data entry and documentation.  --Dale


On 6/8/07, Chris Evans <chrishold at psyctc.org> wrote:
>
> Martin Henry H. Stevens sent the following  at 08/06/2007 15:11:
> > Is there an example available of this sort of problematic data that
> > requires this kind of data screening and filtering? For many of us,
> > this issue would be nice to learn about, and deal with within R. If a
> > package could be created, that would be optimal for some of us. I
> > would like to learn a tad more, if it were not too much effort for
> > someone else to point me in the right direction?
> > Cheers,
> > Hank
> > On Jun 8, 2007, at 8:47 AM, Douglas Bates wrote:
> >
> >> On 6/7/07, Robert Wilkins <irishhacker at gmail.com> wrote:
> >>> As noted on the R-project web site itself ( www.r-project.org ->
>
> ... rest snipped ...
>
> OK, I can't resist that invitation.  I think there are many kinds of
> problematic data.  I handle some nasty textish things in perl (and I
> loved the purgatory quote) and I'm afraid I do some things in Excel and
> some cleaning I can handle in R, but I never enter data directly into R.
>
> However, one very common scenario I have faced all my working life is
> psych data from questionnaires or interviews in low budget work, mostly
> student research or routine entry of therapists' data.  Typically you
> have an identifier, a date, some demographics and then a lot of item
> data.  There's little money (usually zero) involved for data entry and
> cleaning but I've produced a lot of good(ish) papers out of this sort of
> very low budget work over the last 20 years.  (Right at the other end of
> a financial spectrum from the FDA/validated s'ware thread but this is
> about validation again!)
>
> The problem I often face is that people are lousy data entry machines
> (well, actually, they vary ... enormously) and if they mess up the data
> entry we all know how horrible this can be.
>
> SPSS (boo hiss) used to have an excellent "module", actually a
> standalone PC/Windoze program, that allowed you to define variables so
> they had allowed values, and it would refuse to accept out-of-range or
> otherwise unacceptable entries.  It also allowed you to create checking
> rules and rules that would, in the light of earlier entries, set later
> values and not ask about them.  In a rudimentary way you could also lay things
> out on the screen so that it paginated where the q'aire or paper data
> record did etc.  The final nice touch was that you could define some
> variables as invariant and then set the thing so an independent data
> entry person could re-enter the other data (i.e. pick up q'aire, see if
> ID fits the one showing on screen, if so, enter the rest of the data).
> It would bleep and not move on if you entered a value other than that
> entered by the first person and you had to confirm that one of you was
> right.
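
The allowed-value and branching rules described above can be roughly
approximated in R after the data have been entered (not at entry time, as
SPSS/DE did).  This is only a minimal sketch: the data frame, the column
names, the age limits and the helper flag() are all made up for
illustration.

```r
# Hypothetical entered questionnaire data; every name and value is illustrative.
entered <- data.frame(
  id       = c(1, 2, 3, 4),
  sex      = c("M", "F", "X", "M"),
  age      = c(34, 210, 27, 45),
  pregnant = c(NA, "N", NA, "Y"),
  stringsAsFactors = FALSE
)

# Rule 1: each variable has a set of allowed values or an allowed range
# (the limits 16 and 100 are assumptions for this example).
bad_sex <- !(entered$sex %in% c("M", "F"))
bad_age <- is.na(entered$age) | entered$age < 16 | entered$age > 100

# Rule 2: a branching (skip) rule. Males should have no pregnancy answer.
bad_skip <- entered$sex %in% "M" & !is.na(entered$pregnant)

# Hypothetical helper: one row per failed check, with the reason.
flag <- function(rows, reason) {
  data.frame(id = entered$id[rows],
             problem = rep(reason, sum(rows)),
             stringsAsFactors = FALSE)
}

problems <- rbind(
  flag(bad_sex,  "sex not in allowed values"),
  flag(bad_age,  "age out of range"),
  flag(bad_skip, "pregnancy item answered for a male")
)
print(problems)
```

With the made-up data above, ids 3, 2 and 4 are flagged, one for each rule.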
>
> That saved me, I'm sure, weeks wasted on analysing data that turned out to
> be awful and I'd love to see someone build something to replace that.
>
> Currently I tend to use (boo hiss) Excel for this as everyone I work
> with seems to have it (and not all can install open office and anyway I
> haven't had time to learn that properly yet either ...) and I set up
> spreadsheets with validation rules set.  That doesn't get the branching
> rules and checks (e.g. if male, skip questions about periods, PMT and
> pregnancies), or at least, with my poor Excel skills it doesn't.  I just
> skip a column to indicate page breaks in the q'aire, and I get, when I
> can, two people to enter the data separately and then use R to compare
> the two spreadsheets having yanked them into data frames.
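
One possible way to do that double-entry comparison in base R is sketched
below.  It assumes both entries are already in data frames with the same
columns and a shared identifier; entry1, entry2 and the item names are
hypothetical, and this is not necessarily how Chris does it.

```r
# Two hypothetical independent entries of the same questionnaires.
entry1 <- data.frame(id    = c(101, 102, 103),
                     item1 = c(3, 4, 2),
                     item2 = c(1, 5, 0))
entry2 <- data.frame(id    = c(103, 101, 102),
                     item1 = c(2, 3, 4),
                     item2 = c(0, 1, 3))

# Align both entries on the identifier so like is compared with like.
entry1 <- entry1[order(entry1$id), ]
entry2 <- entry2[order(entry2$id), ]
stopifnot(identical(entry1$id, entry2$id))

items <- setdiff(names(entry1), "id")
a <- as.matrix(entry1[items])
b <- as.matrix(entry2[items])

# A cell disagrees if the values differ or if exactly one of them is missing.
differs <- (is.na(a) != is.na(b)) | (!is.na(a) & !is.na(b) & a != b)

# List each disagreement with its id, variable and both entered values.
where <- which(differs, arr.ind = TRUE)
mismatches <- data.frame(
  id       = entry1$id[where[, "row"]],
  variable = items[where[, "col"]],
  first    = a[where],
  second   = b[where],
  stringsAsFactors = FALSE
)
print(mismatches)
```

Here the only disagreement is item2 for id 102 (5 versus 3), which is the
record someone would then check against the paper questionnaire.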
>
> I would really, really love someone to develop (and perhaps replace) the
> rather buggy edit() and fix() routines (seem to hang on big data frames
> in Rcmdr which is what I'm trying to get students onto) with something
> that did some or all of what SPSS/DE used to do for me or I bodge now in
> Excel.  If any generous coding whiz were willing to do this, I'll try to
> alpha and beta test and write help etc.
>
> There _may_ be good open source things out there that do what I need but
> something that really integrated into R would be another huge step
> forward in being able to phase out SPSS in my work settings and phase in R.
>
> Very best all,
>
> Chris
>
>
>
> --
> Chris Evans <chris at psyctc.org> Skype: chris-psyctc
> Professor of Psychotherapy, Nottingham University;
> Consultant Psychiatrist in Psychotherapy, Notts PDD network;
> Research Programmes Director, Nottinghamshire NHS Trust;
> *If I am writing from one of those roles, it will be clear. Otherwise*
> *my views are my own and not representative of those institutions    *