[R] Cleaning up messy Excel data

jim holtman jholtman at gmail.com
Sat Mar 3 02:04:05 CET 2012


Unfortunately they only know how to use Excel and Word.  They are not
folks who use a computer every day.  Many of them run factories or
warehouses and asking them to use something like Access would not
happen in my lifetime (I have retired twice already).

I don't have any problems with them "messing" up the data that I send
them; they are pretty good about making changes within the context of
the spreadsheet.  The other issue is that I working with people in
twenty different locations spread across the US, so I might be able to
one of them to use Access (there is one I know that uses it), but that
leaves 19 other people I would not be able to communicate with.

The other thing is, is that I use Excel myself to slice/dice data
since there are things that are easier in Excel than R (believe it or
not).  There are a number of tools I keep in my toolkit, and R is
probably the most important, but I have not thrown the rest of them
away since they still serve a purpose.

So if you can come up with a way to 20 diverse groups, who are not
computer literate, to change over in a couple of days from Excel to
Access let me know.  BTW, I tried to use Access once and gave it up
because it was not as intuitive as some other tools and did not give
me any more capability than the ones I was using.  So I know I would
have a problem in convincing other to make the change just so they
could communicate with me, while they still had to use Excel to most
of their other interfaces.

This is the real world where you have to learn how to adapt to your
environment and make the best of it.  So you just have to learn that
Excel can be your friend (or at least not your enemy) and can serve a
very useful purpose in getting your ideas across to other people.

On Fri, Mar 2, 2012 at 6:41 PM, Greg Snow <538280 at gmail.com> wrote:
> Try sending your clients a data set (data frame, table, etc) as an MS
> Access data table instead.  They can still view the data as a table,
> but will have to go to much more effort to mess up the data, more
> likely they will do proper edits without messing anything up (mixing
> characters in with numbers, have more sexes than your biology teacher
> told you about, add extra lines at top or bottom that makes reading
> back into R more difficult, etc.)
>
> I have had a few clients that I talked into using MS Access from the
> start to enter their data, there was often a bit of resistance at
> first, but once they tried it and went through the process of
> designing the database up front they ended up thanking me and believed
> that the entire data entry process was easier and quicker than had the
> used excel as they originally planned.
>
> Access is still part of MS office, so they don't need to learn R or in
> any way break their chains from being prisoners of bill, but they will
> be more productive in more ways than just interfacing with you.
>
> Access (databases in general) force you to plan things out and do the
> correct thing from the start.  It is possible to do the right thing in
> Excel, but Excel does not encourage (let alone force) you to do the
> right thing, but makes it easy to do the wrong thing.
>
> On Thu, Mar 1, 2012 at 6:15 AM, jim holtman <jholtman at gmail.com> wrote:
>> But there are some important reasons to use Excel.  In my work there
>> are a lot of people that I have to send the equivalent of a data.frame
>> to who want to look at the data and possibly slice/dice the data
>> differently and then send back to me updates.  These folks do not know
>> how to use R, but do have Microsoft Office installed on their
>> computers and know how to use the different products.
>>
>> I have been very successful in conveying what I am doing for them by
>> communicating via Excel spreadsheets.  It is also an important medium
>> in dealing with some international companies who provide data via
>> Excel and expect responses back via Excel.
>>
>> When dealing with data in a tabular form, Excel does provide a way for
>> a majority of the people I work with to understand the data.  Yes,
>> there are problems with some of the ways that people use Excel, and
>> yes I have had to invest time in scrubbing some of the data that I get
>> from them, but if I did not, then I would probably not have a job
>> working for them.  I use R exclusively for the analysis that I do, but
>> find it convenient to use Excel to provide a communication mechanism
>> to the majority of the non-R users that I have to deal with.  It is a
>> convenient "work-around" because I would never get them to invest the
>> time to learn R.
>>
>> So in the real world these is a need to Excel and we are not going to
>> cause it to go away; we have to learn how to live with it, and from my
>> standpoint, it has definitely benefited me in being able to
>> communicate with my users and continuing to provide them with results
>> that they are happy with.  They refer to letting me work my "magic" on
>> the data; all they know is they see the result via Excel and in the
>> background R is doing the heavy lifting that they do not have to know
>> about.
>>
>> On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner <rolf.turner at xtra.co.nz> wrote:
>>> On 01/03/12 04:43, John Kane wrote:
>>>>
>>>> (mydata<- as.factor(c("1","2","3", ">2", "5", ">2")))
>>>> str(mydata)
>>>>
>>>> newdata<- as.character(mydata)
>>>>
>>>> newdata[newdata==">2"]<- 0
>>>> newdata<- as.numeric(newdata)
>>>> str(newdata)
>>>>
>>>> We really need to keep Excel (and other spreadsheets) out of peoples
>>>> hands.
>>>
>>>
>>> Amen, bro'!!!
>>>
>>>    cheers,
>>>
>>>        Rolf Turner
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538280 at gmail.com



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



More information about the R-help mailing list