[R] CSV value not being read as it appears
d.scott at auckland.ac.nz
Fri Jan 14 09:59:25 CET 2011
As a further note, this is a reminder that whenever you get data via a
spreadsheet, the first thing to do is to examine it and clean up any
problems. A basic requirement is to tabulate every categorical variable.
Spreadsheets allow any sort of data to be entered, with no controls. My
experience is that those who enter data into spreadsheets enter all
sorts of variations of what a human would wish to treat as the same
("Open", "Open ", "open", etc.), even when told not to.
On 14/01/2011 4:03 p.m., Jim Holtman wrote:
> Try strip.white=TRUE in read.csv() to strip out white space.
> On Jan 13, 2011, at 21:44, bgreen at dyson.brisnet.org.au wrote:
>> I have a frustrating issue for which I am hoping someone may have a suggestion.
>> I am running Windows XP and R 2.12.0, and saved an Excel file that I was sent
>> as a CSV file.
>> The initial code I ran follows.
>> dec <- read.csv("g://FMH/FO30122010.csv", header = TRUE)
>> dec.open <- subset(dec, Status == "Open")
>> I was checking the output and noticed a difference between my manual count
>> and the R output. Two subjects' rows were not being selected by the subset.
>> For the AMHS where there was a discrepancy I then ran:
>> wm <- subset(dec, AMHS == "WM")
>> The problem appears to be that there is a space before the "Open" value
>> for two individuals, as in the example below.
>> 10/02/2010 Open
>> 22/08/2007 Open
>> Checking in Excel, there does not appear to be a space, and the format is
>> the same (e.g. 'General'). I resolved the problem by copying over the
>> values for the two individuals where I identified a problem.
>> Given that this problem was not detected by visual scanning, I would
>> appreciate advice on how such problems can be detected in future without
>> my having to manually check raw data against R output.
>> Any assistance is appreciated,
>> R-help at r-project.org mailing list
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
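A minimal sketch of the strip.white suggestion. The CSV text is made up to mimic the two problem rows; passing it via `read.csv(text = ...)` assumes R 2.14.0 or later. Note that `strip.white` only strips unquoted character fields.

```r
# Made-up CSV content: one Status value has a leading space
csv <- "Date,Status
10/02/2010, Open
22/08/2007,Open
01/01/2009,Closed"

# Default import keeps the space, so " Open" and "Open" are different values
raw <- read.csv(text = csv, stringsAsFactors = FALSE)
print(table(raw$Status))
print(nrow(subset(raw, Status == "Open")))    # the " Open" row is missed

# strip.white = TRUE removes leading/trailing white space from unquoted fields
fixed <- read.csv(text = csv, stringsAsFactors = FALSE, strip.white = TRUE)
print(table(fixed$Status))
print(nrow(subset(fixed, Status == "Open")))  # both "Open" rows now match
```

Stripping at import time fixes stray spaces only; variants such as "open" versus "Open" still need the tabulate-and-clean step.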
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email: d.scott at auckland.ac.nz, Fax: +64 9 373 7018
Director of Consulting, Department of Statistics