[R] Cleaning up messy Excel data

Robert Baer rbaer at atsu.edu
Tue Feb 28 22:38:49 CET 2012


-----Original Message----- 
From: Noah Silverman
Sent: Tuesday, February 28, 2012 3:27 PM
To: r-help
Subject: [R] Cleaning up messy Excel data

Unfortunately, some data I need to work with was delivered in a rather messy 
Excel file.  I want to import into R and clean up some things so that I can 
do my analysis.  Pulling in a CSV from Excel is the easy part.

My current challenge is dealing with some text mixed in the values.
i.e.   118   5.7   <2.0  3.7

Since this column in Excel has a "<2.0" value, then R reads the column as a 
factor with levels.  Ideally, I want to convert it a normal vector of 
scalars and code code the "<2.0" as 0.

Can anyone suggest an easy way to do this?
--------------------------------------
?as.character
will show you how to change the "factor" column into a character column. 
Then, you can replace text using any of a number of procedures.
see for example
?gsub

finally, you can use as.numeric if you want numbers.  "Coding" is best done 
in the context of factors, so you might want to consider where replacing  <2 
with NA is more appropriate than replacing with 0.  In this end, the choice 
might be context sensitive.

Rob
--------------------------------
------------------------------------------
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
800 W. Jefferson St.
Kirksville, MO 63501
660-626-2322
FAX 660-626-2965



More information about the R-help mailing list