[R] Why Numeric Values Become Factors in Data Frame
David Winsemius
dwinsemius at comcast.net
Tue Nov 29 20:37:53 CET 2011
On Nov 29, 2011, at 2:18 PM, Rich Shepard wrote:
> I have a data frame with 1 factor, one date, and 37 numeric values:
> str(waterchem)
> 'data.frame': 3525 obs. of 39 variables:
> $ site : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 ...
> $ sampdate : Date, format: "2007-12-12" "2008-03-15" ...
> $ CO3 : num 1 1 6.7 1 1 1 1 1 1 1 ...
> $ HCO3 : num 231 228 118 246 157 208 338 285 260 240 ...
> $ Ca : num 100 88.4 63.4 123 78.2 103 265 213 178 166 ...
> $ DO : num 4.96 9.91 4.32 2.58 1.81 5.09 3.98 5.46 1.9 2.52 ...
> ...
> $ SC : Factor w/ 841 levels "1.090","10.000",..: 635 638 363
>
> All the numeric categories are read in as numbers except for some of those
> in column 'SC'. I have been looking in the source file for a couple of hours
> trying to learn why values such as 1.090 and 10.000 are seen as characters
> rather than numbers. I've not seen the reason.
>
> The source file is 860K and looks like this:
>
> site|sampdate|'Ag'|'Al'|'CO3'|'HCO3'|'Alk-Tot'|'As'|'Ba'|'Be'|'Bi'|'Ca'|'Cd'|
> 'Cl'|'Co'|'Cr'|'Cu'|'DO'|'Fe'|'Hg'|'K'|'Mg'|'Mn'|'Mo'|'Na'|'NH4'|'NO3-NO2'|
> 'Oil-grease'|'Pb'|'pH'|'Sb'|'SC'|'Se'|'SO4'|'Sr'|'TDS'|'Tl'|'V'|'Zn'
> 'D-1'|'2007-12-12'|0.000|0.106|1.000|231.000|231.000|0.011|0.000|
> 0.002|0.000|100.000|0.000|1.430|0.000|0.006|0.024|4.960|4.110|NA|
> 0.000|9.560|0.035|0.000|0.970|0.010|0.293|NA|0.025|7.800|0.001|
> 630.000|0.001|65.800|0.000|320.000|0.001|0.000|11.400
> 'D-1'|'2008-03-15'|0.000|0.080|1.000|228.000|228.000|0.001|0.000|
> 0.002|0.000|88.400|0.000|1.340|0.000|0.006|0.014|9.910|0.309|0.000|
> 0.000|9.150|0.047|0.000|0.820|0.224|0.020|NA|0.025|7.940|0.001|
> 633.000|0.001|75.400|0.000|300.000|0.001|0.000|12.400
>
> The R command used to create the data frame is:
> waterchem <- read.table('wqR.txt', header = TRUE, sep = '|')
>
> Pointers on how to determine why this one variable has some values read as
> characters rather than as numerics are needed.
So what does this show?
grep("[^0-9.]", waterchem$SC)
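A minimal sketch of what that check does, on a toy vector rather than the real data. The vector `sc` and its stray "<1.000" entry are invented for illustration; nothing in the post says what the actual offending values in SC are. The pattern matches any entry containing a character other than a digit or a decimal point, and a single such entry in a column is enough for read.table to treat the whole column as character (and hence as a factor under the defaults in use here).

```r
# Toy stand-in for waterchem$SC; the "<1.000" entry is hypothetical.
sc <- factor(c("630.000", "633.000", "<1.000", "1.090", NA))

# Positions of entries containing anything other than digits or a period.
# grep() coerces the factor to character before matching; NA never matches.
bad <- grep("[^0-9.]", sc)
print(bad)

# value = TRUE returns the offending strings instead of their positions.
print(grep("[^0-9.]", sc, value = TRUE))

# Once the stray entries are understood, convert through character.
# as.numeric() applied directly to a factor would return the integer
# level codes, not the values. Unparseable entries become NA.
sc_num <- suppressWarnings(as.numeric(as.character(sc)))
print(sc_num)
```

If the stray entries turn out to be a consistent missing-value marker, passing it to read.table through the na.strings argument should let the column be read as numeric in the first place.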
David Winsemius, MD
West Hartford, CT