[R] Odd result

Duncan Murdoch
Sun Sep 24 12:16:34 CEST 2023


On 23/09/2023 6:55 p.m., Parkhurst, David wrote:
> With help from several people, I used file.choose() to get my file name, and read.csv() to read in the file as KurtzData.  Then when I print KurtzData, the last several lines look like this:
> 39   5/31/22              16.0      341    1.75525 0.0201 0.0214   7.00
> 40   6/28/22  2:00 PM      0.0      215    0.67950 0.0156 0.0294     NA
> 41   7/25/22 11:00 AM      11.9   1943.5        NA     NA 0.0500   7.80
> 42   8/31/22                  0    220.5        NA     NA 0.0700  30.50
> 43   9/28/22              0.067     10.9        NA     NA 0.0700  10.20
> 44  10/26/22              0.086      237        NA     NA 0.1550  45.00
> 45   1/12/23  1:00 PM     36.26    24196        NA     NA 0.7500 283.50
> 46   2/14/23  1:00 PM     20.71       55        NA     NA 0.0500   2.40
> 47                                              NA     NA     NA     NA
> 48                                              NA     NA     NA     NA
> 49                                              NA     NA     NA     NA
> 
> Then the NA's go down to one numbered 973.  Where did those extras likely come from, and how do I get rid of them?  I assume I need to get rid of all the lines after #46, to do calculations and graphics, no?

Many Excel spreadsheets have a lot of garbage outside the range of the 
data.  Sometimes it is visible if you know where to look, sometimes it 
is blank cells.  Perhaps at some point you (or the file creator) 
accidentally entered a number in line 973.  Then Excel will think the 
sheet has 973 lines.  I don't know the best way to tell Excel that those 
lines are pure garbage.
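
If it helps, here is a rough sketch of how you might check from the R side where the real data ends. The data frame `df` and its column names are made up for illustration; it stands in for whatever read.csv() returned. A row is treated as empty when every cell is NA or a blank string, which is an assumption about what the junk rows look like.

```r
# Toy stand-in for a data frame read from a CSV with trailing junk rows.
# Column names and values are hypothetical.
df <- data.frame(
  Date  = c("1/12/23", "2/14/23", "", "", ""),
  Time  = c("1:00 PM", "1:00 PM", "", "", ""),
  Value = c(283.5, 2.4, NA, NA, NA),
  stringsAsFactors = FALSE
)

# A cell counts as empty if it is NA or a blank / whitespace-only string.
is_empty <- is.na(df) | trimws(as.matrix(df)) == ""

# TRUE for rows in which every cell is empty.
empty_row <- rowSums(!is_empty) == 0

# Index of the last row containing any real value.
last_real <- max(which(!empty_row))
```

If `last_real` comes out as 46 for your data, that suggests everything below row 46 is the spreadsheet's leftover range rather than data.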

That's why old fogies like me recommend that you do as little as 
possible in Excel.  Get the data into a reliable form as soon as possible.

Once it is an R dataframe, you can delete lines using negative indices. 
In this case use

     fixed <- KurtzData[-(47:nrow(KurtzData)), ]

which will create a new dataframe with only rows 1 to 46.
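
To see the idea on something small, here is a minimal sketch using a made-up data frame `toy` (hypothetical names and values, not your data), with 3 real rows followed by 2 junk rows. It also shows head(), which I'm adding as an alternative that keeps the first rows instead of dropping the last ones.

```r
# Toy data frame standing in for KurtzData: 3 real rows, then 2 junk rows.
# Names and values here are hypothetical.
toy <- data.frame(id = 1:5, value = c(7.0, 7.8, 30.5, NA, NA))

n_keep <- 3  # number of real rows (46 in the case above)

# Negative indices drop rows n_keep + 1 through the last row.
fixed_neg <- toy[-((n_keep + 1):nrow(toy)), ]

# head() keeps the first n_keep rows instead; same rows survive here.
fixed_head <- head(toy, n_keep)
```

One caveat with the negative-index form: if the data frame had exactly 46 rows, 47:nrow(KurtzData) would count downward (47, 46) and row 46 would be dropped. head(KurtzData, 46) avoids that, since it simply returns at most the first 46 rows.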

Duncan Murdoch


