[R] 'mean' and 'sd' calculations do not match
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Thu Dec 8 14:43:58 CET 2005
Ulrich Leopold <uleopold at science.uva.nl> writes:
> Dear list,
>
> I am using R 2.1.1 on a Fedora 3 Linux, 32 bit PC.
>
> If I compute the aggregated mean and the standard deviation I get
> standard deviation values for factors where the mean was not computed.
> It seems to me that this is somehow related to the NA values. But I
> don't quite understand what is going wrong?
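You're using na.rm=TRUE on the sd calculation, but not on the means!
So any group containing an NA gets an NA mean, while its sd is still
computed from the remaining values. (The NAs that remain for sd are
likely groups with only one non-missing observation.)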
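
A minimal sketch of the difference, using a made-up data frame `d`
with a hypothetical `site` grouping column in place of chemicS (the
values are illustrative, not taken from the real file). It relies on
aggregate() passing extra arguments such as na.rm on to FUN:

```r
# Toy data standing in for chemicS; values are invented for illustration
d <- data.frame(
  site = c("a", "a", "b", "b", "b", "c"),
  EC   = c(315, 322, 400, NA, NA, NA)
)
grp <- list(site = d$site)

# Without na.rm, a single NA in a group makes that group's mean NA
m_default <- aggregate(d$EC, by = grp, FUN = mean)

# With na.rm = TRUE passed through to FUN, mean and sd treat NAs alike
m <- aggregate(d$EC, by = grp, FUN = mean, na.rm = TRUE)
s <- aggregate(d$EC, by = grp, FUN = sd,   na.rm = TRUE)

# Number of non-missing values per group: sd is NA wherever this is < 2
n <- aggregate(d$EC, by = grp, FUN = function(v) sum(!is.na(v)))

print(cbind(m_default, mean_narm = m$x, sd_narm = s$x, n = n$x))
```

Group "b" has one usable value, so it gets a mean but no sd; group "c"
has none, so even with na.rm = TRUE neither statistic is available.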
> Could it be related to the data import already? Some of the imported
> data got the character strings NA and others <NA>. But they are defined
> from the same values, -9999.
No. It signifies a problem, but not this one. <NA> is how missing
values are printed in factor and character columns, so NO2 and NH4
were not read as numeric. Most likely (can't think of any other
reason) some of your data are not numeric - "," instead of "." and
similar typos will do that to you.
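
One way to track down such entries, sketched on a made-up column
(`no2` and its values are hypothetical, not from the real file): check
the column classes, then list the values that are present but fail
numeric conversion.

```r
# Hypothetical data: one NO2 entry uses a decimal comma
d <- data.frame(
  EC  = c(630, NA, 8560),
  NO2 = c("0.001", NA, "0,14"),
  stringsAsFactors = FALSE
)

# Columns that are not "numeric" or "integer" are the ones to inspect
classes <- sapply(d, class)
print(classes)

# as.character() also covers the case where the column came in as a factor
no2 <- as.character(d$NO2)
as_num <- suppressWarnings(as.numeric(no2))

# Present in the file, but not convertible to a number
bad <- no2[!is.na(no2) & is.na(as_num)]
print(bad)
```

Once the offending entries are fixed in the file, the column should
come in as numeric on the next read.table() call.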
> I used the code below. Below the code are parts of the results.
>
> Cheers, Ulrich
>
> Data import:
>
> chemicS <- read.table("ChemieUlli_4_Quellen.csv", header = TRUE, sep =
> ",",na.strings = "-9999")
>
> Count EC NO3 NO2 NH4
> 3504 630.0000 33.00 0.001 0.01
> 3505 NA 26.66 <NA> <NA>
> 3506 NA 0.72 <NA> <NA>
> 3507 NA NA <NA> <NA>
> 3508 NA NA <NA> <NA>
> 3509 NA NA <NA> <NA>
> 3510 1210.0000 14.00 0.001 0.01
> 3511 1265.0000 12.00 0.001 0.01
> 3512 1400.0000 14.00 0.001 0.01
> 3513 1427.0000 12.00 0.001 0.01
> 3514 1410.0000 7.00 0 0
> 3515 1520.0000 8.00 0.001 0.01
> 3516 1470.0000 7.60 0 0
> 3517 1170.0000 10.00 0.001 0.01
> 3518 4570.0000 20.00 0.001 0.45
> 3519 8560.0000 0.50 0.14 0.31
> 3520 708.0000 39.00 0.001 0.01
> 3521 833.0000 40.00 0.01 0.01
> 3522 NA NA <NA> <NA>
>
> Computing the mean:
>
> aggregate(chemicS$EC, by = list(east=chemicS$EST, north=chemicS$NORD),
> FUN = mean)
>
> Count east north Mean
> 350 89885 103160 318.50000
> 351 55870 103510 400.00000
> 352 82570 104845 637.33333
> 353 79119 107433 NA
> 354 79160 107462 362.77778
> 355 83010 108990 NA
> 356 82810 109010 NA
> 357 69135 112992 NA
> 358 55490 120140 142.25000
> 359 56580 120600 NA
> 360 56582 120607 NA
> 361 58050 125350 NA
> 362 58059 125360 NA
> 363 60360 128191 NA
> 364 65448 128293 252.50000
> 365 65472.5 128308.1 NA
> 366 61412 131141 NA
>
> Computing the standard deviation:
>
> aggregate(chemicS$EC, by = list(east=chemicS$EST, north=chemicS$NORD),
> FUN = sd, na.rm = TRUE)
>
> Count east north Stdev.
> 350 89885 103160 4.9497475
> 351 55870 103510 NA
> 352 82570 104845 19.6553640
> 353 79119 107433 NA
> 354 79160 107462 73.6745848
> 355 83010 108990 NA
> 356 82810 109010 15.6950098
> 357 69135 112992 NA
> 358 55490 120140 5.3150729
> 359 56580 120600 NA
> 360 56582 120607 22.4435801
> 361 58050 125350 NA
> 362 58059 125360 23.3108523
> 363 60360 128191 20.9789577
> 364 65448 128293 10.6066017
> 365 65472.5 128308.1 NA
> 366 61412 131141 8.6184556
>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907