[R] Calculate NAs from known data: how to?
Brian G. Peterson
brian at braverock.com
Tue Oct 17 13:48:53 CEST 2006
Torleif Markussen Lunde wrote:
> In a dataset I have length and age for cod. The age, however, is only
> given for 40-100% of the fish. What I need to do is to fill in the NAs
> in a correct way, so that age has a value for each length. This is to be
> done for each sample separately (there are 324 samples), meaning the NAs
> for sampleno 1 shall be calculated from the known values from sampleno 1.
>
> As, for example, a length of 55 cm can be both 4 and 5 years, I guess a
> fish with NA age and length 55 cm should be given a "random" age given a
> probability, for example "55 cm = 4 years has a p=75%, while 55 cm = 5
> years has a p=25%". Those "p-values" should be calculated from the real
> data.
>
> How can this be done in R, and what is the right way to do it?
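If you do want the random assignment you describe, a minimal sketch might
look like the following. It assumes a data frame with columns named
sampleno, length and age; those names, the toy data and the helper
impute_random are my assumptions for illustration, not your real data.
Each missing age is drawn from the known ages of fish with the same length
in the same sample, so the draw follows the observed proportions.

```r
# Toy data; the column names sampleno, length and age are assumed.
set.seed(1)
cod <- data.frame(
  sampleno = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  length   = c(55, 55, 55, 55, 55, 60, 55, 55, 55),
  age      = c(4, 4, 4, 5, NA, NA, 5, 5, NA)
)

# Hypothetical helper: fill the NAs in one (sample, length) group by
# drawing with replacement from the known ages in that same group.
impute_random <- function(age) {
  known <- age[!is.na(age)]
  n_missing <- sum(is.na(age))
  if (length(known) > 0 && n_missing > 0) {
    # Drawing indices avoids sample()'s special case for a single number.
    idx <- sample.int(length(known), n_missing, replace = TRUE)
    age[is.na(age)] <- known[idx]
  }
  age  # groups with no known age are returned unchanged (still NA)
}

# ave() applies the helper within each sampleno x length combination.
cod$age_imputed <- ave(cod$age, cod$sampleno, cod$length,
                       FUN = impute_random)
print(cod)
```

In the toy data, the 55 cm fish with missing age in sample 1 gets age 4
with probability 3/4 and age 5 with probability 1/4. The 60 cm fish stays
NA because no fish of that length in that sample has a known age; such
cases would need some other rule.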
Given the size of your sample, wouldn't it be more statistically valid to
set the age of the NA records to the mean age of records of matching
length? I suppose you could also use resampling or a bootstrap, but I'm
not sure that adding randomization will give results that are any more
statistically valid than using the mean.
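
A rough sketch of the mean approach, again assuming columns named
sampleno, length and age (the names and the toy data are assumptions):

```r
# Toy data; the column names sampleno, length and age are assumed.
cod <- data.frame(
  sampleno = c(1, 1, 1, 1, 1, 2, 2, 2),
  length   = c(55, 55, 55, 55, 55, 60, 60, 60),
  age      = c(4, 4, 4, 5, NA, 5, 6, NA)
)

# Mean of the known ages within each sampleno x length combination.
# A group with no known ages at all would give NaN here.
group_mean <- ave(cod$age, cod$sampleno, cod$length,
                  FUN = function(a) mean(a, na.rm = TRUE))

# Keep the observed ages and fill only the missing ones.
cod$age_filled <- ifelse(is.na(cod$age), group_mean, cod$age)
print(cod)
# Row 5 is filled with 4.25 and row 8 with 5.5.
```

The filled values are generally not whole years. If whole years are
needed, round() could be applied afterwards.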
Regards,
- Brian