[R] assigning creating missing rows and values
Bert Gunter
gunter.berton at gene.com
Thu May 12 23:13:05 CEST 2011
... But beware: last observation carried forward (LOCF) is a widely used but
notoriously bad (biased) way to impute missing values; and, of course,
inference based on such single imputation is bogus (how bogus depends,
among other things, on how much of the data is imputed).
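To make the inference point concrete, here is a toy R sketch with made-up numbers (hypothetical, not from the thread). Two real observations, 3 and 7, have three missing values between them. Filling the gaps by LOCF and then treating all five values as data shifts the mean and also shrinks the naive standard error, because the copied values look like genuine, mutually agreeing measurements.

```r
# Hypothetical toy series: only the first and last values were observed
observed <- c(3, NA, NA, NA, 7)
# What LOCF produces: each NA is replaced by the most recent observed value
filled <- c(3, 3, 3, 3, 7)

# Naive standard error of the mean, treating every value as a real observation
se <- function(x) sd(x) / sqrt(length(x))

obs <- observed[!is.na(observed)]
c(mean_observed = mean(obs), mean_locf = mean(filled))
c(se_observed = se(obs), se_locf = se(filled))
```

With only the observed values the mean is 5 with a standard error of 2. After LOCF the mean drops to 3.8 and the naive standard error to 0.8, so the imputed series looks both different and more precise than the data justify.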
Unfortunately, dealing with such data "well" requires considerable
statistical sophistication, which is why statisticians are widely
employed in the clinical trial business, where missing data in
longitudinal series are relatively common. You may therefore find it
useful to consult a local statistician if one is available.
As an extreme -- and unrealistic -- example of the problem, suppose
each of your series covered 12 hours of data measured every half hour and
that one series had only two measurements, the first and the last. The
first value is 10 and the last is 1. LOCF would fill in all the missing
values with 10. Obviously, a dumb thing to do. For real data, the problem
would not be so egregious, but the fundamental difficulty is the
same.
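A small R sketch of the extreme example above, assuming time is recorded in hours from 0 to 12 (25 half-hourly points). It fills the gaps once by LOCF and once by linear interpolation with base R's approx(). The interpolation is only a contrast: it is still a single imputation, so it does not cure the inference problem.

```r
# 12 hours sampled every half hour: 0, 0.5, ..., 12 (25 time points)
hours <- seq(0, 12, by = 0.5)
y <- rep(NA_real_, length(hours))
y[1] <- 10             # first measurement
y[length(y)] <- 1      # last measurement

# LOCF: index of the most recent observed value at or before each position
last_obs <- cummax(ifelse(is.na(y), 0L, seq_along(y)))
locf_y <- y[last_obs]  # safe here because y[1] is observed

# Linear interpolation between the two observed points, for contrast
ok <- !is.na(y)
interp_y <- approx(hours[ok], y[ok], xout = hours)$y

# Compare the two filled series at the midpoint (hour 6) and on average
mid <- which(hours == 6)
c(locf = locf_y[mid], interp = interp_y[mid])
c(locf = mean(locf_y), interp = mean(interp_y))
```

LOCF reports 10 at hour 6 and averages 9.64 over the day, while the straight line through the two observations gives 5.5 for both.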
(Apologies to those for whom my post is a familiar, boring refrain.
Unfortunately, I do not have the imagination to offer better.)
Cheers,
Bert
On Thu, May 12, 2011 at 1:43 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On May 12, 2011, at 4:33 PM, Schatzi wrote:
>
>> I have a dataset where I have missing times (11:00 and 16:00). I would like
>> the output to include the missing times so that the final time vector looks
>> like "realt" and each missing time gets the previous time's value. E.g., if
>> meas at time 15:30 is 0.45, then meas for time 16:00 will also be 0.45.
>> meas are the measurements and times are the times at which they were taken.
>>
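>> meas <- runif(18)
>>
>> # 18 observation times; 11:00 and 16:00 are missing
>> times <- c("08:30","09:00","09:30","10:00","10:30","11:30","12:00","12:30",
>>            "13:00","13:30","14:00","14:30","15:00","15:30","16:30","17:00",
>>            "17:30","18:00")
>> # keep times as character (the default only since R 4.0.0)
>> output <- data.frame(meas, times, stringsAsFactors = FALSE)
>>
>> # the complete half-hourly grid
>> realt <- c("08:30","09:00","09:30","10:00","10:30","11:00","11:30","12:00",
>>            "12:30","13:00","13:30","14:00","14:30","15:00","15:30","16:00",
>>            "16:30","17:00","17:30","18:00")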
>
> Package 'zoo' has an 'na.locf' function which I believe stands for "NA's
> last observation carried forward". So make a regular set of times, merge and
> "carry forward". I'm pretty sure you can find many examples in the archives.
> Gabor is very good about spotting places where his many contributions can be
> successfully deployed.
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
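A minimal base-R sketch of the recipe David describes (build the full grid of times, merge, carry forward), applied to the example data from the question. To run without extra packages it uses a small helper, locf() (a hypothetical name written here for illustration), instead of zoo's na.locf(). The seed and the seq()-generated grid are also assumptions; the grid reproduces "realt" from the original post.

```r
# Carry the last non-missing value forward; leading NAs stay NA.
# (hypothetical helper; zoo::na.locf() is the packaged alternative)
locf <- function(x) {
  last_obs <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  x[ifelse(last_obs == 0L, NA, last_obs)]
}

set.seed(1)  # assumed seed, only so the run is reproducible
meas <- runif(18)
times <- c("08:30", "09:00", "09:30", "10:00", "10:30", "11:30", "12:00",
           "12:30", "13:00", "13:30", "14:00", "14:30", "15:00", "15:30",
           "16:30", "17:00", "17:30", "18:00")
output <- data.frame(meas, times, stringsAsFactors = FALSE)

# Regular half-hourly grid from 08:30 to 18:00 (same values as 'realt')
grid <- seq(as.POSIXct("2011-05-12 08:30", tz = "UTC"),
            as.POSIXct("2011-05-12 18:00", tz = "UTC"), by = "30 min")
realt <- format(grid, "%H:%M")

# Merge onto the full grid: 11:00 and 16:00 get NA for meas
full <- merge(data.frame(times = realt, stringsAsFactors = FALSE), output,
              by = "times", all.x = TRUE)
# "HH:MM" strings sort chronologically, so merge()'s sorted result is in time order
full$meas <- locf(full$meas)
full
```

If zoo is installed, `full$meas <- zoo::na.locf(full$meas)` should perform the same carry-forward step here, since the first time point is observed.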
--
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."
-- Maimonides (1135-1204)
Bert Gunter
Genentech Nonclinical Biostatistics