[R] about interpolating data in r
lily li
chocold12 at gmail.com
Fri Jul 22 17:29:14 CEST 2016
Thanks, Ismail.
For the gaps before 2009-01-05 and after 2009-11-20, I use the year 2010 to
fill in the missing values for column C. There is no relationship between
column A, B, and C.
For the missing values between 2009-01-05 and 2009-11-20, if there are any,
I found this approach is very helpful.
with(df, approx(x=time, y=C, xout=seq(min(time), max(time), by="days")))
On Thu, Jul 21, 2016 at 5:14 PM, Ismail SEZEN <sezenismail at gmail.com> wrote:
>
> > On 22 Jul 2016, at 01:34, lily li <chocold12 at gmail.com> wrote:
> >
> > I have a question about interpolating missing values in a dataframe.
>
> First of all, filling missing values action must be taken into account
> very carefully. It must be known the nature of the data that wanted to be
> filled and most of the time, to let them be NA is the most appropriate
> action.
>
> > The
> > dataframe is in the following, Column C has no data before 2009-01-05 and
> > after 2009-12-31, how to interpolate data for the blanks?
>
> Why a dataframe? Is there any relationship between columns A,B and C? If
> there is, then you might want to consider filling missing values by a
> linear model approach instead of interpolation. You said that there is not
> data before 2009-01-05 and after 2009-12-31 but according to dataframe,
> there is not data after 2009-11-20?
>
> > That is to say,
> > interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.
>
> Also you metion interpolating blanks but you want interpolation between
> two gaps? Do you want to fill missing values before 2009-01-05 and after
> 2009-11-20 or do you want to find intermediate values between 2009-01-05
> and 2009-11-20? This is a bit unclear.
>
> >
> >
> > df
> > time A B C
> > 2009-01-01 3 4.5
> > 2009-01-02 4 5
> > 2009-01-03 3.3 6
> > 2009-01-04 4.1 7
> > 2009-01-05 4.4 6.2 5.4
> > ...
> >
> > 2009-11-20 5.1 5.5 6.1
> > 2009-11-21 5.4 4
> > ...
> > 2009-12-31 4.5 6
>
>
> If you want to fill missing values at the end-points for column C (before
> 2009-01-05 and after 2009-11-20), and all data you have is between
> 2009-01-05 and 2009-11-20, this means that you want extrapolation (guessing
> unkonwn values that is out of known values). So, you can use only values at
> column C to guess missing end-point values. You can use splinefun (or
> spline) functions for this purpose. But let me note that this kind of
> approach might help you only for a few missing values close to end-points.
> Otherwise, you might find yourself in a huge mistake.
>
> As I mentioned in my first sentence, If you have a relationship between
> all columns or you have data for column C for other years (for instance,
> assume that you have data for column C for 2007, 2008, and 2010 but not
> 2009) you may want to try a statistical approach to fill the missing values.
>
>
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list