[R] About interpolating data in R

Fri Jul 22 17:39:46 CEST 2016

approx() has a 'rule' argument that controls how it deals with
extrapolation.  Run help(approx) and read about the details.
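As an illustration of the `rule` argument, here is a minimal sketch that uses only the two known values of column C quoted further down in this thread (5.4 on 2009-01-05 and 6.1 on 2009-11-20) and a daily grid covering all of 2009. The variable names `time`, `C`, `xout`, `r1`, `r2` and `r3` are illustrative.

```r
# The two known values of column C from the example data frame
time <- as.Date(c("2009-01-05", "2009-11-20"))
C    <- c(5.4, 6.1)

# Daily grid covering all of 2009, i.e. extending past both known dates
xout <- seq(as.Date("2009-01-01"), as.Date("2009-12-31"), by = "day")

# rule = 1 (the default): points outside [min(time), max(time)] become NA
r1 <- approx(x = time, y = C, xout = xout, rule = 1)

# rule = 2: points outside the range take the value of the closest end point
r2 <- approx(x = time, y = C, xout = xout, rule = 2)

# rule = 2:1: use rule 2 on the left end and rule 1 on the right end
r3 <- approx(x = time, y = C, xout = xout, rule = 2:1)

head(r1$y, 5)
head(r2$y, 5)
```

Note that `approx()` itself never extrapolates linearly: `rule = 2` only repeats the nearest known value, and a length-2 `rule` lets the two ends be treated differently.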

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jul 22, 2016 at 8:29 AM, lily li <chocold12 at gmail.com> wrote:

> Thanks, Ismail.
> For the gaps before 2009-01-05 and after 2009-11-20, I use the year 2010 to
> fill in the missing values for column C. There is no relationship between
> columns A, B, and C.
> For the missing values between 2009-01-05 and 2009-11-20, if there are any,
> I found this approach very helpful:
> with(df, approx(x=time, y=C, xout=seq(min(time), max(time), by="days")))
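A minimal sketch of the two-step approach described above, on a hypothetical ten-day toy data frame. The values of `C`, the `C_filled` column and the `c2010` data frame of 2010 values are all made up for illustration. Passing `xout = time` instead of a new daily sequence keeps the interpolated values aligned with the rows of `df`.

```r
# Hypothetical toy data: ten days in 2009, C known on 2009-01-05 and 2009-01-08 only
df <- data.frame(
  time = seq(as.Date("2009-01-01"), as.Date("2009-01-10"), by = "day"),
  C    = c(NA, NA, NA, NA, 5.4, NA, NA, 6.0, NA, NA)
)

# Step 1: linear interpolation inside the observed range.
# approx() drops the rows where C is NA before interpolating, and the default
# rule = 1 leaves the ends (before the first / after the last known value) as NA.
df$C_filled <- with(df, approx(x = time, y = C, xout = time)$y)

# Step 2: fill the remaining end gaps from another year.
# c2010 is a hypothetical data frame of 2010 values (made-up numbers).
c2010 <- data.frame(
  time = seq(as.Date("2010-01-01"), as.Date("2010-01-10"), by = "day"),
  C    = c(5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9)
)

# Match rows by calendar day (month-day), ignoring the year
idx <- match(format(df$time, "%m-%d"), format(c2010$time, "%m-%d"))
gap <- is.na(df$C_filled)
df$C_filled[gap] <- c2010$C[idx[gap]]

df
```

Matching on `format(time, "%m-%d")` ignores the year. Neither 2009 nor 2010 is a leap year, so every day matches here, but a 29 February with no counterpart in the reference year would stay `NA`.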
>
>
>
> On Thu, Jul 21, 2016 at 5:14 PM, Ismail SEZEN <sezenismail at gmail.com> wrote:
>
> >
> > > On 22 Jul 2016, at 01:34, lily li <chocold12 at gmail.com> wrote:
> > >
> > > I have a question about interpolating missing values in a dataframe.
> >
> > First of all, filling in missing values must be done very carefully.
> > You need to know the nature of the data you want to fill, and most of
> > the time, leaving them as NA is the most appropriate action.
> >
> > > The dataframe is shown below. Column C has no data before 2009-01-05
> > > and after 2009-12-31. How can I interpolate data for the blanks?
> >
> > Why a dataframe? Is there any relationship between columns A, B and C? If
> > there is, then you might want to consider filling the missing values with
> > a linear model approach instead of interpolation. You said that there is
> > no data before 2009-01-05 and after 2009-12-31, but according to the
> > dataframe, there is no data after 2009-11-20?
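If such a relationship did exist (the thread states there is none in this case), a minimal sketch of the linear-model approach could look like the following. The toy data frame is hypothetical, constructed so that C equals 1 + 2A - B exactly on the rows where it is known.

```r
# Hypothetical toy data in which C depends linearly on A and B
df <- data.frame(
  A = c(1, 2, 3, 4, 5, 6),
  B = c(2, 1, 4, 3, 6, 5),
  C = c(NA, 4, 3, 6, 5, NA)
)

# Fit C ~ A + B on the complete rows; with the default na.action (na.omit),
# lm() drops the rows where C is NA
fit <- lm(C ~ A + B, data = df)

# Predict C for the rows where it is missing and write the predictions back
miss <- is.na(df$C)
df$C[miss] <- predict(fit, newdata = df[miss, ])

df
```

In real data the fit will not be exact, so it is worth checking `summary(fit)` before trusting the predicted values.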
> >
> > > That is to say,
> > > interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.
> >
> > Also, you mention interpolating blanks, but you want interpolation between
> > two gaps? Do you want to fill the missing values before 2009-01-05 and after
> > 2009-11-20, or do you want to find intermediate values between 2009-01-05
> > and 2009-11-20? This is a bit unclear.
> >
> > >
> > >
> > > df
> > > time          A     B     C
> > > 2009-01-01    3     4.5
> > > 2009-01-02    4     5
> > > 2009-01-03    3.3   6
> > > 2009-01-04    4.1   7
> > > 2009-01-05    4.4   6.2   5.4
> > > ...
> > >
> > > 2009-11-20    5.1   5.5   6.1
> > > 2009-11-21    5.4   4
> > > ...
> > > 2009-12-31    4.5   6
> >
> >
> > If you want to fill missing values at the end-points for column C (before
> > 2009-01-05 and after 2009-11-20), and all the data you have is between
> > 2009-01-05 and 2009-11-20, this means that you want extrapolation
> > (guessing unknown values outside the range of the known values). So you
> > can only use the values in column C to guess the missing end-point values.
> > You can use the splinefun (or spline) function for this purpose. But let
> > me note that this kind of approach might help you only for a few missing
> > values close to the end-points. Otherwise, you might make a huge mistake.
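A minimal sketch of end-point extrapolation with `splinefun()` and `spline()`, on a hypothetical toy series (`day` and the values of `C` are made up). `method = "natural"` is chosen here because a natural spline is extrapolated linearly beyond the data, whereas the default `"fmm"` continues the end cubic pieces and can drift away quickly. As noted above, such guesses are only worth considering for a few points close to the ends.

```r
# Hypothetical toy series: C observed on days 5 to 10 only
day <- 5:10
C   <- c(5.4, 5.5, 5.7, 5.8, 5.9, 6.1)

# Natural cubic spline through the known points; outside the data range,
# method = "natural" extrapolates linearly using the end slopes
f <- splinefun(day, C, method = "natural")

f(1:4)    # extrapolated guesses before the first observation
f(11:14)  # extrapolated guesses after the last observation

# The same idea with spline(), which returns a list with components x and y
spline(day, C, method = "natural", xout = c(1:4, 11:14))
```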
> >
> > As I mentioned in my first sentence, if you have a relationship between
> > all columns, or you have data for column C for other years (for instance,
> > assume that you have data for column C for 2007, 2008, and 2010 but not
> > 2009), you may want to try a statistical approach to fill the missing
> > values.
> >
> >
> >
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
