[R] How to replace a column in a data frame with another one with a different size

Sun Jul 8 19:52:52 CEST 2012


On Sun, Jul 8, 2012 at 12:22 PM, Stathis Kamperis <ekamperi at gmail.com> wrote:
> 2012/7/8 Michael Weylandt <michael.weylandt at gmail.com>:
>>
>>
>> On Jul 8, 2012, at 9:31 AM, Stathis Kamperis <ekamperi at gmail.com> wrote:
>>
>>> Hello everyone,
>>>
>>> I have a dataframe with 1 column and I'd like to replace that column
>>> with a moving average.
>>> Example:
>>>
>>>> library('zoo')
>>>> mydat <- seq_len(10)
>>>> mydat
>>> [1] 1 2 3 4 5 6 7 8 9 10
>>>> df <- data.frame("V1" = mydat)
>>>> df
>>>   V1
>>> 1   1
>>> 2   2
>>> 3   3
>>> 4   4
>>> 5   5
>>> 6   6
>>> 7   7
>>> 8   8
>>> 9   9
>>> 10 10
>>>> df[df$V1 <- rollapply(df$V1, 3, mean)]
>>> Error in `$<-.data.frame`(`*tmp*`, "V1", value = c(2, 3, 4, 5, 6, 7, 8,  :
>>>  replacement has 8 rows, data has 10
>>>>
>>>
>>
>> I'm not sure you need the outer df[...] -- I think you just want
>>
>> df$V1 <- rollapply(df$V1,3,mean)
>>
>> However, this will still give you the error message you're seeing because rollapply() only returns 8 values here (you don't get the "endpoints" by default). To get the right number of rows, you want
>>
>> rollapply(df$V1, 3, mean, fill = NA) # Change NA if desired
>>
>> which will put NA's on each end and give you a length 10 result, as needed.
>>
>
> Thanks Michael (and arun@)!
>
> If I would do that, then (in my particular case), I'd need to
> eliminate NA's, with something like:
> df$V1 <- df$V1[!is.na(df$V1)]
>
> which would still fail with the same error message :-P

You're getting tripped up (again) by trying to sub-assign something
that's too small.

df is a rectangular array of data: on the RHS of that expression, you
are selecting a subset of, say, 8 elements and telling R to replace
the 10-row V1 column with them. That cannot be done within a fixed
rectangular structure, hence the error message.
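
To make the mismatch concrete, here is a minimal sketch in base R. The
data are an assumption: a 10-row column with an NA at each end, which
is what rollapply(..., fill = NA) would leave behind for the example
in this thread.

```r
# Assumed data: 10 rows with an NA at each end.
df <- data.frame(V1 = c(NA, 2:9, NA))

kept <- df$V1[!is.na(df$V1)]  # only the non-NA values
length(kept)                  # 8
nrow(df)                      # 10

# Assigning 8 values into a 10-row column fails; capture the message
# instead of stopping the script.
msg <- tryCatch({
  df$V1 <- kept
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

The column has to stay as long as the data frame has rows, so the
rows themselves have to go, not just the values in one column.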

What you want to do is something like this:

df[!is.na(df$V1), ]

Let's walk through that:

df$V1 -- take the V1 column of df

is.na() -- get a logical vector saying where NAs are

!is.na() -- identify the rows where there _aren't_ NAs

df[ !is.na(), ] -- (the important one) take the rows of df (all
columns) where there aren't NAs
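
The same steps written out one at a time, as a sketch. The variable
names (v1, na_mask, keep, cleaned) and the data are made up for
illustration. drop = FALSE is added because df has a single column
here, and R would otherwise simplify the result to a plain vector.

```r
# Assumed data: 10 rows with an NA at each end.
df <- data.frame(V1 = c(NA, 2:9, NA))

v1      <- df$V1        # the V1 column as a plain vector
na_mask <- is.na(v1)    # TRUE where a value is missing
keep    <- !na_mask     # TRUE for the rows we want to keep

# Row index before the comma, empty column index after it.
# drop = FALSE keeps a one-column result as a data frame.
cleaned <- df[keep, , drop = FALSE]
cleaned
```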

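What you probably want is

df <- df[!is.na(df$V1), , drop = FALSE]

(drop = FALSE matters here because df has only one column; without
it, R simplifies the result to a plain vector.)

This is much better than what you were trying to do: you work on the
whole data frame at once and trust R to keep the rows together,
rather than manipulating individual columns.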

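But even more idiomatic would be

df[complete.cases(df), , drop = FALSE]

since complete.cases() on its own only returns the logical vector
marking the rows without NAs.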
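
A sketch of the whole pipeline, under one loudly stated assumption:
to keep it free of extra packages, it uses base R's stats::filter()
as a stand-in for zoo's rollapply(..., fill = NA). Both give a
centered 3-point mean with NA at each end for this input. by_index
and by_omit are illustrative names.

```r
x <- seq_len(10)

# Centered 3-point moving average in base R; the two ends come back
# as NA because the window does not fit there.
ma <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))

df <- data.frame(V1 = ma)

# Two ways to drop the incomplete rows while keeping a data frame.
by_index <- df[complete.cases(df), , drop = FALSE]
by_omit  <- na.omit(df)

nrow(by_index)  # 8
nrow(by_omit)   # 8
```

na.omit() is the shorter spelling; indexing with complete.cases()
is handy when you also want the logical vector for something else.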

Take a look at some introductory material and try to wrap your head
around indexing rows and columns together again: it's a fantastic
paradigm and will be of much more use to you in the long run than
trying to work on individual columns for subsetting/data-cleaning.

Best,
Michael

>
> Regards,
> Stathis
>
>> Best,
>> Michael
>>
>>> I could use a temporary variable to store the results of rollapply()
>>> and then reconstruct the data frame, but I was wondering if there is a
>>> one-liner that can achieve the same thing.
>>>
>>> Best regards,
>>> Stathis
>>>
>>> P.S. If you don't mind, cc me at your reply because I'm not subscribed
>>> to the list (but I will check the archive anyway).
>>>