[R] data frame manipulation with zero rows
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Jun 1 12:24:00 CEST 2010
On Tue, 1 Jun 2010, Peter Ehlers wrote:
> On 2010-06-01 1:53, arnaud Gaboury wrote:
>> Brian,
>>
>> If I understand correctly, I must use something other than ddply() in my
>> function if I want to avoid an error whenever my df has zero rows?
>> Am I correct?
>>
>
> You could define a function to handle the zero-rows case:
>
> f <- function(x){
>   if(nrow(x) < 1) out <- x[, c(1,3,2)] # or whatever
>   else
>     out <- ddply(x, c("DESCRIPTION","SETTLEMENT"), summarise,
>                  POSITION=sum(QUANTITY))[,c(1,3,2)]
>   out
> }
> f(futures)
Or simply fix ddply. We don't know what that is or what it should do
for the case of zero rows: it may or may not be the one in package
plyr.
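
If the desired zero-row result were an empty data frame with the same
three columns as the usual output (an assumption, since the thread does
not say), a base R sketch that runs without ddply() might look like the
following. The name position_by_contract() is hypothetical, and
aggregate() is swapped in for ddply() purely so the example is
self-contained.

```r
# Hypothetical helper, not from the thread: base R only.
position_by_contract <- function(x) {
  if (nrow(x) == 0L) {
    # Assumed zero-row result: same columns as the non-empty case.
    return(data.frame(DESCRIPTION = character(0),
                      POSITION    = numeric(0),
                      SETTLEMENT  = character(0),
                      stringsAsFactors = FALSE))
  }
  # Sum QUANTITY within each DESCRIPTION/SETTLEMENT combination.
  out <- aggregate(QUANTITY ~ DESCRIPTION + SETTLEMENT, data = x, FUN = sum)
  names(out)[names(out) == "QUANTITY"] <- "POSITION"
  out[, c("DESCRIPTION", "POSITION", "SETTLEMENT")]
}

# Small made-up input, only to exercise both branches.
demo <- data.frame(DESCRIPTION = c("A", "A", "B"),
                   QUANTITY    = c(1, 2, 3),
                   SETTLEMENT  = c("1.0", "1.0", "2.0"),
                   stringsAsFactors = FALSE)
res_full  <- position_by_contract(demo)
res_empty <- position_by_contract(demo[0, ])
```

Selecting columns by name rather than by position keeps both branches
returning the same layout.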
>
> -Peter Ehlers
>
>>
>>
>>> -----Original Message-----
>>> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
>>> Sent: Tuesday, June 01, 2010 9:47 AM
>>> To: arnaud Gaboury
>>> Subject: Re: [R] data frame manipulation with zero rows
>>>
>>> On Tue, 1 Jun 2010, arnaud Gaboury wrote:
>>>
>>>> Dear group,
>>>>
>>>> Here is the kind of data.frame I obtain every day with my function :
>>>>
>>>> futures<-
>>>> structure(list(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10",
>>>> "CORN Jul/10", "CORN Jul/10", "CORN Jul/10", "LIVE CATTLE Aug/10",
>>>> "LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10",
>>>> "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"
>>>> ), CREATED.DATE = structure(c(18403, 18406, 18406, 18406, 18406,
>>>> 18407, 18408, 18406, 18407, 18407, 18407, 18407), class = "Date"),
>>>> QUANTITY = c(1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1), SETTLEMENT =
>>>> c("373.2500",
>>>> "373.2500", "373.2500", "373.2500", "373.2500", "90.7750",
>>>> "90.7750", "14.9200", "14.9200", "14.9200", "14.9200", "14.9200"
>>>> )), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANTITY",
>>>> "SETTLEMENT"), row.names = c(NA, 12L), class = "data.frame")
>>>>
>>>> I then need to apply the following line of code to the df:
>>>>
>>>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
>>>>   POSITION=sum(QUANTITY))[,c(1,3,2)]
>>>>
>>>> It works perfectly in most cases, BUT I have a new problem: it can
>>>> sometimes happen that my df "futures" is empty, with zero rows.
>>>>
>>>>
>>>> futures<-
>>>> structure(list(DESCRIPTION = character(0), CREATED.DATE =
>>>> structure(numeric(0), class = "Date"),
>>>> QUANTITY = numeric(0), SETTLEMENT = character(0)), .Names =
>>>> c("DESCRIPTION",
>>>> "CREATED.DATE", "QUANTITY", "SETTLEMENT"), row.names = integer(0),
>>>> class = "data.frame")
>>>>
>>>> It is not the usual case, but it can happen. With this df, when I run the
>>>> above-mentioned line, I get an error:
>>>>
>>>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
>>>>   POSITION=sum(QUANTITY))[,c(1,3,2)]
>>>> Error in tapply(1:nrow(data), splitv, list) :
>>>> arguments must have same length
>>>>
>>>>
>>>> How can I avoid this when my df is empty?
>>>
>>> Ask the author of the (missing) function ddply() to correct the error
>>> of using 1:nrow(data) by replacing it by seq_len(nrow(data)).
>>>
>>> It's helpful to give example code, but much more helpful if you test
>>> it: yours cannot work without the function ddply() -- this is what
>>> 'self-contained' means in the footer here.
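
To illustrate the 1:nrow(data) point quoted above, here is a minimal base R
sketch. It assumes nothing about ddply() itself: n stands in for nrow(data)
on an empty data frame, and the tapply() call is only modelled on the one
in the error message.

```r
n <- 0L                      # stands in for nrow(data) when data is empty

1:n                          # counts down, giving the two values 1 and 0
seq_len(n)                   # integer(0): nothing to index

# Two index values against zero grouping values: tapply() signals an error.
bad <- tryCatch(tapply(1:n, list(character(0)), list),
                error = function(e) e)
inherits(bad, "error")

# A loop over seq_len(n) does not run at all when n is 0.
visited <- 0L
for (i in seq_len(n)) visited <- visited + 1L
visited
```

The same applies to seq_along(x) in place of 1:length(x).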
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595