[R] data frame manipulation with zero rows

Tue Jun 1 12:07:13 CEST 2010

On 2010-06-01 1:53, arnaud Gaboury wrote:
> Brian,
>
> If I understand correctly, I must use something other than ddply() in my
> function if I want to avoid an error whenever my df has zero rows.
> Am I correct?
>

You could define a function to handle the zero-rows case:

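library(plyr)  # provides ddply() and summarise()

f <- function(x) {
  if (nrow(x) < 1) {
    # zero rows: skip ddply() and return an empty data frame
    out <- x[, c(1, 3, 2)]  # or whatever
  } else {
    out <- ddply(x, c("DESCRIPTION", "SETTLEMENT"), summarise,
                 POSITION = sum(QUANTITY))[, c(1, 3, 2)]
  }
  out
}
f(futures)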
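
If downstream code expects the same columns whether or not there is data, the empty branch can build the zero-row result by hand. The following is only a sketch under assumptions: position_summary() is a hypothetical name, and base R's aggregate() stands in for the ddply() call so that the example runs without plyr.

```r
# Hypothetical helper (not from the thread): returns DESCRIPTION, POSITION,
# SETTLEMENT for both empty and non-empty input.
position_summary <- function(x) {
  if (nrow(x) < 1) {
    # Empty input: construct a zero-row data frame with the columns
    # that the non-empty branch would return.
    return(data.frame(DESCRIPTION = character(0),
                      POSITION    = numeric(0),
                      SETTLEMENT  = character(0),
                      stringsAsFactors = FALSE))
  }
  # Non-empty input: sum QUANTITY within each DESCRIPTION/SETTLEMENT group.
  # aggregate() is a base-R stand-in for ddply(..., summarise, ...).
  out <- aggregate(x["QUANTITY"],
                   by = x[c("DESCRIPTION", "SETTLEMENT")],
                   FUN = sum)
  names(out)[names(out) == "QUANTITY"] <- "POSITION"
  out[, c("DESCRIPTION", "POSITION", "SETTLEMENT")]
}
```

Calling position_summary(futures) on the zero-row data frame then gives a zero-row result with the expected column names, so later code that refers to POSITION or SETTLEMENT does not need its own special case.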

  -Peter Ehlers

>
>
>> -----Original Message-----
>> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
>> Sent: Tuesday, June 01, 2010 9:47 AM
>> To: arnaud Gaboury
>> Subject: Re: [R] data frame manipulation with zero rows
>>
>> On Tue, 1 Jun 2010, arnaud Gaboury wrote:
>>
>>> Dear group,
>>>
>>> Here is the kind of data.frame I obtain every day with my function :
>>>
>>> futures<-
>>> structure(list(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10",
>>> "CORN Jul/10", "CORN Jul/10", "CORN Jul/10", "LIVE CATTLE Aug/10",
>>> "LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10",
>>> "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"
>>> ), CREATED.DATE = structure(c(18403, 18406, 18406, 18406, 18406,
>>> 18407, 18408, 18406, 18407, 18407, 18407, 18407), class = "Date"),
>>>     QUANTITY = c(1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1), SETTLEMENT =
>>> c("373.2500",
>>>     "373.2500", "373.2500", "373.2500", "373.2500", "90.7750",
>>>     "90.7750", "14.9200", "14.9200", "14.9200", "14.9200", "14.9200"
>>>     )), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANTITY",
>>> "SETTLEMENT"), row.names = c(NA, 12L), class = "data.frame")
>>>
>>> I then need to apply the following line of code to the df:
>>>
>>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
>>>   POSITION=sum(QUANTITY))[,c(1,3,2)]
>>>
>>> It works perfectly in most cases, BUT I have a new problem: it can
>>> sometimes happen that my df "futures" is empty, with zero rows.
>>>
>>>
>>> futures<-
>>> structure(list(DESCRIPTION = character(0), CREATED.DATE =
>>> structure(numeric(0), class = "Date"),
>>>     QUANTITY = numeric(0), SETTLEMENT = character(0)), .Names =
>>> c("DESCRIPTION",
>>> "CREATED.DATE", "QUANTITY", "SETTLEMENT"), row.names = integer(0),
>>> class = "data.frame")
>>>
>>> It is not the usual case, but it can happen. With this df, when I run the
>>> above-mentioned line, I get an error:
>>>
>>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
>>>   POSITION=sum(QUANTITY))[,c(1,3,2)]
>>> Error in tapply(1:nrow(data), splitv, list) :
>>>   arguments must have same length
>>>
>>>
>>> How can I avoid this when my df is empty?
>>
>> Ask the author of the (missing) function ddply() to correct the error
>> of using 1:nrow(data) by replacing it by seq_len(nrow(data)).
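
To illustrate the point about 1:nrow(data): with zero rows the colon operator counts down from 1 to 0 and yields two elements, whereas seq_len() yields an empty index. That mismatch (an index of length 2 against grouping variables of length 0) is presumably what triggers the "arguments must have same length" error from tapply(). A minimal sketch; the variable names are only for illustration:

```r
n <- 0                       # what nrow() returns for an empty data frame

index_colon  <- 1:n          # counts down: the two values 1 and 0
index_seqlen <- seq_len(n)   # integer(0): an empty index, as intended

length(index_colon)          # 2
length(index_seqlen)         # 0
```
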
>>
>> It's helpful to give example code, but much more helpful if you test
>> it: yours cannot work without the function ddply() -- this is what
>> 'self-contained' means in the footer here.
>>
>>
>>>
>>> Any help is appreciated
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> --
>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>