[R] noob requesting help

David Winsemius dwinsemius at comcast.net
Thu Jun 14 15:22:52 CEST 2012


On Jun 14, 2012, at 3:20 AM, Rui Barradas wrote:

> Hello,
>
> Now the output of str() says 'dat' is a list not a data.frame.  
> That's why R is complaining about dimensions (lack of, in this case).
>
> Try
>
> dat2 <- data.frame(do.call(cbind, dat), stringsAsFactors=FALSE)

The construction data.frame(cbind(.)) should be severely deprecated.
cbind() builds a matrix, so it coerces all the columns to a single type
and removes all the attributes except names. This is what happens to a
POSIXlt "vector":

data.frame(do.call(cbind, list(a=1:10,
                               b=as.POSIXlt(ISOdate(2001, 1:10, 1)))),
           stringsAsFactors=FALSE)
     a                                                b
1   1                     0, 0, 0, 0, 0, 0, 0, 0, 0, 0
2   2                     0, 0, 0, 0, 0, 0, 0, 0, 0, 0
3   3           12, 12, 12, 12, 12, 12, 12, 12, 12, 12
4   4                     1, 1, 1, 1, 1, 1, 1, 1, 1, 1
5   5                     0, 1, 2, 3, 4, 5, 6, 7, 8, 9
6   6 101, 101, 101, 101, 101, 101, 101, 101, 101, 101
7   7                     1, 4, 4, 0, 2, 5, 0, 3, 6, 1
8   8      0, 31, 59, 90, 120, 151, 181, 212, 243, 273
9   9                     0, 0, 0, 0, 0, 0, 0, 0, 0, 0
10 10                     0, 0, 0, 0, 0, 0, 0, 0, 0, 0
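
The same coercion is easier to see on a smaller case. This is only a sketch with a made-up list 'x' (not the poster's data): cbind() on an integer vector and a character vector returns a character matrix, so both columns come back as character, whereas data.frame() on the list keeps each column's own class.

```r
# Hypothetical toy list (not the original poster's data)
x <- list(a = 1:3, b = c("u", "v", "w"))

# cbind() builds a matrix, and a matrix has a single type,
# so the integer column is coerced to character
m <- do.call(cbind, x)
class(m[, "a"])

via_cbind <- data.frame(m, stringsAsFactors = FALSE)
sapply(via_cbind, class)   # both columns are character

# data.frame() on the list keeps each column's own class
direct <- data.frame(x, stringsAsFactors = FALSE)
sapply(direct, class)      # a is integer, b is character
```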


Use instead:

dat2 <- data.frame( dat, stringsAsFactors=FALSE)

The data.frame function will do the cbinding actions but will preserve
column classes such as date-times. It does convert POSIXlt columns to
POSIXct, as the str() output below shows.

 > structure(data.frame(list(a=1:10,
 +                           b=as.POSIXlt(ISOdate(2001, 1:10, 1)))))
     a                   b
1   1 2001-01-01 12:00:00
2   2 2001-02-01 12:00:00
3   3 2001-03-01 12:00:00
4   4 2001-04-01 12:00:00
5   5 2001-05-01 12:00:00
6   6 2001-06-01 12:00:00
7   7 2001-07-01 12:00:00
8   8 2001-08-01 12:00:00
9   9 2001-09-01 12:00:00
10 10 2001-10-01 12:00:00
 > str(data.frame(list(a=1:10, b=as.POSIXlt(ISOdate(2001, 1:10, 1)))) )
'data.frame':	10 obs. of  2 variables:
  $ a: int  1 2 3 4 5 6 7 8 9 10
  $ b: POSIXct, format: "2001-01-01 12:00:00" "2001-02-01 12:00:00"  
"2001-03-01 12:00:00" ...

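Applied to the original question, here is a sketch using a small made-up list shaped like the poster's 'dat' (the values are invented for illustration). Once the list is a data.frame, split() divides it by rows, so each piece is itself a data.frame and the two-index subsetting inside lapply() works. The drop = TRUE argument is my addition: it discards empty device/trip combinations, which would otherwise send zero rows to min().

```r
# Hypothetical stand-in for the poster's list 'dat'
dat <- list(
  device_info_serial = c(121L, 121L, 121L, 122L, 122L),
  tripID             = c(3L, 3L, 4L, 3L, 3L),
  time = as.POSIXlt(c("2009-05-21 10:59:24", "2009-05-21 11:04:24",
                      "2009-05-22 09:00:00", "2009-05-21 08:30:00",
                      "2009-05-21 08:45:00"), tz = "UTC")
)

# data.frame() keeps the date-time class (stored as POSIXct)
dat2 <- data.frame(dat, stringsAsFactors = FALSE)

# split() on a data.frame divides rows; drop = TRUE removes empty groups
pieces <- split(dat2, list(dat2$device_info_serial, dat2$tripID),
                drop = TRUE)

# earliest record within each device/trip group
departures <- lapply(pieces, function(x) x[x$time == min(x$time), ])
do.call(rbind, departures)
```
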
-- 
David.

>
> Then run the lapply()
>
> Also, if dput(head(dat, 20)) is very big, omit the argument 20.
> See ?head for its meaning.
> But if the above works there's no need for it.
>
> Rui Barradas
>
> Em 14-06-2012 01:06, capital_P escreveu:
>> Rui Barradas wrote
>>> Sorry, but the output of dput() starts with 'structure', not like what
>>> you've posted.
>>> And there are many more than 20 dates in the beginning.
>>>
>>> The posting guide is easy to find
>>>
>> apologies for my earlier ignorance.
>>
>> It seems the output of dput(head(data, 20)) is too big to print in full.
>> When I press stop immediately after pressing enter, I get:
>>
>> dput(head(long, 30))
>> structure(list(tripID = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L), device_info_serial = c(121L, 121L, 121L,
>> 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L,
>> 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L,
>> 121L, 121L, 121L, 121L, 121L), mdate = structure(c(33L, 33L,
>> 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L,
>>
>> If I let it run just a little bit longer, it gives the enormous  
>> output like
>> before.
>>
>> I made a new vector containing only the columns I need for this  
>> problem:
>>
>>> str(dat)
>> List of 4
>>  $ device_info_serial: int [1:34773] 121 121 121 121 121 121 121  
>> 121 121 121
>> ...
>>  $ hour              : int [1:34773] 10 11 11 11 11 11 11 11 11  
>> 11 ...
>>  $ time              : POSIXlt[1:34773], format: "2009-05-21  
>> 10:59:24" ...
>>  $ tripID            : int [1:34773] 3 3 3 3 3 3 3 3 3 3 ...
>>
>> You were right about the factors. I think I solved it using:
>>
>> dat$time<- strptime(long$date_time.x, format = "%Y-%m-%d %H:%M:%S")
>>
>> This seems to help, as R no longer computes for hours. It does give an
>> error though:
>>
>>> departures <- lapply(split(dat, list(dat$device_info_serial,
>>>                                      dat$tripID)),
>>>                      function(x) x[x$time == min(x$time), ])
>> Error in x[x$time == min(x$time), ] : incorrect number of dimensions
>> In addition: Warning messages:
>> 1: In split.default(dat, list(dat$device_info_serial, dat$tripID)) :
>>   data length is not a multiple of split variable
>> 2: In min(x$time) : no non-missing arguments to min; returning Inf
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/noob-requesting-help-tp4632803p4633317.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

David Winsemius, MD
West Hartford, CT


