[R] Multiple assignment to several columns in dataset

Uwe Ligges ligges at statistik.tu-dortmund.de
Sun Apr 28 19:21:33 CEST 2013


See ?strptime on how to handle time formats.
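
As a minimal sketch of the strptime route (the sample data frame below is made up to stand in for 'dt', and tz = "UTC" is an assumption chosen only to keep daylight-saving gaps out of the picture), the hour and minute can be read from the parsed object:

```r
# Hypothetical sample data standing in for the poster's 'dt'
dt <- data.frame(time = c("18:10", "19:43", "07:05"),
                 stringsAsFactors = FALSE)

# Parse the "HH:MM" strings; strptime() returns a POSIXlt object
parsed <- strptime(dt$time, format = "%H:%M", tz = "UTC")

# POSIXlt keeps its components as named fields, so hour and minute
# can be read off as whole vectors without any splitting
dt$hr <- parsed$hour
dt$m  <- parsed$min
```

Both columns come out numeric rather than character, which is usually what is wanted for later arithmetic on them.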


If you want to keep using strsplit: it returns a list, and lapply() on
that list returns a list again, so you probably want sapply(), which
simplifies the result to a vector:

  dt$hr  <- sapply(tstamp, "[", 1)
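
If both fields should come from a single pass over the split list, one possible sketch uses vapply(), which also checks that every element has the expected shape. The sample data and the conversion to integer are assumptions for illustration, not part of the original question:

```r
# Hypothetical sample data standing in for the poster's 'dt'
dt <- data.frame(time = c("18:10", "19:43", "07:05"),
                 stringsAsFactors = FALSE)

# fixed = TRUE treats ":" as a literal string rather than a regex
tstamp <- strsplit(dt$time, ":", fixed = TRUE)

# One pass over the list: each element must yield exactly two strings,
# and the result is a 2 x n character matrix (one column per input row)
parts <- vapply(tstamp, function(v) v[1:2], character(2))

# Assign both columns at once from a list with one element per column
dt[c("hr", "m")] <- list(as.integer(parts[1, ]),
                         as.integer(parts[2, ]))
```

The right-hand side here is a list with one element per new column, whereas the raw strsplit() result is a list with one element per row, which is presumably not what the multi-column assignment expects.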

Uwe Ligges




On 28.04.2013 13:48, Alexandre Karev wrote:
> Hello!
>
> I have a time stamp field ('time') in a dataset ('dt') with values like
> "18:10", "19:43", ....
> I need to split the time field into hours and minutes and add both as new
> columns to the dataset.
> We are able to do it in bash+awk, but are curious to stay within the R
> codebase as much as possible.
>
> For now we are using this solution:
>
>   tstamp <- strsplit(dt$time, ":")
>
> # constructing hours field
>   dt$hr  <- lapply(tstamp, function(v) {v[1] } )
>
> # constructing minutes field
>   dt$m   <- lapply(tstamp, function(v) {v[2] } )
>
> It works fine on a sample (simple, small) data set.
>
> But when working on real data with several million records, it does not
> seem very practical to make two separate passes over the tstamp list.
>
> We have tried to use this construction instead:
>
> dt[c('hr', 'm')] <- strsplit(dt$time, ":")
>
> But the R environment consumes the whole system memory (8 GB), starts
> swapping while processing this statement, and hangs for so long that we
> have never had the patience to wait for results.
>
> Is there any simple and efficient way to assign several dataset columns
> with values computed from a set of other columns?
>
>
> R-egards,
> Alex
>


