[R] sapply to bind columns, with repeat?

Katrina Bennett kebennett at alaska.edu
Fri Aug 12 23:08:02 CEST 2011


Hi Weidong Gu,

This works! For my clarity, and so I can repeat this process if need be:

The 'mat' line builds a matrix from whatever is supplied to x (i.e. one
row of coop.dat), taking the values at positions 9:length(x) and filling
a 6-column matrix by row.

The 'rem.col' line repeats the first 8 values of x once for each row of
mat, giving a matrix with 8 columns.

The 'return' statement returns rem.col and mat, bound together with
cbind, as a data frame.

Then 'apply' runs reorg on each row of coop.dat, and do.call('rbind', ...)
stacks the resulting data frames into one.

Is this correct?

Thank you very much,

Katrina
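To check the description above, the two matrix steps inside reorg() can be run on a short toy vector. The values below are made up for illustration (two 6-value blocks instead of 62), but the shapes match what reorg() builds for each row of coop.dat:

```r
# Toy "row": 8 identifier values followed by two 6-value blocks
# (hypothetical values, not real COOP data)
x <- c("DLY", "09752806", "TMAX", "F", "2010", "01", "9999", "2",
       "01", "07", "", "00049", "", "2",
       "01", "07", "", "00062", "", "B")

# Positions 9..length(x) reshaped into a 6-column matrix, filled row by row,
# so each row of 'mat' is one day/hr/met/dat/fl1/fl2 block
mat <- matrix(x[9:length(x)], ncol = 6, byrow = TRUE)

# The first 8 values repeated once per block, as an 8-column matrix
rem.col <- matrix(rep(x[1:8], nrow(mat)), ncol = 8, byrow = TRUE)

out <- cbind(rem.col, mat)
dim(out)    # 2 14
out[, 12]   # "00049" "00062"
```

Each row of out is one block with the 8 identifier values repeated in front of it.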


On Fri, Aug 12, 2011 at 10:28 AM, Weidong Gu <anopheles123 at gmail.com> wrote:
> Katrina,
>
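> Try this.
>
> # Reshape one row of coop.dat: columns 1-8 are identifiers, columns
> # 9 onward are repeated blocks of 6 values
> reorg <- function(x) {
>   # one row per block: 6 columns, filled row by row
>   mat <- matrix(x[9:length(x)], ncol = 6, byrow = TRUE)
>   # repeat the 8 identifier values once for each block
>   rem.col <- matrix(rep(x[1:8], nrow(mat)), byrow = TRUE, ncol = 8)
>   return(data.frame(cbind(rem.col, mat)))
> }
>
> co <- do.call('rbind', apply(coop.dat, 1, reorg))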
>
> You may need to tweak it a bit to get exactly what you want.
>
> Weidong Gu
>
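One caveat worth checking with the apply() approach: apply() first converts a data frame to a matrix, so when coop.dat has any character column (elem, for example), every row reaches reorg() as a character vector and the combined result co is all character. A minimal sketch on a hypothetical two-column data frame (df is an illustrative name, not part of the original code), with type.convert() as one way to restore numeric columns afterwards:

```r
# Hypothetical stand-in for coop.dat: one character and one integer column
df <- data.frame(elem = c("TMAX", "TMAX"), year = c(2010L, 2011L),
                 stringsAsFactors = FALSE)

# apply() calls as.matrix(df) first; with a character column present the
# matrix is character, so each row arrives as a character vector
row.classes <- apply(df, 1, class)
print(row.classes)  # "character" "character"

# Same pattern as the reply: one data frame per row, stacked with rbind
co <- do.call(rbind, apply(df, 1, function(x)
  data.frame(t(x), stringsAsFactors = FALSE)))
str(co)  # both columns are character at this point

# Convert each column back to the most suitable type
co[] <- lapply(co, type.convert, as.is = TRUE)
str(co)  # year is now integer, elem stays character
```

The same co[] <- lapply(co, type.convert, as.is = TRUE) line could be applied to the real co after the do.call('rbind', ...) step.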
> On Fri, Aug 12, 2011 at 2:35 AM, Katrina Bennett <kebennett at alaska.edu> wrote:
>> Hi R-help,
>>
>> I am working with US COOP network station data. The files are
>> concatenated into single rows for all years, but I need to pull these
>> apart into rows for each day. To do this, I need to extract part of
>> each row, such as station id, year and mo, and repeat it against the
>> other variables in the row (days). My problem is that there are
>> repeated values for each day, and the files are fixed-width fields
>> without order.
>>
>> Here is an example of just one line of data.
>>
>> # The record is one long fixed-width line that was wrapped in the
>> # message; paste() rejoins the pieces with the single spaces it replaced
>> coop.raw <- paste("DLY09752806TMAX F20100199990620107 00049 20107 00062",
>>   "B0207 00041 20207 00049 B0307 00040 20307 00041 B0407 00042 20407",
>>   "00040 B0507 00041 20507 00042 B0607 00043 20607 00041 B0707 00055",
>>   "20707 00043 B0807 00039 20807 00055 B0907 00037 20907 00039 B1007",
>>   "00038 21007 00037 B1107 00048 21107 00038 B1207 00050 21207 00048",
>>   "B1307 00051 21307 00050 B1407 00058 21407 00051 B1507 00068 21507",
>>   "00058 B1607 00065 21607 00068 B1707 00068 21707 00065 B1807 00067",
>>   "21807 00068 B1907 00068 21907 00067 B2007 00069 22007 00068 B2107",
>>   "00057 22107 00069 B2207 00048 22207 00057 B2307 00051 22307 00048",
>>   "B2407 00073 22407 00051 B2507 00062 22507 00073 B2607 00056 22607",
>>   "00062 B2707 00053 22707 00056 B2807 00064 22807 00053 B2907 00057",
>>   "22907 00064 B3007 00047 23007 00057 B3107 00046 23107 00047 B")
>> write.csv(coop.raw, "coop.tmp", row.names = FALSE, quote = FALSE)
>> # 8 header fields, then 62 blocks of day/hr/met/dat/fl1/fl2
>> coop.dat <- read.fwf("coop.tmp",
>>   widths = c(c(3, 8, 4, 2, 4, 2, 4, 3), rep(c(2, 2, 1, 5, 1, 1), 62)),
>>   na.strings = c("9999"), skip = 1, as.is = TRUE)
>> rep.name <- rep(c("day", "hr", "met", "dat", "fl1", "fl2"), 62)
>> rep.count <- rep(1:62, each = 6)
>> names(coop.dat) <- c("rect", "id", "elem", "unt", "year", "mo",
>>   "fill", "numval", paste(rep.name, rep.count, sep = "_"))
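With 8 identifier fields followed by 62 repeated blocks of 6 (the widths and names set up above), block b occupies columns 8 + 6(b - 1) + 1 through 8 + 6b. A small sketch of computing those positions, once with lapply() and once as a matrix whose columns are the blocks; n.id, block.size and n.block are illustrative names, not part of the original code:

```r
n.id <- 8        # rect, id, elem, unt, year, mo, fill, numval
block.size <- 6  # day, hr, met, dat, fl1, fl2
n.block <- 62    # value blocks per record in this example

# One vector of column positions per block: 9:14, 15:20, ..., 375:380
block.cols <- lapply(seq_len(n.block), function(b)
  n.id + (b - 1) * block.size + seq_len(block.size))
block.cols[[2]]  # 15 16 17 18 19 20

# Same positions as a matrix: column b lists the columns of block b
block.mat <- matrix(seq(n.id + 1, n.id + n.block * block.size),
                    nrow = block.size)
dim(block.mat)   # 6 62

# e.g. coop.dat[, c(2:6, 8, block.cols[[2]])] would select id..mo,
# numval and the six columns of block 2
```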
>>
>> I would like to generate output that contains, in one row, the columns
>> "id", "elem", "unt", "year", "mo", and "numval". Bound to these
>> initial columns, I would like only "day_1", "hr_1", "met_1", "dat_1",
>> "fl1_1", and "fl2_1". Then, in the next row, I would like the initial
>> columns "id", "elem", "unt", "year", "mo", and "numval" repeated and
>> bound to "day_2", "hr_2", "met_2", "dat_2", "fl1_2", and "fl2_2", and
>> so on until all the data in that row have been allocated. Then, move
>> on to the next row and repeat.
>>
>> I think I should be able to do this with some sort of sapply or lapply
>> function, but I'm struggling with the format for repeating the initial
>> columns, and then skipping through the next columns.
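A minimal sketch of that lapply() idea, assuming coop.dat has the layout and names set up above (8 identifier columns, then blocks named day_1 ... fl2_62): loop over the blocks rather than the rows, cbind each block's 6 columns to the identifier columns to keep, strip the _1, _2, ... suffixes so all blocks share names, then rbind the pieces and reorder so each record's blocks stay together. stack_blocks() and the toy data frame are hypothetical, not part of the original thread. Because apply() is never used, integer columns such as dat keep their type.

```r
# Hypothetical helper: one output row per value block, repeating the
# identifier columns in 'keep.id'. Assumes 'n.id' identifier columns
# followed by blocks of 'block.size' columns named like day_1, ..., fl2_1.
stack_blocks <- function(dat, n.id = 8, block.size = 6,
                         keep.id = seq_len(n.id)) {
  n.block <- (ncol(dat) - n.id) %/% block.size
  pieces <- lapply(seq_len(n.block), function(b) {
    cols <- n.id + (b - 1) * block.size + seq_len(block.size)
    blk <- dat[, cols, drop = FALSE]
    # drop the "_1", "_2", ... suffix so every block has the same names
    names(blk) <- sub("_[0-9]+$", "", names(blk))
    cbind(row = seq_len(nrow(dat)), block = b,
          dat[, keep.id, drop = FALSE], blk)
  })
  out <- do.call(rbind, pieces)
  # all blocks of record 1 first, then record 2, and so on
  out <- out[order(out$row, out$block), ]
  rownames(out) <- NULL
  out
}

# Toy stand-in for coop.dat: 2 records with 2 blocks each (made-up values)
toy <- data.frame(rect = "DLY", id = c("09752806", "09752807"),
                  elem = "TMAX", unt = "F", year = 2010L, mo = 1L,
                  fill = NA, numval = 2L,
                  day_1 = 1L, hr_1 = 7L, met_1 = NA, dat_1 = c(49L, 50L),
                  fl1_1 = NA, fl2_1 = "2",
                  day_2 = 1L, hr_2 = 7L, met_2 = NA, dat_2 = c(62L, 63L),
                  fl1_2 = NA, fl2_2 = "B",
                  stringsAsFactors = FALSE)

res <- stack_blocks(toy,
                    keep.id = c("id", "elem", "unt", "year", "mo", "numval"))
print(res)
```

On the real data the call would be stack_blocks(coop.dat, keep.id = c("id", "elem", "unt", "year", "mo", "numval")). Looping over the 62 blocks rather than over every row keeps the number of pieces passed to rbind() small for long files.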
>>
>> Thank you,
>>
>> Katrina
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


