[R] [Rd] Selecting multiple columns with same name

Fri Oct 8 19:40:30 CEST 2010

This is better suited for R-help than R-devel, so I'm copying to the R-help list:

> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Martin Kerr
> Sent: October-08-10 3:09 AM
> To: r-devel at r-project.org
> Subject: [Rd] Selecting multiple columns with same name
> 
> 
> Hello all,
> I've been working on a project involving clustering algorithms and I've hit a bit of a snag.
> I have my main data frame, which is 31 x 1000. I have fed this into dif and hclust in order to produce a
> 31-item vector stating the perceived grouping of the columns, e.g.
> 1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2 3 3 3 3
> etc.
> What I want to do is use this information to separate each group's worth of data into a separate frame
> so I can perform additional calculations on them. I've been attempting to use subset by setting the
> colnames to the grouping results, thus:
> colnames(dataFrame) <- groups
> subset(dataFrame, select=c(colname="1"))

Here's one way to do it, indexing the columns with a logical vector instead of relying on duplicated column names:

> fdf <- as.data.frame(matrix(rnorm(100), ncol = 10))
> fdf
            V1         V2         V3           V4          V5          V6          V7         V8
1   0.35264797 -0.4280407  0.4706150 -0.772936086  0.59984719  0.97885696  0.13569457  0.5005072
2  -0.09800830 -0.3946618 -0.6816040 -0.173057585 -0.95377116  1.32702531  0.51894946  1.8779715
3   0.00585569  0.5240508  0.6334294  0.775787713  1.13537433 -0.75363920  0.09240357  1.7652420
4  -1.28667042 -0.3808195 -1.3735447  0.601288920  0.37448709  1.20875897  1.26392905  0.3573046
5   1.05127892 -0.1717773  0.4795011  0.408584918 -1.57947076 -1.76699298 -2.15778156 -0.6202422
6   0.49935805 -0.5858645  0.1466443  1.094320479 -0.01534562  0.03349714 -0.86508986  0.3335337
7   0.64649298 -0.8044967  1.7273739  0.005654138  0.88092416 -0.43467177  0.33123616 -1.0062133
8   0.67393707 -0.8927181  1.9050954  0.824576116 -1.49872072  0.13610000 -0.98904113 -1.1763053
9  -0.06217531 -0.6020426 -0.5198348  0.475774170  0.72492806 -1.93507347 -0.26827918 -0.7902781
10 -4.05961249 -1.1839906 -2.1285662  0.992767748 -1.45187700 -0.32688422  0.92335149  0.2405690
            V9        V10
1  -1.10422899  0.7343708
2  -0.21511926 -0.3472193
3  -1.56249900  0.6228027
4  -1.64679524  0.9548577
5   0.31530976  0.7420800
6   0.02644282 -1.0393438
7  -0.70669500 -0.8335578
8  -0.29898269  1.8679939
9  -0.08449491 -0.7413130
10  0.66960457 -0.4666664
> colGroups <- c(1,1,1,2,1,3,3,2,3,1)
> fdf[, colGroups == 1]
            V1         V2         V3          V5        V10
1   0.35264797 -0.4280407  0.4706150  0.59984719  0.7343708
2  -0.09800830 -0.3946618 -0.6816040 -0.95377116 -0.3472193
3   0.00585569  0.5240508  0.6334294  1.13537433  0.6228027
4  -1.28667042 -0.3808195 -1.3735447  0.37448709  0.9548577
5   1.05127892 -0.1717773  0.4795011 -1.57947076  0.7420800
6   0.49935805 -0.5858645  0.1466443 -0.01534562 -1.0393438
7   0.64649298 -0.8044967  1.7273739  0.88092416 -0.8335578
8   0.67393707 -0.8927181  1.9050954 -1.49872072  1.8679939
9  -0.06217531 -0.6020426 -0.5198348  0.72492806 -0.7413130
10 -4.05961249 -1.1839906 -2.1285662 -1.45187700 -0.4666664
> fdf[, colGroups == 2]
             V4         V8
1  -0.772936086  0.5005072
2  -0.173057585  1.8779715
3   0.775787713  1.7652420
4   0.601288920  0.3573046
5   0.408584918 -0.6202422
6   1.094320479  0.3335337
7   0.005654138 -1.0062133
8   0.824576116 -1.1763053
9   0.475774170 -0.7902781
10  0.992767748  0.2405690
> fdf[, colGroups == 3]
            V6          V7          V9
1   0.97885696  0.13569457 -1.10422899
2   1.32702531  0.51894946 -0.21511926
3  -0.75363920  0.09240357 -1.56249900
4   1.20875897  1.26392905 -1.64679524
5  -1.76699298 -2.15778156  0.31530976
6   0.03349714 -0.86508986  0.02644282
7  -0.43467177  0.33123616 -0.70669500
8   0.13610000 -0.98904113 -0.29898269
9  -1.93507347 -0.26827918 -0.08449491
10 -0.32688422  0.92335149  0.66960457
> 

and this can be automated as a loop or with lapply() and the like.
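
For instance, a minimal sketch of the lapply() route, assuming the same kind of 10-column data frame and the colGroups vector from the session above. The name groupFrames and the set.seed() call are only for illustration, so the random values will not match the output shown earlier:

```r
# Stand-in data shaped like the example above: 10 rows, 10 columns V1..V10.
set.seed(1)  # illustrative only; values differ from the session above
fdf <- as.data.frame(matrix(rnorm(100), ncol = 10))
colGroups <- c(1, 1, 1, 2, 1, 3, 3, 2, 3, 1)

# Split the column positions by group, then pull each set of columns out.
# drop = FALSE keeps a one-column group as a data frame rather than a vector.
groupFrames <- lapply(split(seq_along(colGroups), colGroups),
                      function(idx) fdf[, idx, drop = FALSE])

names(groupFrames)         # "1" "2" "3"
sapply(groupFrames, ncol)  # 5 2 3
names(groupFrames[["2"]])  # "V4" "V8"
```

Each element of the resulting list is a data frame for one group, so further calculations can be run with another lapply() over groupFrames. The original column names are kept, so there is no need to overwrite them with the group labels.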

HTH

Steve McKinney

> This however only returns the first column rather than all instances of a column with that name. Note
> that these columns may not necessarily be contiguous.
> Is this the correct way to go about this?
> Thank You
> Martin Kerr
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel