[R] Condensing data.frame

Morway, Eric emorway at usgs.gov
Sun Dec 7 15:21:48 CET 2014


Using the dataset "dat" (found below), I'm looking for a way to condense
the data.frame so that each "site" (i.e., "CID_1"..."CID_13") has at most
7 rows of post-processed data: the first 6 rows are the taxa with the
highest "countPercentage", and the 7th row, named "Other", holds the sum
of "countPercentage" over all remaining rows within that "site".  For the
first two sites in the provided data.frame, CID_1 and CID_10, the result
would be:

CID_1   Cyanobacteria    37.48
CID_1   Proteobacteria   29.24
CID_1   Bacteroidetes    15.83
CID_1   Verrucomicrobia   6.38
CID_1   Firmicutes        2.30
CID_1   Acidobacteria     2.08
CID_1   Other             6.68
CID_10  Proteobacteria   35.37
CID_10  Bacteroidetes    25.19
CID_10  Cyanobacteria    23.29
CID_10  Verrucomicrobia   6.97
CID_10  Acidobacteria     1.99
CID_10  Actinobacteria    1.64
CID_10  Other             5.55


dat <- read.table(header=TRUE, sep=",",
text="site,tax_name,count,countTotal,countPercentage
CID_1,Cyanobacteria,46295,123509,37.483098398
CID_1,Proteobacteria,36120,123509,29.244832360
CID_1,Bacteroidetes,19546,123509,15.825567368
CID_1,Verrucomicrobia,7886,123509,6.384959801
CID_1,Firmicutes,2843,123509,2.301856545
CID_1,Acidobacteria,2563,123509,2.075152418
CID_1,Actinobacteria,2069,123509,1.675181566
CID_1,Planctomycetes,1481,123509,1.199102899
CID_1,Chloroflexi,1181,123509,0.956205621
CID_1,Gemmatimonadetes,956,123509,0.774032662
CID_1,Spirochaetes,688,123509,0.557044426
CID_1,Lentisphaerae,526,123509,0.425879895
CID_1,Ignavibacteriae,324,123509,0.262329061
CID_1,Chlorobi,238,123509,0.192698508
CID_1,Nitrospirae,230,123509,0.186221247
CID_1,Nitrospinae,169,123509,0.136832134
CID_1,Elusimicrobia,131,123509,0.106065145
CID_1,Tenericutes,114,123509,0.092300966
CID_1,Fibrobacteres,72,123509,0.058295347
CID_1,Thermotogae,21,123509,0.017002810
CID_1,Fusobacteria,21,123509,0.017002810
CID_1,Armatimonadetes,15,123509,0.012144864
CID_1,Synergistetes,10,123509,0.008096576
CID_1,Deinococcus-Thermus,6,123509,0.004857946
CID_1,Deferribacteres,2,123509,0.001619315
CID_1,Caldiserica,2,123509,0.001619315
CID_10,Proteobacteria,16043,45362,35.366606411
CID_10,Bacteroidetes,11426,45362,25.188483753
CID_10,Cyanobacteria,10567,45362,23.294828270
CID_10,Verrucomicrobia,3162,45362,6.970592126
CID_10,Acidobacteria,902,45362,1.988448481
CID_10,Actinobacteria,746,45362,1.644548300
CID_10,Firmicutes,718,45362,1.582822627
CID_10,Gemmatimonadetes,358,45362,0.789206825
CID_10,Planctomycetes,357,45362,0.787002337
CID_10,Chloroflexi,265,45362,0.584189410
CID_10,Spirochaetes,235,45362,0.518054759
CID_10,Ignavibacteriae,177,45362,0.390194436
CID_10,Lentisphaerae,108,45362,0.238084741
CID_10,Nitrospinae,75,45362,0.165336625
CID_10,Nitrospirae,58,45362,0.127860324
CID_10,Chlorobi,44,45362,0.096997487
CID_10,Elusimicrobia,28,45362,0.061725673
CID_10,Fibrobacteres,26,45362,0.057316697
CID_10,Armatimonadetes,15,45362,0.033067325
CID_10,Deinococcus-Thermus,13,45362,0.028658348
CID_10,Tenericutes,10,45362,0.022044883
CID_10,Synergistetes,9,45362,0.019840395
CID_10,Fusobacteria,9,45362,0.019840395
CID_10,Deferribacteres,6,45362,0.013226930
CID_10,Thermotogae,3,45362,0.006613465
CID_10,Caldiserica,2,45362,0.004408977
CID_11,Proteobacteria,10019,31387,31.920858954
CID_11,Cyanobacteria,8811,31387,28.072131774
CID_11,Bacteroidetes,7930,31387,25.265237200
CID_11,Verrucomicrobia,1750,31387,5.575556759
CID_11,Firmicutes,806,31387,2.567942142
CID_11,Acidobacteria,548,31387,1.745945774
CID_11,Actinobacteria,434,31387,1.382738076
CID_11,Chloroflexi,203,31387,0.646764584
CID_11,Planctomycetes,197,31387,0.627648389
CID_11,Gemmatimonadetes,192,31387,0.611718227
CID_11,Ignavibacteriae,87,31387,0.277184822
CID_11,Spirochaetes,80,31387,0.254882595
CID_11,Tenericutes,71,31387,0.226208303
CID_11,Fusobacteria,67,31387,0.213464173
CID_11,Lentisphaerae,54,31387,0.172045751
CID_11,Chlorobi,40,31387,0.127441297
CID_11,Nitrospinae,33,31387,0.105139070
CID_11,Armatimonadetes,22,31387,0.070092714
CID_11,Fibrobacteres,15,31387,0.047790487
CID_11,Nitrospirae,13,31387,0.041418422
CID_11,Elusimicrobia,13,31387,0.041418422
CID_11,Deinococcus-Thermus,2,31387,0.006372065
CID_12,Cyanobacteria,241,644,37.422360248
CID_12,Bacteroidetes,210,644,32.608695652
CID_12,Proteobacteria,118,644,18.322981366
CID_12,Verrucomicrobia,38,644,5.900621118
CID_12,Acidobacteria,11,644,1.708074534
CID_12,Ignavibacteriae,6,644,0.931677019
CID_12,Lentisphaerae,5,644,0.776397516
CID_12,Firmicutes,5,644,0.776397516
CID_12,Planctomycetes,3,644,0.465838509
CID_12,Fusobacteria,3,644,0.465838509
CID_12,Tenericutes,2,644,0.310559006
CID_12,Actinobacteria,2,644,0.310559006
CID_13,Cyanobacteria,8581,25530,33.611437524
CID_13,Bacteroidetes,6878,25530,26.940853897
CID_13,Proteobacteria,5341,25530,20.920485703
CID_13,Verrucomicrobia,1244,25530,4.872698786
CID_13,Firmicutes,1148,25530,4.496670584
CID_13,Acidobacteria,548,25530,2.146494320
CID_13,Spirochaetes,477,25530,1.868390129
CID_13,Ignavibacteriae,298,25530,1.167254211
CID_13,Actinobacteria,227,25530,0.889150020
CID_13,Planctomycetes,184,25530,0.720720721
CID_13,Chloroflexi,181,25530,0.708969839
CID_13,Gemmatimonadetes,121,25530,0.473952213
CID_13,Lentisphaerae,93,25530,0.364277321
CID_13,Tenericutes,61,25530,0.238934587
CID_13,Fibrobacteres,47,25530,0.184097141
CID_13,Nitrospinae,28,25530,0.109674892
CID_13,Nitrospirae,26,25530,0.101840971
CID_13,Chlorobi,18,25530,0.070505288
CID_13,Elusimicrobia,13,25530,0.050920486
CID_13,Synergistetes,8,25530,0.031335684
CID_13,Fusobacteria,4,25530,0.015667842
CID_13,Deinococcus-Thermus,2,25530,0.007833921
CID_13,Thermotogae,2,25530,0.007833921
CID_2,Cyanobacteria,43812,94826,46.202518297
CID_2,Proteobacteria,22180,94826,23.390209436
CID_2,Bacteroidetes,16993,94826,17.920190665
CID_2,Verrucomicrobia,4779,94826,5.039757029
CID_2,Acidobacteria,1728,94826,1.822285027
CID_2,Firmicutes,1385,94826,1.460569886
CID_2,Planctomycetes,815,94826,0.859468922
CID_2,Actinobacteria,677,94826,0.713939215
CID_2,Gemmatimonadetes,625,94826,0.659101934
CID_2,Chloroflexi,416,94826,0.438698247
CID_2,Spirochaetes,415,94826,0.437643684
CID_2,Lentisphaerae,221,94826,0.233058444
CID_2,Ignavibacteriae,180,94826,0.189821357
CID_2,Fibrobacteres,155,94826,0.163457280
CID_2,Chlorobi,112,94826,0.118111067
CID_2,Elusimicrobia,111,94826,0.117056503
CID_2,Tenericutes,75,94826,0.079092232
CID_2,Nitrospinae,40,94826,0.042182524
CID_2,Nitrospirae,31,94826,0.032691456
CID_2,Deinococcus-Thermus,17,94826,0.017927573
CID_2,Armatimonadetes,17,94826,0.017927573
CID_2,Synergistetes,16,94826,0.016873010
CID_2,Fusobacteria,15,94826,0.015818446
CID_2,Deferribacteres,7,94826,0.007381942
CID_2,Caldiserica,2,94826,0.002109126
CID_2,Thermotogae,2,94826,0.002109126
CID_3,Cyanobacteria,18888,46181,40.899937204
CID_3,Proteobacteria,12532,46181,27.136701241
CID_3,Bacteroidetes,9070,46181,19.640111734
CID_3,Verrucomicrobia,2291,46181,4.960914662
CID_3,Acidobacteria,689,46181,1.491955566
CID_3,Firmicutes,631,46181,1.366362790
CID_3,Actinobacteria,470,46181,1.017734566
CID_3,Spirochaetes,366,46181,0.792533726
CID_3,Planctomycetes,326,46181,0.705918018
CID_3,Chloroflexi,282,46181,0.610640740
CID_3,Gemmatimonadetes,194,46181,0.420086183
CID_3,Fibrobacteres,116,46181,0.251185552
CID_3,Ignavibacteriae,109,46181,0.236027804
CID_3,Nitrospinae,46,46181,0.099608064
CID_3,Nitrospirae,44,46181,0.095277279
CID_3,Tenericutes,40,46181,0.086615708
CID_3,Lentisphaerae,38,46181,0.082284922
CID_3,Chlorobi,16,46181,0.034646283
CID_3,Elusimicrobia,14,46181,0.030315498
CID_3,Fusobacteria,10,46181,0.021653927
CID_3,Armatimonadetes,7,46181,0.015157749
CID_3,Synergistetes,2,46181,0.004330785
CID_4,Proteobacteria,433,1005,43.084577114
CID_4,Bacteroidetes,301,1005,29.950248756
CID_4,Actinobacteria,111,1005,11.044776119
CID_4,Cyanobacteria,44,1005,4.378109453
CID_4,Acidobacteria,28,1005,2.786069652
CID_4,Chloroflexi,24,1005,2.388059701
CID_4,Nitrospirae,21,1005,2.089552239
CID_4,Verrucomicrobia,12,1005,1.194029851
CID_4,Gemmatimonadetes,12,1005,1.194029851
CID_4,Firmicutes,7,1005,0.696517413
CID_4,Spirochaetes,5,1005,0.497512438
CID_4,Ignavibacteriae,5,1005,0.497512438
CID_4,Elusimicrobia,2,1005,0.199004975
CID_5,Proteobacteria,5002,11914,41.984220245
CID_5,Bacteroidetes,1512,11914,12.690951821
CID_5,Verrucomicrobia,1361,11914,11.423535337
CID_5,Acidobacteria,1207,11914,10.130938392
CID_5,Cyanobacteria,721,11914,6.051703878
CID_5,Planctomycetes,635,11914,5.329864026
CID_5,Actinobacteria,398,11914,3.340607688
CID_5,Nitrospirae,314,11914,2.635554809
CID_5,Chloroflexi,313,11914,2.627161323
CID_5,Firmicutes,195,11914,1.636729898
CID_5,Gemmatimonadetes,129,11914,1.082759778
CID_5,Chlorobi,31,11914,0.260198086
CID_5,Armatimonadetes,22,11914,0.184656706
CID_5,Ignavibacteriae,21,11914,0.176263220
CID_5,Fusobacteria,14,11914,0.117508813
CID_5,Deinococcus-Thermus,10,11914,0.083934867
CID_5,Lentisphaerae,9,11914,0.075541380
CID_5,Elusimicrobia,7,11914,0.058754407
CID_5,Nitrospinae,7,11914,0.058754407
CID_5,Synergistetes,4,11914,0.033573947
CID_5,Spirochaetes,2,11914,0.016786973
CID_6,Cyanobacteria,6462,17852,36.197624916
CID_6,Proteobacteria,5036,17852,28.209724401
CID_6,Bacteroidetes,3906,17852,21.879901412
CID_6,Verrucomicrobia,1016,17852,5.691239077
CID_6,Acidobacteria,317,17852,1.775711405
CID_6,Actinobacteria,286,17852,1.602061394
CID_6,Firmicutes,234,17852,1.310777504
CID_6,Planctomycetes,134,17852,0.750616177
CID_6,Gemmatimonadetes,112,17852,0.627380686
CID_6,Spirochaetes,97,17852,0.543356487
CID_6,Chloroflexi,77,17852,0.431324221
CID_6,Lentisphaerae,56,17852,0.313690343
CID_6,Ignavibacteriae,35,17852,0.196056464
CID_6,Nitrospirae,23,17852,0.128837105
CID_6,Nitrospinae,19,17852,0.106430652
CID_6,Tenericutes,12,17852,0.067219359
CID_6,Chlorobi,8,17852,0.044812906
CID_6,Armatimonadetes,7,17852,0.039211293
CID_6,Fibrobacteres,7,17852,0.039211293
CID_6,Elusimicrobia,4,17852,0.022406453
CID_6,Fusobacteria,2,17852,0.011203227
CID_6,Deferribacteres,2,17852,0.011203227
CID_7,Cyanobacteria,11046,30425,36.305669680
CID_7,Proteobacteria,8418,30425,27.668036154
CID_7,Bacteroidetes,6197,30425,20.368118324
CID_7,Verrucomicrobia,1745,30425,5.735414955
CID_7,Firmicutes,732,30425,2.405916187
CID_7,Acidobacteria,582,30425,1.912900575
CID_7,Actinobacteria,365,30425,1.199671323
CID_7,Fusobacteria,344,30425,1.130649137
CID_7,Planctomycetes,253,30425,0.831552999
CID_7,Chloroflexi,221,30425,0.726376335
CID_7,Gemmatimonadetes,131,30425,0.430566968
CID_7,Spirochaetes,127,30425,0.417419885
CID_7,Lentisphaerae,88,30425,0.289235826
CID_7,Ignavibacteriae,69,30425,0.226787182
CID_7,Nitrospinae,37,30425,0.121610518
CID_7,Nitrospirae,21,30425,0.069022186
CID_7,Chlorobi,17,30425,0.055875103
CID_7,Elusimicrobia,15,30425,0.049301561
CID_7,Fibrobacteres,9,30425,0.029580937
CID_7,Armatimonadetes,4,30425,0.013147083
CID_7,Deferribacteres,4,30425,0.013147083
CID_8,Cyanobacteria,14446,43589,33.141388883
CID_8,Proteobacteria,13270,43589,30.443460506
CID_8,Bacteroidetes,8834,43589,20.266581018
CID_8,Verrucomicrobia,2529,43589,5.801922503
CID_8,Firmicutes,1176,43589,2.697928376
CID_8,Acidobacteria,780,43589,1.789442290
CID_8,Actinobacteria,542,43589,1.243432976
CID_8,Spirochaetes,406,43589,0.931427654
CID_8,Planctomycetes,295,43589,0.676776251
CID_8,Chloroflexi,277,43589,0.635481429
CID_8,Ignavibacteriae,243,43589,0.557480098
CID_8,Lentisphaerae,230,43589,0.527656060
CID_8,Gemmatimonadetes,162,43589,0.371653399
CID_8,Fusobacteria,106,43589,0.243180619
CID_8,Tenericutes,57,43589,0.130766937
CID_8,Nitrospirae,51,43589,0.117001996
CID_8,Chlorobi,50,43589,0.114707839
CID_8,Nitrospinae,36,43589,0.082589644
CID_8,Fibrobacteres,34,43589,0.078001331
CID_8,Elusimicrobia,29,43589,0.066530547
CID_8,Armatimonadetes,19,43589,0.043588979
CID_8,Aquificae,8,43589,0.018353254
CID_8,Deferribacteres,7,43589,0.016059097
CID_8,Dictyoglomi,2,43589,0.004588314
CID_9,Proteobacteria,26463,77120,34.314056017
CID_9,Cyanobacteria,20329,77120,26.360217842
CID_9,Bacteroidetes,15956,77120,20.689834025
CID_9,Verrucomicrobia,3323,77120,4.308869295
CID_9,Firmicutes,2726,77120,3.534751037
CID_9,Spirochaetes,1644,77120,2.131742739
CID_9,Acidobacteria,1634,77120,2.118775934
CID_9,Actinobacteria,1200,77120,1.556016598
CID_9,Chloroflexi,1128,77120,1.462655602
CID_9,Planctomycetes,872,77120,1.130705394
CID_9,Ignavibacteriae,578,77120,0.749481328
CID_9,Lentisphaerae,264,77120,0.342323651
CID_9,Gemmatimonadetes,263,77120,0.341026971
CID_9,Fibrobacteres,170,77120,0.220435685
CID_9,Nitrospirae,148,77120,0.191908714
CID_9,Nitrospinae,136,77120,0.176348548
CID_9,Chlorobi,74,77120,0.095954357
CID_9,Tenericutes,71,77120,0.092064315
CID_9,Elusimicrobia,46,77120,0.059647303
CID_9,Armatimonadetes,27,77120,0.035010373
CID_9,Fusobacteria,25,77120,0.032417012
CID_9,Aquificae,13,77120,0.016856846
CID_9,Synergistetes,12,77120,0.015560166
CID_9,Deferribacteres,8,77120,0.010373444
CID_9,Thermotogae,7,77120,0.009076763
CID_9,Chrysiogenetes,3,77120,0.003890041
")
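
For reference, the reduction described above (top 6 per site plus an
"Other" row) might be sketched in base R roughly as follows.  This is only
one possible approach, not a tested solution for the full dataset.  The
helper name condense_site and the object demo are made up for
illustration; demo is just an excerpt of the posted data (all rows for
CID_12 and CID_4, three columns only) so that the sketch runs on its own.

```r
# Excerpt of the posted data (all rows for CID_12 and CID_4), so the sketch
# is self-contained; column names match those in `dat`.
demo <- read.table(header = TRUE, sep = ",", stringsAsFactors = FALSE,
text = "site,tax_name,countPercentage
CID_12,Cyanobacteria,37.422360248
CID_12,Bacteroidetes,32.608695652
CID_12,Proteobacteria,18.322981366
CID_12,Verrucomicrobia,5.900621118
CID_12,Acidobacteria,1.708074534
CID_12,Ignavibacteriae,0.931677019
CID_12,Lentisphaerae,0.776397516
CID_12,Firmicutes,0.776397516
CID_12,Planctomycetes,0.465838509
CID_12,Fusobacteria,0.465838509
CID_12,Tenericutes,0.310559006
CID_12,Actinobacteria,0.310559006
CID_4,Proteobacteria,43.084577114
CID_4,Bacteroidetes,29.950248756
CID_4,Actinobacteria,11.044776119
CID_4,Cyanobacteria,4.378109453
CID_4,Acidobacteria,2.786069652
CID_4,Chloroflexi,2.388059701
CID_4,Nitrospirae,2.089552239
CID_4,Verrucomicrobia,1.194029851
CID_4,Gemmatimonadetes,1.194029851
CID_4,Firmicutes,0.696517413
CID_4,Spirochaetes,0.497512438
CID_4,Ignavibacteriae,0.497512438
CID_4,Elusimicrobia,0.199004975
")

# Hypothetical helper (not from the original post): keep the n rows with the
# largest countPercentage for one site and sum the remainder into "Other".
condense_site <- function(d, n = 6) {
  d <- d[order(-d$countPercentage), c("site", "tax_name", "countPercentage")]
  # Guard against factor columns, which would reject the new "Other" label.
  d$site <- as.character(d$site)
  d$tax_name <- as.character(d$tax_name)
  if (nrow(d) <= n) return(d)   # n or fewer rows: nothing to lump together
  other <- data.frame(site = d$site[1], tax_name = "Other",
                      countPercentage = sum(d$countPercentage[-seq_len(n)]),
                      stringsAsFactors = FALSE)
  rbind(d[seq_len(n), ], other)
}

# Apply the helper site by site and stack the pieces back together.
condensed <- do.call(rbind, lapply(split(demo, demo$site), condense_site))
rownames(condensed) <- NULL
print(condensed, digits = 4)
```

With the full data.frame, the same call should work with demo replaced by
dat, i.e. split(dat, dat$site).  Rows tied on "countPercentage" keep their
original order, so which of two tied taxa lands in the top 6 depends on
the row order in the input.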



