[R] grouping data by a portion of the row name

Peter Dalgaard p.dalgaard at biostat.ku.dk
Fri Sep 14 00:47:46 CEST 2007


Bricklemyer, Ross S wrote:
> I am attempting to write a routine to run PAM (partitioning around medoids) on a dataset containing multiple soil cores, with PCA spectral data from several depths per core.  I want to run PAM on each core separately, so I need to group the data by core.  Below is an example of my data structure:
>
> Lab.id       PC1          PC2           PC3
> MAT057.2.5   2.438454966  -1.011182986  -3.040881377
> MAT057.7.5   10.69120648  4.767694892   -1.719466898
> MAT057.12.5  8.215852171  4.645793327   0.974020242
> MAT057.17.5  10.00422215  3.516213164   2.586742695
> MAT057.22.5  18.49165113  5.143031557   0.472636009
> MAT057.27.5  18.31255522  4.255319595   0.802902692
> MAT057.35    11.75818601  -0.325388031  3.445673092
> MAT057.45    6.043984786  -3.297325975  3.075221644
> MAT057.45	6.043984786	-3.297325975	3.075221644
>
> The MAT057 is the core code and the values following the period refer to the sampling depths.  There are many cores in the dataset and I want to automate the analysis so that it will grab data with the same core code and run PAM.  Any ideas on what the R code would look like for that?
>
> Ross
>   
Looks like these aren't really row names but a variable called Lab.id.

Look into things like

sub("\\..*$",  "",  Lab.id)

or maybe

sapply(strsplit(Lab.id, "\\."), "[[", 1)

(If Lab.id is a factor, the second version needs it converted with
as.character first, because strsplit accepts only character input.)
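
To make the grouping step concrete, here is a minimal sketch of how the extracted core code could drive a per-core PAM run. Several things in it are assumptions, not taken from the original post: the data frame name `soil`, the second core `MAT058`, the simulated PC scores, and the choice of k = 2 clusters. It uses pam() from the cluster package.

```r
library(cluster)  # provides pam()

# Simulated stand-in for the real data. The MAT058 core and all PC values
# are invented for illustration only.
set.seed(1)
depths <- c("2.5", "7.5", "12.5", "17.5", "22.5", "27.5", "35", "45")
soil <- data.frame(
  Lab.id = c(paste("MAT057", depths, sep = "."),
             paste("MAT058", depths, sep = ".")),
  PC1 = rnorm(16), PC2 = rnorm(16), PC3 = rnorm(16),
  stringsAsFactors = TRUE   # Lab.id is a factor, as it often was by default
)

# Version 1: delete everything from the first period onward.
core1 <- sub("\\..*$", "", soil$Lab.id)

# Version 2: split on periods and keep the first piece.
# as.character() is needed here because Lab.id is a factor.
core2 <- sapply(strsplit(as.character(soil$Lab.id), "\\."), "[[", 1)

# Both versions should give the same core codes.
stopifnot(all(core1 == core2))
soil$core <- core1

# Split the PC columns by core and run PAM on each piece.
# k = 2 is an arbitrary choice for this sketch.
k <- 2
fits <- lapply(split(soil[, c("PC1", "PC2", "PC3")], soil$core),
               function(d) pam(d, k = k))

# Cluster membership of each depth sample, one element per core.
memberships <- lapply(fits, function(f) f$clustering)
print(memberships)
```

The result `fits` is a list with one pam object per core, named by core code, so something like fits[["MAT057"]] can be inspected or plotted on its own. The number of clusters would of course have to be chosen to suit the real data.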


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


