[R] Extracting metadata information to corresponding dissimilarity matrix

Thu May 18 15:08:33 CEST 2017

Brilliant, David, thank you so much!

Cheers, 

Rune 

> On 16 May 2017, at 18:44, David L Carlson <dcarlson at tamu.edu> wrote:
> 
> Fixing a typo in the original, adding a simplification, and using dissimilarity instead of similarity:
> 
> set.seed(42)
> dta <- data.frame(ID=1:7, gender=sample(c("M", "F"), 7, replace=TRUE),
>     age=sample.int(75, 7))
> dsim <- dist(dta$age) # distance, already lower triangular
> dsim
> 
> dta1 <- dta
> names(dta1) <- paste0(names(dta), "1") # generalizes to more than 3 columns
> dta2 <- dta
> names(dta2) <- paste0(names(dta), "2")
> 
> dta12 <- merge(dta2, dta1) # order is important
> dta12 <- dta12[dta12$ID1 < dta12$ID2, ] # get rid of duplicates
> 
> dta12 <- data.frame(dta12, dsim=as.vector(dsim)) # Typo was here
> dta12 <- dta12[, c("ID1", "ID2", "gender1", "gender2", "age1", "age2", "dsim")]
> dta12
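Because `dsim` in this example is computed from age alone, one way to confirm that the merged rows and the `dist()` entries line up is to compare each row's `dsim` with the absolute difference of its two ages. The check below is a sketch added for illustration, not part of David's reply. It reruns his steps and prints `TRUE` when the ordering is right, whatever random values the seed produces.

```r
# Rerun of the reply's steps, followed by an alignment check (added sketch).
set.seed(42)
dta <- data.frame(ID = 1:7, gender = sample(c("M", "F"), 7, replace = TRUE),
                  age = sample.int(75, 7))
dsim <- dist(dta$age)                       # pairwise |age_i - age_j|

dta1 <- dta
names(dta1) <- paste0(names(dta), "1")
dta2 <- dta
names(dta2) <- paste0(names(dta), "2")

dta12 <- merge(dta2, dta1)                  # all 49 combinations, ID2 varies fastest
dta12 <- dta12[dta12$ID1 < dta12$ID2, ]     # 21 unique pairs
dta12 <- data.frame(dta12, dsim = as.vector(dsim))

# If the rows and the dist() entries are aligned, every dsim equals the
# absolute age difference of the two observations in that row.
aligned <- isTRUE(all.equal(dta12$dsim,
                            as.numeric(abs(dta12$age1 - dta12$age2))))
print(aligned)
```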
> 
> David C
> 
> 
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of David L Carlson
> Sent: Tuesday, May 16, 2017 11:21 AM
> To: Rune Grønseth <nielsenrune at me.com>; r-help at r-project.org
> Subject: Re: [R] Extracting metadata information to corresponding dissimilarity matrix
> 
> I think this is what you are trying to do. I've created a data set with 7 rows and a similarity matrix based on age:
> 
> set.seed(42) # in R >= 3.6.0, run RNGversion("3.5.0") first to reproduce the output shown
> dta <- data.frame(ID=1:7, gender=sample(c("M", "F"), 7, replace=TRUE),
>     age=sample.int(75, 7))
> sim <- max(dist(dta$age)) - dist(dta$age) # already lower triangular
> sim
> 
> #    1  2  3  4  5  6
> # 2 24               
> # 3 21 59            
> # 4 40 46 43         
> # 5  0 38 41 22      
> # 6  7 45 48 29 55   
> # 7 55 31 28 47  7 14
> 
> # Now duplicate dta:
> dta1 <- dta
> names(dta1) <- c("ID1", "gender1", "age1")
> dta2 <- dta
> names(dta2) <- c("ID2", "gender2", "age2")
> 
> # Now merge and eliminate unneeded rows
> dta12 <- merge(dta2, dta1) # order is important
> dta12 <- dta12[dta12$ID1 < dta12$ID2, ]
> 
> # Finally combine the similarities with the combined data and rearrange
> # the variable names
> dta12 <- data.frame(dta12, sim=as.vector(sim)) # 'dta12mod' in the original post was a typo
> dta12 <- dta12[, c("ID1", "ID2", "gender1", "gender2", "age1", "age2", "sim")]
> dta12
> 
> #    ID1 ID2 gender1 gender2 age1 age2 sim
> # 2    1   2       F       F   11   49  24
> # 3    1   3       F       M   11   52  21
> # 4    1   4       F       F   11   33  40
> # 5    1   5       F       F   11   73   0
> # 6    1   6       F       F   11   66   7
> # 7    1   7       F       F   11   18  55
> # 10   2   3       F       M   49   52  59
> # 11   2   4       F       F   49   33  46
> # 12   2   5       F       F   49   73  38
> # 13   2   6       F       F   49   66  45
> # 14   2   7       F       F   49   18  31
> # 18   3   4       M       F   52   33  43
> # 19   3   5       M       F   52   73  41
> # 20   3   6       M       F   52   66  48
> # 21   3   7       M       F   52   18  28
> # 26   4   5       F       F   33   73  22
> # 27   4   6       F       F   33   66  29
> # 28   4   7       F       F   33   18  47
> # 34   5   6       F       F   73   66  55
> # 35   5   7       F       F   73   18   7
> # 42   6   7       F       F   66   18  14
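The "order is important" comment on `merge(dta2, dta1)` can be seen on a toy example (the values 10, 20 and 40 are arbitrary). With no column names in common, `merge()` returns every combination of rows, with the first data frame's rows varying fastest. After keeping `ID1 < ID2`, the pairs come out in the same column-by-column order in which `dist()` stores its lower triangle, which is why `as.vector(sim)` can simply be attached as a new column.

```r
# Toy example: merge() with no shared columns is a cross join.
x <- data.frame(ID2 = 1:3)
y <- data.frame(ID1 = 1:3)
xy <- merge(x, y)                 # 9 rows; ID2 cycles 1, 2, 3 within each ID1
print(xy)

kept <- xy[xy$ID1 < xy$ID2, ]     # (ID2, ID1) = (2,1), (3,1), (3,2)
print(kept)

d <- dist(c(10, 20, 40))          # lower triangle stored as (2,1), (3,1), (3,2)
print(as.vector(d))               # 10 30 20
```

Swapping the arguments to `merge(dta1, dta2)` would make ID1 vary fastest instead, and the attached dissimilarities would no longer match their rows.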
> 
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Rune Grønseth
> Sent: Tuesday, May 16, 2017 4:31 AM
> To: r-help at r-project.org
> Subject: [R] Extracting metadata information to corresponding dissimilarity matrix
> 
> Hi,
> I am an R beginner. I've tried googling and reading, but this might be too simple to be found in the documentation. 
> 
> I have a dissimilarity index (symmetric matrix) from which I have extracted the unique values using the ecodist package function "lower". There are 14 observations, so there are 91 unique comparisons.
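Base R can also extract the unique values: `m[lower.tri(m)]` reads the lower triangle column by column, which is the same order `as.dist()` uses to store it. The sketch below uses a random 14 x 14 matrix `m` as a hypothetical stand-in for the real dissimilarity matrix; it does not rely on ecodist.

```r
# Hypothetical stand-in for the real 14 x 14 dissimilarity matrix.
set.seed(1)
n <- 14
m <- as.matrix(dist(matrix(runif(n * 2), ncol = 2)))

low <- m[lower.tri(m)]            # lower triangle, read column by column
print(length(low))                # n * (n - 1) / 2 = 91

# as.dist() keeps the same entries in the same order
same_order <- isTRUE(all.equal(low, as.vector(as.dist(m)),
                               check.attributes = FALSE))
print(same_order)
```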
> 
> After this I'd like to extract the corresponding metadata from a separate data frame (the 14 observations are in rows identified by a samplenumber vector, with other variables such as gender, age, et cetera). The aim is a new data frame with 91 rows whose columns give the value of the dissimilarity index and the metadata (e.g. gender) of each of the two observations being compared. So if I'm looking for gender differences, I need 5 vectors in the data frame: samplenumber1, samplenumber2, gender1, gender2 and the dissimilarity metric.
> 
> Does anyone have suggestions or experience with reformatting data in this manner? This is just a test dataset. My full dataset has more than 100 observations, so I need more general code, if that is possible.
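For the general case, one option (a different technique from the `merge()` approach in the replies above) is to index the matrix directly: `which(lower.tri(m), arr.ind = TRUE)` lists every unique pair, and `match()` looks up each sample's metadata by sample number. `pair_metadata()` and its arguments are hypothetical names, and the sketch assumes the full matrix (or a `dist` object) carries the sample numbers as row names, rather than starting from the output of `lower()`.

```r
# Hypothetical helper: one row per unique pair, with metadata looked up by
# sample number. Assumes rownames(m) hold the sample numbers and that 'meta'
# has a column (named by 'id') with the same sample numbers.
pair_metadata <- function(m, meta, id = "samplenumber", vars = "gender") {
  m <- as.matrix(m)                              # accepts a matrix or a dist object
  if (is.null(rownames(m))) stop("'m' needs sample numbers as row names")
  idx <- which(lower.tri(m), arr.ind = TRUE)     # column-major, so row > col
  s1 <- rownames(m)[idx[, "col"]]                # earlier sample of each pair
  s2 <- rownames(m)[idx[, "row"]]                # later sample of each pair
  key <- as.character(meta[[id]])
  rows1 <- match(s1, key)
  rows2 <- match(s2, key)
  out <- data.frame(s1, s2, stringsAsFactors = FALSE)
  names(out) <- paste0(id, 1:2)
  for (v in vars) {
    out[[paste0(v, "1")]] <- meta[[v]][rows1]
    out[[paste0(v, "2")]] <- meta[[v]][rows2]
  }
  out$dissimilarity <- m[idx]                    # same pair order as lower.tri
  out
}

# Usage with a small made-up metadata table
meta <- data.frame(samplenumber = c(101, 102, 103, 104),
                   gender = c("M", "F", "F", "M"),
                   age = c(40, 55, 61, 38))
x <- matrix(meta$age, dimnames = list(as.character(meta$samplenumber), NULL))
d <- dist(x)                                     # labels come from the row names
res <- pair_metadata(d, meta, vars = c("gender", "age"))
print(res)
```

Because pairs are matched by sample number, the metadata rows can be in any order. For n observations the result has n * (n - 1) / 2 rows, e.g. 91 for the 14-sample test set.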
> 
> With great appreciation for any help.
> 
> Rune Grønseth 
> 
> ---
> 
> Rune Grønseth, MD, PhD, postdoctoral fellow
> Department of Thoracic Medicine
> Haukeland University Hospital
> N-5021 Bergen
> Norway
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.