[R] Creating submatrices from a dataframe, depending on factors in sample names

Mon Dec 1 22:57:25 CET 2014

I may have misunderstood, but does this do what you want?

> df.mat <- as.matrix(df)
> same <- lapply(1:3, function(x) df.mat[grep(paste0("_", x, "$"), 
+ rownames(df.mat)), grep(paste0("_", x, "$"), colnames(df.mat))])
> same
[[1]]
           HQ673618_1 HQ674317_1 EU686630_1
HQ673618_1         NA       90.8       89.8
HQ674317_1       90.8         NA       98.6
EU686630_1       89.8       98.6         NA

[[2]]
           EU686593_2 JN166322_2 EU491340_2
EU686593_2         NA       98.1       96.8
JN166322_2       98.1         NA       97.5
EU491340_2       96.8       97.5         NA

[[3]]
           AB694259_3 AB694258_3 AB694462_3
AB694259_3         NA       98.3       95.9
AB694258_3       98.3         NA       95.8
AB694462_3       95.9       95.8         NA

> Diff <- as.matrix(expand.grid(1:3, 1:3))
> Diff <- Diff[Diff[,1]<Diff[,2],]
> different <- lapply(seq_len(nrow(Diff)), function(x) 
+ df.mat[grep(paste0("_", Diff[x,1], "$"), rownames(df.mat)),
+ grep(paste0("_", Diff[x,2], "$"), colnames(df.mat))])
> different
[[1]]
           EU686593_2 JN166322_2 EU491340_2
HQ673618_1       89.6       89.8       88.9
HQ674317_1       97.7       98.4       97.4
EU686630_1       98.4       98.9       97.7

[[2]]
           AB694259_3 AB694258_3 AB694462_3
HQ673618_1       87.8       88.2       88.3
HQ674317_1       94.9       96.2       95.1
EU686630_1       95.4       96.4       95.8

[[3]]
           AB694259_3 AB694258_3 AB694462_3
EU686593_2       94.4       95.6       94.8
JN166322_2       95.3       96.5       95.9
EU491340_2       96.5       97.7       96.0
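
If you also want the list elements named "_n1 vs _n2" as in the original
request, something along these lines might work. It is only a sketch: the
names ids, vals, rgrp, cgrp and pairs are mine, the matrix is rebuilt from
the posted data so the block runs on its own, and the group suffix is
compared exactly (via sub()) rather than pattern-matched, on the assumption
that everything after the last underscore is the group label.

```r
# Rebuild the posted similarity matrix so the example is self-contained
ids <- c("HQ673618_1", "HQ674317_1", "EU686630_1",
         "EU686593_2", "JN166322_2", "EU491340_2",
         "AB694259_3", "AB694258_3", "AB694462_3")
vals <- c(NA, 90.8, 89.8, 89.6, 89.8, 88.9, 87.8, 88.2, 88.3,
          90.8, NA, 98.6, 97.7, 98.4, 97.4, 94.9, 96.2, 95.1,
          89.8, 98.6, NA, 98.4, 98.9, 97.7, 95.4, 96.4, 95.8,
          89.6, 97.7, 98.4, NA, 98.1, 96.8, 94.4, 95.6, 94.8,
          89.8, 98.4, 98.9, 98.1, NA, 97.5, 95.3, 96.5, 95.9,
          88.9, 97.4, 97.7, 96.8, 97.5, NA, 96.5, 97.7, 96.0,
          87.8, 94.9, 95.4, 94.4, 95.3, 96.5, NA, 98.3, 95.9,
          88.2, 96.2, 96.4, 95.6, 96.5, 97.7, 98.3, NA, 95.8,
          88.3, 95.1, 95.8, 94.8, 95.9, 96.0, 95.9, 95.8, NA)
df.mat <- matrix(vals, nrow = 9, dimnames = list(ids, ids))

# Group label = everything after the last underscore (assumption)
rgrp <- sub(".*_", "", rownames(df.mat))
cgrp <- sub(".*_", "", colnames(df.mat))

# All unordered pairs of distinct groups, one pair per column
pairs <- combn(unique(rgrp), 2)

# One block per pair; drop = FALSE keeps single-sample groups as matrices
different <- lapply(seq_len(ncol(pairs)), function(k)
  df.mat[rgrp == pairs[1, k], cgrp == pairs[2, k], drop = FALSE])
names(different) <- paste0("_", pairs[1, ], " vs _", pairs[2, ])
```

After this, different[["_1 vs _3"]] pulls out a block by name, and
lapply(different, as.vector) would turn the blocks into plain vectors.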

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter
Sent: Monday, December 1, 2014 11:46 AM
To: Tim Richter-Heitmann
Cc: r-help at r-project.org
Subject: Re: [R] Creating submatrices from a dataframe, depending on factors in sample names

I do not have the patience to study your request carefully, but does
the following help?

> a <- 1:3
> x <- outer(a,a,paste,sep=".")
> x
     [,1]  [,2]  [,3]
[1,] "1.1" "1.2" "1.3"
[2,] "2.1" "2.2" "2.3"
[3,] "3.1" "3.2" "3.3"
> x[upper.tri(x)]
[1] "1.2" "1.3" "2.3"

> x[upper.tri(x,diag=TRUE)]
[1] "1.1" "1.2" "2.2" "1.3" "2.3" "3.3"

This gives you a vector of all possible pairs of values of a (with or
without the identical pairs), which you could then loop over as an index
to do what you want, I think.
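
A rough sketch of that loop, on a made-up toy matrix: the names ids, m, grp,
keys and blocks are illustrative only, and the group label is assumed to be
whatever follows the last underscore in the sample name.

```r
# Toy matrix (hypothetical values): 2 samples in each of 3 groups
ids <- paste0("s", 1:6, "_", rep(1:3, each = 2))
m <- matrix(seq_len(36), nrow = 6, dimnames = list(ids, ids))

# Group suffix of each sample, then the pair labels as above
grp <- sub(".*_", "", ids)
a <- unique(grp)
x <- outer(a, a, paste, sep = ".")
keys <- x[upper.tri(x, diag = TRUE)]   # includes the identical pairs

# Split each label back into its two groups and index rows/columns with them
blocks <- lapply(strsplit(keys, ".", fixed = TRUE), function(p)
  m[grp == p[1], grp == p[2], drop = FALSE])
names(blocks) <- keys
```

Here blocks[["1.2"]] holds the group 1 rows against the group 2 columns;
using upper.tri(x) without diag = TRUE would leave out the within-group
blocks.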

If this is not what you want, just ignore without replying.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll

On Mon, Dec 1, 2014 at 8:47 AM, Tim Richter-Heitmann
<trichter at uni-bremen.de> wrote:
> Hello there,
>
> this is a cross-post of a Stack Overflow question which wasn't answered but
> is very important for my work. Apologies for breaking any rules, but I do
> hope for some help from the list instead:
>
> I have a huge matrix of pairwise similarity percentages between different
> samples. The samples belong to groups. The groups are determined by
> the suffix "_n" in the row.names/header names.
> In the first step, I wanted to create submatrices consisting of all pairs
> within single groups (i.e. for all samples from "_1").
> However, I realized that I need to know all pairwise submatrices, between
> all combinations of groups. So, I want to create (a list of) vectors that are
> named "_n1 vs _n2" (or similar) for all combinations of n, as illustrated by
> the colored rectangles:
>
> http://i.stack.imgur.com/XMkxj.png
>
> Reproducible code, as provided by helpful Stack Overflow members, dealing
> with identical "_n"s.
>
>
>         df <- structure(list(
>           HQ673618_1 = c(NA, 90.8, 89.8, 89.6, 89.8, 88.9, 87.8, 88.2, 88.3),
>           HQ674317_1 = c(90.8, NA, 98.6, 97.7, 98.4, 97.4, 94.9, 96.2, 95.1),
>           EU686630_1 = c(89.8, 98.6, NA, 98.4, 98.9, 97.7, 95.4, 96.4, 95.8),
>           EU686593_2 = c(89.6, 97.7, 98.4, NA, 98.1, 96.8, 94.4, 95.6, 94.8),
>           JN166322_2 = c(89.8, 98.4, 98.9, 98.1, NA, 97.5, 95.3, 96.5, 95.9),
>           EU491340_2 = c(88.9, 97.4, 97.7, 96.8, 97.5, NA, 96.5, 97.7, 96),
>           AB694259_3 = c(87.8, 94.9, 95.4, 94.4, 95.3, 96.5, NA, 98.3, 95.9),
>           AB694258_3 = c(88.2, 96.2, 96.4, 95.6, 96.5, 97.7, 98.3, NA, 95.8),
>           AB694462_3 = c(88.3, 95.1, 95.8, 94.8, 95.9, 96, 95.9, 95.8, NA)),
>           .Names = c("HQ673618_1", "HQ674317_1", "EU686630_1",
>                      "EU686593_2", "JN166322_2", "EU491340_2",
>                      "AB694259_3", "AB694258_3", "AB694462_3"),
>           class = "data.frame",
>           row.names = c("HQ673618_1", "HQ674317_1", "EU686630_1",
>                         "EU686593_2", "JN166322_2", "EU491340_2",
>                         "AB694259_3", "AB694258_3", "AB694462_3"))
>
>
>         indx <- gsub(".*_", "", names(df))
>         sub.matrices <- lapply(unique(indx), function(x) {
>           temp <- which(indx %in% x)
>           df[temp, temp]
>         })
>         unique_values <- lapply(sub.matrices, function(x) x[upper.tri(x)])
>         names(unique_values) <- unique(indx)
>
> This code needs to be expanded to form sub.matrices for any combination of
> unique indices in indx.
>
>
> Thank you so much!
>
>
>
>
> --
> Tim Richter-Heitmann (M.Sc.)
> PhD Candidate
>
>
>
> International Max-Planck Research School for Marine Microbiology
> University of Bremen
> Microbial Ecophysiology Group (AG Friedrich)
> FB02 - Biologie/Chemie
> Leobener Straße (NW2 A2130)
> D-28359 Bremen
> Tel.: 0049(0)421 218-63062
> Fax: 0049(0)421 218-63069
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
