[R] help, please! matrix operations inside 3 nested loops

Petr PIKAL petr.pikal at precheza.cz
Thu Aug 9 14:08:47 CEST 2012


Hi

> thank you for your help.
> 
> my input data looks like this (tab separated):
> 
> Ind.nr.   Pop.nr.   scm266   rms1280   scm247   rms1107
> 1   101   305   318   222   135
> 1   101   305   318   231   135
> 2   101   305   313   999   96
> 2   101   305   321   999   130
> 3   101   305   324   231   135
> 3   101   305   324   231   135
> 4   101   305   313   230   126
> 4   101   305   313   230   135
> 6   101   305   313   231   135
> 6   101   305   321   231   135

It is better to share data with dput(your.data), because the output can be
pasted straight into R. Anyway, I am still confused, but you can probably
clarify things further.
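
As a minimal sketch of why dput() helps: it prints R code that rebuilds the object, so a reader can recreate the data without guessing at separators or column types. The data frame `small` below is a hypothetical two-row stand-in, not the real marker file.

```r
# Hypothetical stand-in for the real marker data (two rows = one individual)
small <- data.frame(Ind.nr. = c(1L, 1L), scm247 = c(222L, 231L))

# dput() prints R code that reconstructs the object
dput(small)

# Round trip: capture that printed code and evaluate it again
txt <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# Compare the rebuilt object with the original
all.equal(small, rebuilt)
```

Pasting the dput() output into a message therefore gives readers the same object the poster is working with.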

> 
> it is a dataset with genetic marker alleles for single individuals.
> the first row is the header, all following rows are individuals. 2 rows
> count for 1 individual.
> first column is the individual's number, second column is the number for the
> population the individual comes from, and all following columns are different
> genetic markers.
> 
> what I want to do with this data in R is to compare one individual with

In those 2 rows for one individual, the genetic marker sometimes differs:

> test[1:2, "scm247"]
[1] 222 231

What do you want to do with them?

> each of the other individuals, allele-wise. there are five possibilities:
> the two compared individuals share 4,3,2,1,0 alleles of the currently
> examined marker (=column). for each shared allele this pair of individuals
> shall get 1 scoring point. for each pair of individuals, all scoring points
> shall be summed over all markers.

Based on your example, 

> dput(test)
structure(list(Ind.nr. = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 6L, 
6L), Pop.nr. = c(101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 
101L, 101L), scm266 = c(305L, 305L, 305L, 305L, 305L, 305L, 305L, 
305L, 305L, 305L), rms1280 = c(318L, 318L, 313L, 321L, 324L, 
324L, 313L, 313L, 313L, 321L), scm247 = c(222L, 231L, 999L, 999L, 
231L, 231L, 230L, 230L, 231L, 231L), rms1107 = c(135L, 135L, 
96L, 130L, 135L, 135L, 126L, 135L, 135L, 135L)), .Names = c("Ind.nr.", 
"Pop.nr.", "scm266", "rms1280", "scm247", "rms1107"), class = 
"data.frame", row.names = c(NA, 
-10L))

What is your desired result?
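
If the desired result is a symmetric individual-by-individual matrix of summed scores, one possible reading of the rule can be sketched as below. This is an assumption about the intent, not a confirmed specification: each of the 2 alleles of one individual is compared with each of the 2 alleles of the other (4 comparisons per marker), and 999 is treated as an ordinary allele value because the post does not say whether it codes missing data. The names `markers`, `ids` and `score` are made up for the sketch. Note that 4 pairwise comparisons can only yield 0, 1, 2 or 4 matches, never 3.

```r
# Rebuild the example data shown in the dput() output
test <- data.frame(
  Ind.nr. = rep(c(1L, 2L, 3L, 4L, 6L), each = 2),
  Pop.nr. = 101L,
  scm266  = 305L,
  rms1280 = c(318L, 318L, 313L, 321L, 324L, 324L, 313L, 313L, 313L, 321L),
  scm247  = c(222L, 231L, 999L, 999L, 231L, 231L, 230L, 230L, 231L, 231L),
  rms1107 = c(135L, 135L, 96L, 130L, 135L, 135L, 126L, 135L, 135L, 135L)
)

markers <- names(test)[-(1:2)]            # every column after the two id columns
ids     <- as.character(unique(test$Ind.nr.))
score   <- matrix(0L, length(ids), length(ids), dimnames = list(ids, ids))

for (m in markers) {
  # one length-2 allele vector per individual, named by individual number
  alleles <- split(test[[m]], test$Ind.nr.)
  for (i in seq_along(ids)) {
    for (j in seq_along(ids)) {
      if (i < j) {                        # visit each pair only once
        hits <- sum(outer(alleles[[ids[i]]], alleles[[ids[j]]], "=="))
        score[i, j] <- score[i, j] + hits
        score[j, i] <- score[i, j]        # mirror into the lower triangle
      }
    }
  }
}

score
```

Splitting by `Ind.nr.` avoids the `2*z1-1` row arithmetic, so the sketch does not depend on the rows being sorted or on the number of individuals being hard-coded.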

Regards
Petr


> 
> 
> my code again, modified according to your suggestions:
> 
> #1) read in data:
> daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
> daten<-as.data.frame(daten)
> 
> #2) create empty matrix:
> indxind<-matrix(0,nrow=617, ncol=617)
> indxind[1:20,1:19]
> 
> #3) compare cells to each other, score:
> #for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
> for (s in 3:6) {   #walks through the matrix column by column, starting at column 3
>   for (z1 in 1:6) {  #for each current column, take one row (z1)...
>     for (z2 in 1:6) {  #...and compare it to another row (z2) of the current column
>       if (z1!=z2) {topf<-indxind[z1,z2]
>                    if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1 #actually, 2 rows make up 1 individual,
>                    if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1 #therefore I compare 2 rows
>                    if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1 #with another 2 rows
>                    if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
>                    indxind[z1,z2]<-topf
>                    indxind[z2,z1]<-topf
>       }
>       #print(c(s,z1,z2,indxind[1,2])) ##counts s, z1 and z2 properly, but always gives 8 for indxind[1,2]
>     }
>     #indxind[1:5,1:5] #empty matrix
>   }
>   #indxind[1:5,1:5] #empty matrix
> }
> 
> #4) check:
> indxind[1:5,1:5]
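
A likely explanation for the value 8 in indxind[1,2]: the loops visit both (z1, z2) and (z2, z1), and each visit writes the running total into both cells, so every pair is scored twice. For a marker where all four alleles are equal (such as scm266 in the sample data), 4 matches become 8. The toy below reproduces this with two made-up individuals and one marker; `count_matches` is a hypothetical helper, not part of the original code.

```r
# Two hypothetical individuals that share all alleles at one marker
alleles <- list(c(305L, 305L), c(305L, 305L))
n <- length(alleles)

# Number of matching allele pairs (4 comparisons for 2 x 2 alleles)
count_matches <- function(x, y) sum(outer(x, y, "=="))

# Pattern from the posted loop: all z1 != z2, writing to both cells
full <- matrix(0L, n, n)
for (z1 in seq_len(n)) {
  for (z2 in seq_len(n)) {
    if (z1 != z2) {
      topf <- full[z1, z2] + count_matches(alleles[[z1]], alleles[[z2]])
      full[z1, z2] <- topf
      full[z2, z1] <- topf
    }
  }
}
full[1, 2]   # 8: the pair was counted on both visits

# Visiting only z2 > z1 scores each pair once
upper <- matrix(0L, n, n)
for (z1 in seq_len(n - 1)) {
  for (z2 in (z1 + 1):n) {
    topf <- upper[z1, z2] + count_matches(alleles[[z1]], alleles[[z2]])
    upper[z1, z2] <- topf
    upper[z2, z1] <- topf
  }
}
upper[1, 2]  # 4
```

Either restricting the inner loop to z2 > z1 or dropping the mirrored assignment would remove the double count; the first option also halves the work.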
> 
> 
> 
> @ Michael Weylandt: I've done my best with regard to the "big picture" of my
> algorithm and the small reproducible example. I hope both are sufficient.
> @ Petr Pikal-3: in this case, there are only numerical values, but it's a
> useful hint for my other code.
> @ Petr Pikal-3 and Berend Hasselman: initializing indxind with 0s instead
> of NAs helps; it fills something into indxind now. But it does the calculation
> only for the first marker (column 3); afterwards I get an error (translated
> from the German R message):
> Error in if (daten[2 * z1 - 1, s] == daten[2 * z2 - 1, s]) topf <- topf +  :
>   missing value where TRUE/FALSE needed
> Does this have something to do with changing to daten<-as.data.frame(daten)
> in line 3 (instead of as.matrix before)?
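
The error itself means that the condition handed to `if` evaluated to NA. The sketch below shows one way this can arise, under the assumption that the file read in has only the 10 rows shown above: with z1 or z2 equal to 6, rows 11 and 12 are requested. A data frame returns NA for a row beyond its last one, whereas a matrix stops with "subscript out of bounds", which would explain why the symptom changed after switching from as.matrix to as.data.frame. Genuinely missing values in the real file would trigger the same message. The one-column `daten` here is a made-up stand-in.

```r
# Stand-in for the posted data: 10 rows, i.e. 5 individuals
daten <- data.frame(scm266 = rep(305L, 10))

daten[11, 1]                  # row 11 does not exist: a data frame returns NA
daten[1, 1] == daten[11, 1]   # any comparison with NA is NA

# if() needs a single TRUE or FALSE, so an NA condition is an error
res <- tryCatch(
  if (daten[1, 1] == daten[11, 1]) "match" else "no match",
  error = function(e) conditionMessage(e)
)
res

# The same index on a matrix fails earlier, with a different message
m <- as.matrix(daten)
res_m <- tryCatch(m[11, 1], error = function(e) conditionMessage(e))
res_m

# Guards: derive the loop bound from the data and treat NA as "no match"
n_ind <- nrow(daten) %/% 2    # 5 individuals, not a hard-coded 6
isTRUE(daten[1, 1] == daten[11, 1])
```

Using `n_ind` as the upper loop bound keeps the row indices inside the data; `isTRUE()` is only a fallback for values that are missing in the file itself.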
> 
> 
> 


