[R-sig-eco] Association Routine?

Bob O'Hara bohara at senckenberg.de
Sat Feb 28 19:42:45 CET 2015


On 02/28/2015 06:48 PM, Alexandre F. Souza wrote:
> Dear friends,
>
> I need to write code to look up data using one variable as a reference. The
> code I wrote, however, is not working and I can't figure out why. Could
> anyone help me?
>
> Imagine a data set with two variables, B and C. Now I have variable A,
> which is the same variable as variable B, but the data are not in the same
> order, nor do they necessarily have the same length as B (it may be a
> sample of B, for example).
>
> I want to find the values of variable C that match each line in variable A
> using B as the association criterion. So the code should perform a loop in
> which it would take the first line in A, search B until it finds it there,
> then copy the corresponding value of C and store it in a new variable D. Do
> it until all lines in A have been associated to a C value.

Starting with...

df <- data.frame(B = sample(letters[1:10], replace = FALSE), C = rnorm(10),
                 stringsAsFactors = FALSE)
A <- letters[1:10]

Two thoughts spring to mind:
(a) Would merge() do what you want? E.g.
    df2 <- merge(df, data.frame(A=A), by.x="B", by.y="A")
    and then extract the values of C with df2$C[df2$B=="f"], for example.
(b) sapply(A, function(lt, DF) DF$C[DF$B==lt], DF=df)

R's looping is generally more efficient when it's done internally, so it
will be easier for you if you understand the R mentality, in particular
vectorisation. Usually, if you have a for() loop, you're not writing R
code efficiently.
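
As an illustration of the vectorised approach, here is a minimal sketch using match(), which is one possible alternative to the two suggestions above and not something taken from the original code. The seed and the contents of A are arbitrary choices made for the example. Unlike merge(), which sorts by the key by default, this keeps the result in the order of A.

```r
# Reproducible version of the example data (the seed is an arbitrary choice)
set.seed(1)
df <- data.frame(B = sample(letters[1:10]), C = rnorm(10),
                 stringsAsFactors = FALSE)

# A is in a different order, has a repeated value, and "z" does not occur in B
A <- c("c", "a", "j", "c", "z")

# match(A, df$B) returns, for each element of A, the position of its first
# match in df$B, or NA when there is none. Indexing C by those positions
# performs the whole lookup without an explicit loop.
idx <- match(A, df$B)
D <- df$C[idx]

data.frame(A = A, D = D)
```

Values of A that are absent from B come back as NA in D, so unmatched rows are easy to spot with is.na(D).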

Bob


> Here is the code I wrote:
>
>
> # Considering that matrices data.ref and data.assoc have already been read,
> # containing the
>
> # User-defined number of columns to be associated with A (I imagined that
> # more than one variable could be associated at once)
> col.assoc = 20
>
> # To assure that data will not be in a non-usable data category
> ref = as.matrix(data.ref)
> assoc = as.matrix(data.assoc)
>
>
> # Table where results will be stored
> #  Number of columns = n associated variables plus one column
> #  Reserved to receive the initial data (example column A)
>
> result = matrix(nrow = nrow(ref), ncol = col.assoc + 1)
>
> # Fill the first column of the result table with the original reference
> # variable
>
> result[,1] = ref[,1]
>
>
> for (i in 1:nrow(ref)){
>    for (j in 1:nrow(assoc))
>     if (ref[i, 1] == assoc[j, 1]){
>       resultado[i, 2] == assoc[j, 2]
>     }
> }
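
For what it's worth, two things in the loop as posted would stop it from working: the line inside if() writes to resultado, while the table created earlier is called result, and it uses == (a comparison) where an assignment (<-) is needed. Below is a minimal sketch of the same double loop with those two points changed. The small ref and assoc data frames are hypothetical stand-ins for data.ref and data.assoc, which were not posted. Data frames are used here instead of as.matrix(), because as.matrix() on a data frame with a character column converts every column to character.

```r
# Hypothetical stand-ins for data.ref and data.assoc (not from the original post)
ref   <- data.frame(A = c("c", "a", "b"), stringsAsFactors = FALSE)
assoc <- data.frame(B = c("a", "b", "c", "d"), C = c(1.5, 2.5, 3.5, 4.5),
                    stringsAsFactors = FALSE)

# Result table: the reference values plus one column for the looked-up values
result <- data.frame(A = ref$A, D = NA_real_, stringsAsFactors = FALSE)

for (i in seq_len(nrow(ref))) {
  for (j in seq_len(nrow(assoc))) {
    if (ref$A[i] == assoc$B[j]) {
      # assignment with <- into `result` (the posted code compared with ==
      # and referred to `resultado`)
      result$D[i] <- assoc$C[j]
    }
  }
}

result
```

The loop gives the same values as assoc$C[match(ref$A, assoc$B)] on this data, which is the vectorised way to write it.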
>
>
>
> col = ncol(dados)
>
> ####
>
> Any thoughts?
>
> Thanks in advance,
>
> Alexandre
>




