[R] Having some Trouble Data Structures

Sun Oct 28 16:27:42 CET 2012

Search on "ragged array". 

My preferred approach is to use a data frame with one row per effector that repeats the per-ID information. If that occupies too much memory, you can set up a second data frame with one row per ID, refer to that information using lapply, and subset the effectors data as needed. The plyr package is also useful for such processing.
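
The layout described above might look like the following minimal sketch in base R. The object names (individuals, effectors, effectors_by_id, counts), the column names, and the example counts of 3, 1 and 2 effectors are assumptions made for illustration and are not taken from the original simulation.

```r
# Hypothetical example data; names and sizes are illustrative only.
set.seed(1)

# One row per ID: information that belongs to the individual.
individuals <- data.frame(
  ID = sprintf("%04d", 1:3),
  No_of_Effectors = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# One row per effector: the ID is repeated once for each effector,
# so a "ragged" set of sequences fits in an ordinary data frame.
effectors <- data.frame(
  ID = rep(individuals$ID, times = individuals$No_of_Effectors),
  Sequence = sample(1:10000, sum(individuals$No_of_Effectors)),
  stringsAsFactors = FALSE
)

# Pull out the effector sequences of each individual as needed.
effectors_by_id <- lapply(individuals$ID, function(id) {
  subset(effectors, ID == id)$Sequence
})
names(effectors_by_id) <- individuals$ID

# Per-ID summaries can be computed directly on the long table.
counts <- tapply(effectors$Sequence, effectors$ID, length)
```

Here effectors_by_id[["0001"]] returns the sequences for the first individual, and further per-effector variables (such as an Expressed column) can be added to effectors as ordinary columns.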
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

"Benjamin Ward (ENV)" <B.Ward at uea.ac.uk> wrote:

>Hi All,
>
>I'm trying to run a simulation of host-pathogen evolution based around
>individuals.
>What I need to have is a dataframe or table of some description -
>describing all the individuals of a pathogen population (so far I've
>implemented this as a matrix):
>
>     ID         No_of_Effectors                   Effectors (Sequences)
>  [1,] 0001              3                   ##   3 Random Numbers ##
>
>There will be many such rows for many individuals. They have something
>called effectors, the number of which is randomly generated, so say you
>get 3 in the No_of_Effectors column. Then I make R generate 3 numbers
>from between 1 and 10,000, this gives me three numerical
>representations of genes. These numbers will be compared to a similar
>data structure of the host individuals who have their immune genes with
>similar numbers.
>
>My problem is that obviously I can't stick 3 numbers in one "cell" of
>the matrix (I've tried) :
>
>Pathogen_Individuals[1,3] <- c(345, 567, 678)
>Error in Pathogen_Individuals[1, 3] <- c(345, 567, 678) :
>  number of items to replace is not a multiple of replacement length
>
>In future I'm also going to have more variables such as whether a gene
>is expressed. Such information may require a matrix in itself -
>something like:
>
>
>        Effector ID             Sequence                  Expressed?
> [1,]     0001              345,567,678                       1 (or 0).
>
>Is there a way, then, to put more than one value in a cell, like a
>list of values, or a way to put objects in a cell of a data frame,
>matrix or table etc.? Almost an inception deal: data structures nested
>in a data structure. If I search for things like "insert list into
>matrix" I get results like how to turn one into another, which is not
>what I think I need to be doing.
>
>I have been considering having several data structures not nested in
>each other, something like creating, for every individual, a new matrix
>object with the name Effectors_[Individual_ID] and somehow getting my
>simulation loops to operate on those objects, but I find it hard to see
>how to tell R that all of those matrices are to be included in an
>operation, as you can with all rows of a data frame using for loops,
>for example.
>This is strange for me because this model was written in a macro-code
>for another program which handles data in a different format and layout
>to R.
>
>My problem, I think, is that each individual in the model has many
>variables, in this case representations of genes. So I'm having trouble
>getting my head around this.
>
>Hopefully someone more experienced will be able to offer advice or a
>solution, it will be very appreciated.
>
>Many Thanks,
>Ben Ward (ENV, UEA & The Sainsbury Lab, JIC).
>
>P.S. I have searched previous queries to the list, and I'm not sure, but
>this may be useful or relevant:
>
>
>Have you thought of using a list?
>
>> a <- matrix(1:10, nrow=2)
>> b <- 1:5
>> x <- list(a=a, b=b)
>> x
>$a
>     [,1] [,2] [,3] [,4] [,5]
>[1,]    1    3    5    7    9
>[2,]    2    4    6    8   10
>
>$b
>[1] 1 2 3 4 5
>
>> x$a
>     [,1] [,2] [,3] [,4] [,5]
>[1,]    1    3    5    7    9
>[2,]    2    4    6    8   10
>> x$b
>[1] 1 2 3 4 5
>
>oliveoil and yarn datasets have been mentioned.