[R] initiate elements in a dataframe with lists

William Dunlap wdun|@p @end|ng |rom t|bco@com
Wed Jul 25 20:18:17 CEST 2018


If you need to make a list of long but unknown length you can save time by
adding the items to an environment, with names giving the order, then
converting the environment to a list when you are done filling the
environment.  E.g.,

> makeData
function (container, n)
{
    for (i in seq_len(n)) container[[sprintf("%06x", i)]] <- seq_len(i%%5)
    container
}
> # use an environment
> system.time(E <- makeData(new.env(parent=emptyenv()), 10^5))
   user  system elapsed
   0.38    0.00    0.38
> # convert environment to a list
> system.time(EL <- as.list(E, sorted=TRUE))
   user  system elapsed
   0.62    0.00    0.62
> # use a list
> system.time(L <- makeData(list(), 10^5))
   user  system elapsed
 142.56    1.46  153.78
> all.equal(EL, L)
[1] TRUE


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jul 25, 2018 at 10:43 AM, Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
wrote:

> The code below reeks of a misconception that lists are efficient to add
> items to, which is a confusion with the computer science term "linked
> list".  In R, a list is NOT a linked list... it is a vector, which means
> the memory used by the list is allocated at the time it is created, and
> REALLOCATED when a new item is added. The only reason you should use a list
> is because you expect to put values of different types or shapes into it,
> which does not appear to apply in this use case.
>
> In R, you should make a valiant effort to create things right the first
> time, and if that doesn't work then preallocate the space you will need in
> the vectors you are working with. Since you have a need to store a variable
> number of elements in each intersectX element, the column needs to be a
> list but the elements of that list can perfectly well be character vectors.
>
> x <- data.frame( TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA")
>                , CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2")
>                , POSA=c(10, 15, 120, 340, 100, 220)
>                , CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1")
>                , POSB=c(30, 100, 300, 20, 200, 320)
>                , stringsAsFactors = FALSE
>                )
> compareRng <- function( chr1, pos1, chr2, pos2, delta ) {
>   ( chr1 == chr2
>   & ( pos2 - delta ) < pos1
>   & pos1 < ( pos2 + delta )
>   )
> }
> makeIntersectX <- function( n, chrlabel, poslabel, delta ) {
>   lgclidx <- rep( TRUE, nrow( x ) )
>   lgclidx[ n ] <- FALSE
>   x[[ chrlabel ]][ compareRng( x[[ chrlabel ]][ n ]
>                     , x[[ poslabel ]][ n ]
>                     , x[[ chrlabel ]]
>                     , x[[ poslabel ]]
>                     , delta
>                     )
>         & lgclidx
>         ]
> }
>
> x$intersectA <- lapply( seq.int( nrow( x ) )
>                       , makeIntersectX
>                       , chrlabel = "CHRA"
>                       , poslabel = "POSA"
>                       , delta = 10L
>                       )
> x$intersectB <- lapply( seq.int( nrow( x ) )
>                       , makeIntersectX
>                       , chrlabel = "CHRB"
>                       , poslabel = "POSB"
>                       , delta = 21L
>                       )
>
>> x
>>
>   TYPE CHRA POSA CHRB POSB intersectA intersectB
> 1  DEL chr1   10 chr1   30       chr1
> 2  DEL chr1   15 chr1  100       chr1
> 3  DUP chr1  120 chr1  300                  chr1
> 4  TRA chr1  340 chr2   20
> 5  INV chr2  100 chr2  200
> 6  TRA chr2  220 chr1  320                  chr1
>
> Note that depending on what you plan to do beyond this point, it might
> actually be more performant to use a data frame with repeated rows instead
> of list columns... but I cannot tell from what you have provided.
>
> On Wed, 25 Jul 2018, Bogdan Tanasa wrote:
>
> Dear Thierry and Juan, thank you for your help. Thank you all.
>>
>> Now, if I would like to add an element to the empty list, how shall I do :
>> for example, shall i = 2, and j = 1, in a bit of more complex R code :
>>
>> x <- data.frame(TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA"),
>> CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
>> POSA=c(10, 15, 120, 340, 100, 220),
>> CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
>> POSB=c(30, 100, 300, 20, 200, 320))
>>
>> x$labA <- paste(x$CHRA, x$POSA, sep="_")
>> x$labB <- paste(x$CHRB, x$POSB, sep="_")
>>
>> x$POSA_left <- x$POSA - 10
>> x$POSA_right <- x$POSA + 10
>>
>> x$POSB_left <- x$POSB - 10
>> x$POSB_right <- x$POSB + 10
>>
>> x$intersectA <- rep(list(list()), nrow(x))
>> x$intersectB <- rep(list(list()), nrow(x))
>>
>> And we know that for i = 2, and j = 1, the condition is TRUE :
>>
>> i <- 2
>>
>> j <- 1
>>
>> if ( (x$CHRA[i] == x$CHRA[j] ) &&
>>     (x$POSA[i] > x$POSA_left[j] ) &&
>>     (x$POSA[i] < x$POSA_right[j] ) ){
>>   x$intersectA[i] <- c(x$intersectA[i], x$labA[j])}
>>
>> the R code does not work. Thank you for your kind help !
>>
>> On Wed, Jul 25, 2018 at 12:26 AM, Thierry Onkelinx <
>> thierry.onkelinx using inbo.be
>>
>>> wrote:
>>>
>>
>> Dear Bogdan,
>>>
>>> You are looking for x$intersectA <- vector("list", nrow(x))
>>>
>>> Best regards,
>>>
>>>
>>> ir. Thierry Onkelinx
>>> Statisticus / Statistician
>>>
>>> Vlaamse Overheid / Government of Flanders
>>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>>> AND
>>> FOREST
>>> Team Biometrie &
>>> <https://maps.google.com/?q=Biometrie+%26+&entry=gmail&source=g>Kwaliteitszorg
>>> / Team Biometrics & Quality Assurance
>>> thierry.onkelinx using inbo.be
>>> Havenlaan 88
>>> <https://maps.google.com/?q=Havenlaan+88&entry=gmail&source=g> bus 73,
>>> 1000 Brussel
>>> www.inbo.be
>>>
>>> ////////////////////////////////////////////////////////////
>>> ///////////////////////////////
>>> To call in the statistician after the experiment is done may be no more
>>> than asking him to perform a post-mortem examination: he may be able to
>>> say
>>> what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>> The plural of anecdote is not data. ~ Roger Brinner
>>> The combination of some data and an aching desire for an answer does not
>>> ensure that a reasonable answer can be extracted from a given body of
>>> data.
>>> ~ John Tukey
>>> ////////////////////////////////////////////////////////////
>>> ///////////////////////////////
>>>
>>> <https://www.inbo.be>
>>>
>>> 2018-07-25 8:55 GMT+02:00 Bogdan Tanasa <tanasa using gmail.com>:
>>>
>>> Dear all,
>>>>
>>>> assuming that I do have a dataframe like :
>>>>
>>>> x <- data.frame(TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA"),
>>>> CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
>>>> POSA=c(10, 15, 120, 340, 100, 220),
>>>> CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
>>>> POSB=c(30, 100, 300, 20, 200, 320)) ,
>>>>
>>>> how could I initiate another 2 columns in x, where each element in
>>>> these 2
>>>> columns is going to be a list (the list could be updated later). Thank
>>>> you !
>>>>
>>>> Shall I do,
>>>>
>>>> for (i in 1:dim(x)[1]) { x$intersectA[i] <- list()}
>>>>
>>>> for (i in 1:dim(x)[1]) { x$intersectB[i] <- list()}
>>>>
>>>> nothing is happening. Thank you very much !
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posti
>>>> ng-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> ------------------------------------------------------------
> ---------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil using dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live
> Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posti
> ng-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]




More information about the R-help mailing list