[R] initiate elements in a dataframe with lists
William Dunlap
wdunlap at tibco.com
Wed Jul 25 20:18:17 CEST 2018
If you need to build a long list whose final length is not known in
advance, you can save time by adding the items to an environment, with
names that encode the insertion order, and then converting the environment
to a list once you have finished filling it. E.g.,
> makeData
function (container, n)
{
    for (i in seq_len(n)) container[[sprintf("%06x", i)]] <- seq_len(i%%5)
    container
}
> # use an environment
> system.time(E <- makeData(new.env(parent=emptyenv()), 10^5))
   user  system elapsed
   0.38    0.00    0.38
> # convert environment to a list
> system.time(EL <- as.list(E, sorted=TRUE))
   user  system elapsed
   0.62    0.00    0.62
> # use a list
> system.time(L <- makeData(list(), 10^5))
   user  system elapsed
 142.56    1.46  153.78
> all.equal(EL, L)
[1] TRUE
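
The zero-padded names carry the ordering in the example above:
as.list(E, sorted=TRUE) sorts the names as strings, so unpadded names such
as "10" would sort ahead of "2". A minimal sketch of that effect on a tiny
input (the helper fill_env and its fmt argument are made up for
illustration and are not part of the example above):

```r
# Hypothetical helper: store 1..n in a fresh environment, naming each item
# by formatting its index with 'fmt'.
fill_env <- function(n, fmt) {
  e <- new.env(parent = emptyenv())
  for (i in seq_len(n)) e[[sprintf(fmt, i)]] <- i
  e
}

# Convert each environment to a list with sorted names, then drop the names.
padded   <- unlist(as.list(fill_env(12L, "%06x"), sorted = TRUE),
                   use.names = FALSE)
unpadded <- unlist(as.list(fill_env(12L, "%d"), sorted = TRUE),
                   use.names = FALSE)

identical(padded, 1:12)    # insertion order is recovered
identical(unpadded, 1:12)  # not recovered: "10", "11", "12" sort before "2"
```

Any fixed-width format works, as long as it is wide enough for the largest
index you expect.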
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, Jul 25, 2018 at 10:43 AM, Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
wrote:
> The code below reeks of a misconception that lists are efficient to add
> items to, which is a confusion with the computer science term "linked
> list". In R, a list is NOT a linked list... it is a vector, which means
> the memory used by the list is allocated at the time it is created, and
> REALLOCATED when a new item is added. The only reason you should use a list
> is because you expect to put values of different types or shapes into it,
> which does not appear to apply in this use case.
>
> In R, you should make a valiant effort to create things right the first
> time, and if that doesn't work then preallocate the space you will need in
> the vectors you are working with. Since you have a need to store a variable
> number of elements in each intersectX element, the column needs to be a
> list but the elements of that list can perfectly well be character vectors.
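
As a small illustration of the options mentioned above (growing a list,
preallocating it, or creating it in one step), here is a sketch; n and the
element values are arbitrary placeholders and are not taken from the data
in this thread. All three give the same result, so the choice only affects
how much copying R may have to do along the way:

```r
n <- 1000L

# 1. Grow the list one element at a time (may reallocate as it grows).
grown <- list()
for (i in seq_len(n)) grown[[i]] <- seq_len(i %% 5L)

# 2. Preallocate all n slots, then fill them in place.
prealloc <- vector("list", n)
for (i in seq_len(n)) prealloc[[i]] <- seq_len(i %% 5L)

# 3. Create the list "right the first time" with lapply().
direct <- lapply(seq_len(n), function(i) seq_len(i %% 5L))

# The three approaches produce identical lists.
stopifnot(identical(grown, prealloc), identical(prealloc, direct))
```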
>
> x <- data.frame( TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA")
>                , CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2")
>                , POSA=c(10, 15, 120, 340, 100, 220)
>                , CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1")
>                , POSB=c(30, 100, 300, 20, 200, 320)
>                , stringsAsFactors = FALSE
>                )
> compareRng <- function( chr1, pos1, chr2, pos2, delta ) {
>   (   chr1 == chr2
>     & ( pos2 - delta ) < pos1
>     & pos1 < ( pos2 + delta )
>   )
> }
> makeIntersectX <- function( n, chrlabel, poslabel, delta ) {
>   lgclidx <- rep( TRUE, nrow( x ) )
>   lgclidx[ n ] <- FALSE
>   x[[ chrlabel ]][ compareRng( x[[ chrlabel ]][ n ]
>                              , x[[ poslabel ]][ n ]
>                              , x[[ chrlabel ]]
>                              , x[[ poslabel ]]
>                              , delta
>                              )
>                  & lgclidx
>                  ]
> }
>
> x$intersectA <- lapply( seq.int( nrow( x ) )
>                       , makeIntersectX
>                       , chrlabel = "CHRA"
>                       , poslabel = "POSA"
>                       , delta = 10L
>                       )
> x$intersectB <- lapply( seq.int( nrow( x ) )
>                       , makeIntersectX
>                       , chrlabel = "CHRB"
>                       , poslabel = "POSB"
>                       , delta = 21L
>                       )
>
>> x
>   TYPE CHRA POSA CHRB POSB intersectA intersectB
> 1  DEL chr1   10 chr1   30       chr1
> 2  DEL chr1   15 chr1  100       chr1
> 3  DUP chr1  120 chr1  300                  chr1
> 4  TRA chr1  340 chr2   20
> 5  INV chr2  100 chr2  200
> 6  TRA chr2  220 chr1  320                  chr1
>
> Note that depending on what you plan to do beyond this point, it might
> actually be more performant to use a data frame with repeated rows instead
> of list columns... but I cannot tell from what you have provided.
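
To make the "repeated rows" alternative concrete, here is a rough sketch
under stated assumptions: it uses only the CHRA/POSA columns and the delta
of 10 from the example above, and the names pairs, keep and hits are made
up for illustration. Each matching pair of rows becomes one row of a long
table instead of an entry in a list column:

```r
x <- data.frame(CHRA = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
                POSA = c(10, 15, 120, 340, 100, 220),
                stringsAsFactors = FALSE)
delta <- 10

# All ordered pairs (i, j) of distinct row numbers.
idx <- seq_len(nrow(x))
pairs <- expand.grid(i = idx, j = idx)
pairs <- pairs[pairs$i != pairs$j, ]

# Keep pairs on the same chromosome whose positions are closer than delta.
keep <- x$CHRA[pairs$i] == x$CHRA[pairs$j] &
  abs(x$POSA[pairs$i] - x$POSA[pairs$j]) < delta
hits <- pairs[keep, ]

# One row per match: row i intersects row j.
hits
```

Note that expand.grid() builds all n^2 pairs up front, so this particular
sketch is only reasonable for a modest number of rows.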
>
> On Wed, 25 Jul 2018, Bogdan Tanasa wrote:
>
>> Dear Thierry and Juan, thank you for your help. Thank you all.
>>
>> Now, if I would like to add an element to the empty list, how should I do
>> it? For example, take i = 2 and j = 1 in this slightly more complex R code:
>>
>> x <- data.frame(TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA"),
>>                 CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
>>                 POSA=c(10, 15, 120, 340, 100, 220),
>>                 CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
>>                 POSB=c(30, 100, 300, 20, 200, 320))
>>
>> x$labA <- paste(x$CHRA, x$POSA, sep="_")
>> x$labB <- paste(x$CHRB, x$POSB, sep="_")
>>
>> x$POSA_left <- x$POSA - 10
>> x$POSA_right <- x$POSA + 10
>>
>> x$POSB_left <- x$POSB - 10
>> x$POSB_right <- x$POSB + 10
>>
>> x$intersectA <- rep(list(list()), nrow(x))
>> x$intersectB <- rep(list(list()), nrow(x))
>>
>> And we know that for i = 2, and j = 1, the condition is TRUE :
>>
>> i <- 2
>>
>> j <- 1
>>
>> if ( (x$CHRA[i] == x$CHRA[j] ) &&
>>      (x$POSA[i] > x$POSA_left[j] ) &&
>>      (x$POSA[i] < x$POSA_right[j] ) ) {
>>   x$intersectA[i] <- c(x$intersectA[i], x$labA[j])
>> }
>>
>> the R code does not work. Thank you for your kind help !
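
The assignment in the question most likely fails because x$intersectA[i]
(single bracket) is a sub-list of length one, so c() returns a two-element
list that does not fit into one slot; R should warn about the replacement
length and keep only the first item, which is the original empty list.
Indexing with [[ ]] on both sides addresses the element itself. A cut-down
sketch, assuming a three-row version of the data and the
vector("list", nrow(x)) initialisation suggested elsewhere in this thread:

```r
x <- data.frame(CHRA = c("chr1", "chr1", "chr1"),
                POSA = c(10, 15, 120),
                stringsAsFactors = FALSE)
x$labA <- paste(x$CHRA, x$POSA, sep = "_")

# One empty (NULL) slot per row.
x$intersectA <- vector("list", nrow(x))

i <- 2
j <- 1
if (x$CHRA[i] == x$CHRA[j] &&
    x$POSA[i] > x$POSA[j] - 10 &&
    x$POSA[i] < x$POSA[j] + 10) {
  # [[ ]] reads and writes the element, not a length-one sub-list.
  x$intersectA[[i]] <- c(x$intersectA[[i]], x$labA[j])
}

x$intersectA[[2]]  # now holds the label of row 1
```

Appending further labels to the same row works the same way, since c()
then extends the character vector already stored in that slot.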
>>
>> On Wed, Jul 25, 2018 at 12:26 AM, Thierry Onkelinx
>> <thierry.onkelinx using inbo.be> wrote:
>>
>>> Dear Bogdan,
>>>
>>> You are looking for x$intersectA <- vector("list", nrow(x))
>>>
>>> Best regards,
>>>
>>>
>>> ir. Thierry Onkelinx
>>> Statisticus / Statistician
>>>
>>> Vlaamse Overheid / Government of Flanders
>>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>>> AND FOREST
>>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>>> thierry.onkelinx using inbo.be
>>> Havenlaan 88 bus 73, 1000 Brussel
>>> www.inbo.be
>>>
>>> ///////////////////////////////////////////////////////////////////////
>>> To call in the statistician after the experiment is done may be no more
>>> than asking him to perform a post-mortem examination: he may be able to
>>> say what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>> The plural of anecdote is not data. ~ Roger Brinner
>>> The combination of some data and an aching desire for an answer does not
>>> ensure that a reasonable answer can be extracted from a given body of
>>> data. ~ John Tukey
>>> ///////////////////////////////////////////////////////////////////////
>>>
>>> 2018-07-25 8:55 GMT+02:00 Bogdan Tanasa <tanasa using gmail.com>:
>>>
>>>> Dear all,
>>>>
>>>> assuming that I do have a dataframe like :
>>>>
>>>> x <- data.frame(TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA"),
>>>>                 CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
>>>>                 POSA=c(10, 15, 120, 340, 100, 220),
>>>>                 CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
>>>>                 POSB=c(30, 100, 300, 20, 200, 320)) ,
>>>>
>>>> how could I initialize two more columns in x, where each element of
>>>> these columns is a list (the list could be updated later)? Thank you!
>>>>
>>>> Shall I do,
>>>>
>>>> for (i in 1:dim(x)[1]) { x$intersectA[i] <- list()}
>>>>
>>>> for (i in 1:dim(x)[1]) { x$intersectB[i] <- list()}
>>>>
>>>> nothing is happening. Thank you very much !
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil using dcn.davis.ca.us>  Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k