[R] as.vector with mode="list" and POSIXct

Bert Gunter gunter.berton at gene.com
Wed May 22 22:05:43 CEST 2013


Gents:

You've both been polite and thoughtful, but I think you should take
your discussion private, no?

-- Bert

On Wed, May 22, 2013 at 12:57 PM, Alexandre Sieira
<alexandre.sieira at gmail.com> wrote:
> Please let's not turn this into an ad hominem discussion by adding remarks on what the other thinks or knows, as this will get us nowhere fast. Let's focus on the issue, ok? :)
>
> Again, the point behind my workaround was to try to change the rest of my program as little as possible while I waited for the maintainer of the hash package to respond. I found it was an acceptable compromise, even if it does, as you say, add complexity.
>
>
> As for embracing vectorization, I got into this problem exactly because I wanted the data to be returned in a vector using the values() function in the first place.
>
>
> I agree with your observation that simpler is better. However, I won't get into the details of why I decided to use hash instead of other data structures in my architecture, since I don't mean to put that up for discussion on a public list. I understand you offered alternatives with the best of intentions, and I thank you. But after careful consideration I still think using hash is the best option and will stick with it in my code.
>
> Given those premises, I would ask you and the list again if you think there is a better way of achieving what my unlistPOSIXct function does that is closer to the natural paradigm of R. The only equivalent I found in base R is the unlist function, but its documentation explicitly states it will coerce data to primitive data types. So unfortunately it doesn't help me.
>
> Working with POSIXct in a list precludes me from doing lots of necessary operations in a vectorized way, such as min() and max(), that will work on POSIXct vectors. That is why I need to convert the list back into a vector in an efficient manner and without unclassing the objects. Would really appreciate any help with that.
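
One base-R route that may fit here, as a minimal sketch: it assumes every list element is a length-one POSIXct, and the `times` list and its timestamps are made up for illustration rather than taken from the hash package.

```r
# Hypothetical sample data standing in for the list returned by values():
# each element is a length-one POSIXct.
times <- list(
  a = as.POSIXct("2013-05-21 09:54:03", tz = "UTC"),
  b = as.POSIXct("2013-05-21 09:54:07", tz = "UTC"),
  c = as.POSIXct("2013-05-21 09:54:11", tz = "UTC")
)

# do.call() passes the list elements to c() as separate arguments, and c()
# dispatches on the POSIXct class of the first one, so the class is kept.
# unname() drops the keys so that no key can be mistaken for an argument name.
v <- do.call(c, unname(times))

class(v)   # a POSIXct vector, not a bare numeric
min(v)     # earliest timestamp
max(v)     # latest timestamp
```

Depending on the R version, the time zone attribute of the result may need to be set again afterwards, so it is worth checking attr(v, "tzone") before relying on the printed times.
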
>
> Thank you again for your interest and advice.
>
> --
> Alexandre Sieira
> CISA, CISSP, ISO 27001 Lead Auditor
>
> "The truth is rarely pure and never simple."
> Oscar Wilde, The Importance of Being Earnest, 1895, Act I
> On 22 May 2013 at 15:59:46, Jeff Newmiller (jdnewmil at dcn.davis.ca.us) wrote:
> My perception of illogic was in your addition of more data structure complexity when faced with this difficulty. R has best performance when calculations are pushed into simple typed vectors where precompiled code can handle the majority of the work. These are simpler structures, not more complex structures. It seems like you are fighting the natural paradigm for working in R and holding fast to your ideas about how things "should be" rather than dealing with how they "are" by introducing lists rather than working with vectors or data frames.
> ---------------------------------------------------------------------------
> Jeff Newmiller The ..... ..... Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
> Live: OO#.. Dead: OO#.. Playing
> Research Engineer (Solar/Batteries O.O#. #.O#. with
> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Alexandre Sieira <alexandre.sieira at gmail.com> wrote:
>
>>Hi, Jeff.
>>
>>Thanks for your thoughtful suggestions.
>>
>>I do not plan to wait for the hash package to be redesigned to meet my
>>expectations. As a matter of fact, I have:
>>
>>a) Submitted a report of unexpected behavior in hash::values, which the
>>package maintainer quickly replied to and said would examine.
>>b) Designed (with the help of this list) and implemented a workaround
>>in the form of wrapping the POSIXct objects in lists, which has my
>>program working correctly for now.
>>
>>If the hash package is updated and the workaround is no longer
>>necessary, then I'll reverse this change. Otherwise, I'll look more
>>deeply into my alternatives which might involve maintaining this
>>workaround permanently, or analyzing alternative architectures.
>>
>>The hash package is a beautiful piece of code that is working perfectly
>>for me in many situations. Even with the list wrapping around the
>>POSIXct objects, it is meeting my performance requirements much better
>>than the alternatives I tested. So I'd rather not completely
>>re-engineer working complex code without a very good reason.
>>
>>However, I would like to respectfully disagree with you that my
>>reaction to hash::values behavior was illogical. I don't want to start
>>a flame war or anything, so let's try to keep the discussion civil. :)
>>
>>See, a hash table (or a queue, or a stack, or an R vector) is a data
>>structure that works as a container. You insert objects and you get
>>them back according to the specificities of each data structure (stacks
>>will have a FILO ordering, queues will have FIFO ordering, hashes will
>>maintain key/value pairs, and so on).
>>
>>It is completely unreasonable to insert an object of class X into a
>>container, and then get it back altered in a way that is not part of
>>the 'contract' behind the data structure. If I assign X to key K on a
>>hash, however I choose to ask the hash for the value associated with
>>key K back, I should get exactly X as a response. I believe most
>>computer scientists would agree that to be self-evident.
>>
>>And that is to be expected by reading hash::values documentation:
>>
>> Extract values from a hash object. This is a pseudo- accessor method
>>that returns hash values (without keys) as a vector if possible, a list
>>otherwise.
>>
>>
>>Moreover, it has this to say about non-primitive types:
>>
>> If the values are of different types or of a complex class than a
>>named list is returned.
>>
>>
>>It never says it will unclass objects, or coerce them into primitive
>>types. Hence the 'contract' implies I will get back what I inserted,
>>unaltered, either in a vector or a list. And that is provably not what
>>is happening. I would have been ok with a vector of POSIXct or a named
>>list containing the POSIXct values, but instead I am getting a numeric
>>vector.
>>
>>I understand R is based on S, and that OOP concepts were introduced
>>later into its history. However, one of the key concepts in OOP is
>>encapsulation - as an outside entity you do not get to see the internal
>>implementation of a class, you interact with it exclusively through its
>>published "interface" (methods, public member variables, etc.).
>>
>>I cannot find any justification for why an object "losing" its class
>>unintentionally is ever acceptable, as it violates the concept of
>>encapsulation. That is essentially what's happening if I look up
>>several keys using values(). So this violates the encapsulation of the
>>POSIXct class, as I am exposed to its internal numeric value. Moreover,
>>it breaks the "method-dispatch" of R functions that know to treat
>>POSIXct values differently. All of a sudden, the POSIXct objects I
>>inserted are being treated, for example, by format as numeric instead
>>of being dispatched to format.POSIXct as expected.
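
The class loss and its effect on dispatch can be reproduced in base R alone. This is only a sketch: it uses unlist() as a stand-in, and nothing in the thread says that hash::values works this way internally. The timestamp is made up.

```r
t0 <- as.POSIXct("2013-05-20 16:54:28", tz = "UTC")

# unlist() flattens to a plain atomic vector and drops the class attribute.
stripped <- unlist(list(t0))

class(t0)        # "POSIXct" "POSIXt"
class(stripped)  # "numeric"

format(t0)        # dispatched to the POSIXct method: a date-time string
format(stripped)  # default method: seconds since 1970-01-01 as a number

# The class can be put back by hand if the time zone is known.
restored <- as.POSIXct(stripped, origin = "1970-01-01", tz = "UTC")
restored == t0
```
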
>>
>>So I don't think my reaction to this issue was illogical at all. Hope
>>you'll agree now that I've explained myself a little better. :)
>>
>>--
>>Alexandre Sieira
>>CISA, CISSP, ISO 27001 Lead Auditor
>>
>>"The truth is rarely pure and never simple."
>>Oscar Wilde, The Importance of Being Earnest, 1895, Act I
>>On 21 May 2013 at 22:44:19, Jeff Newmiller
>>(jdnewmil at dcn.davis.ca.us) wrote:
>>I recommend that you not plan on waiting for the hash package to be
>>redesigned to meet your expectations. Also, your response to
>>discovering this feature of the hash package seems illogical.
>>
>>From a computer science perspective, the hash mechanism is an
>>implementation trick that is intended to improve lookup speed. It does
>>not actually represent a fundamental data structure like a vector or a
>>set does. You can always put your keys in a vector and search through
>>them (e.g. vector indexing by string) to get an equivalent data
>>retrieval. If the hash package is not improving the speed of your data
>>access, adding an extra layer of data structure is hardly an
>>appropriate solution.
>>
>>Why are you not using normal vectors or data frames and accessing with
>>string or logical indexing?
>>
>>If you are avoiding vectors because they seem slow in loops, perhaps
>>you just need to preallocate the vectors you will store your results in
>>before your loop to regain acceptable speed. Or, perhaps the
>>duplicated() or merge() functions could save you from this mess of
>>incremental data processing.
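
A rough sketch of the string-indexing idea with a named POSIXct vector follows. The keys and timestamps are hypothetical, and lookup by name in a plain vector is a search rather than a hashed lookup, so it may not suit very large key sets.

```r
# Hypothetical keys and timestamps, for illustration only.
stamps <- as.POSIXct(
  c("2013-05-21 09:54:03", "2013-05-21 09:54:07", "2013-05-21 09:54:11"),
  tz = "UTC"
)
names(stamps) <- c("a", "b", "c")

stamps["b"]          # look up one key
stamps[c("a", "c")]  # look up several keys at once
min(stamps)          # still POSIXct, so min() and max() work directly
```
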
>>
>>Alexandre Sieira <alexandre.sieira at gmail.com> wrote:
>>
>>>You are absolutely right.
>>>
>>>I am storing POSIXct objects into a hash (from the hash package).
>>>However, if I try to get them out as a vector using the values()
>>>function, they are unclassed. And that breaks my (highly vectorized)
>>>code. Take a look at this:
>>>
>>>
>>>> h = hash()
>>>> h[["a"]] = Sys.time()
>>>> str(h[["a"]])
>>> POSIXct[1:1], format: "2013-05-20 16:54:28"
>>>> str(values(h))
>>> Named num 1.37e+09
>>> - attr(*, "names")= chr "a"
>>>
>>>
>>>I have reported this to the hash package maintainers. In the meantime,
>>>however, I am storing, for each key, a list containing a single
>>>POSIXct. Then, when I extract all using values(), I get a list
>>>containing all POSIXct entries with class preserved.
>>>
>>>
>>>> h = hash()
>>>> h[["a"]] = list( Sys.time() )
>>>> h[["b"]] = list( Sys.time() )
>>>> h[["c"]] = list( Sys.time() )
>>>> values(h)
>>>$a
>>>[1] "2013-05-21 09:54:03 BRT"
>>>
>>>$b
>>>[1] "2013-05-21 09:54:07 BRT"
>>>
>>>$c
>>>[1] "2013-05-21 09:54:11 BRT"
>>>
>>>> str(values(h))
>>>List of 3
>>> $ a: POSIXct[1:1], format: "2013-05-21 09:54:03"
>>> $ b: POSIXct[1:1], format: "2013-05-21 09:54:07"
>>> $ c: POSIXct[1:1], format: "2013-05-21 09:54:11"
>>>
>>>
>>>However, the next thing I need to do is a min() over that list, so I
>>>need to convert the list into a vector again.
>>>
>>>I agree completely with you that this is horrible for performance, but
>>>it is a temporary workaround until values() is "fixed".
>>>
>>>--
>>>Alexandre Sieira
>>>CISA, CISSP, ISO 27001 Lead Auditor
>>>
>>>"The truth is rarely pure and never simple."
>>>Oscar Wilde, The Importance of Being Earnest, 1895, Act I
>>>On 20 May 2013 at 19:40:14, Jeff Newmiller
>>>(jdnewmil at dcn.davis.ca.us) wrote:
>>>I don't know what you plan to do with this list, but lists are quite a
>>>bit less efficient than fixed-mode vectors, so you are likely losing a
>>>lot of computational speed by using this list. I don't hesitate to use
>>>simple data frames (lists of vectors), but processing lists is on par
>>>with for loops, not vectorized computation. It may still support a
>>>simpler model of computation, but that is an analyst comprehension
>>>benefit rather than a computational efficiency benefit.
>>>
>>>Alexandre Sieira <alexandre.sieira at gmail.com> wrote:
>>>
>>>>I was trying to convert a vector of POSIXct into a list of POSIXct.
>>>>However, I had a problem that I wanted to share with you.
>>>>
>>>>Works fine with, say, numeric:
>>>>
>>>>
>>>>> v = c(1, 2, 3)
>>>>> v
>>>>[1] 1 2 3
>>>>> str(v)
>>>> num [1:3] 1 2 3
>>>>> l = as.vector(v, mode="list")
>>>>> l
>>>>[[1]]
>>>>[1] 1
>>>>
>>>>[[2]]
>>>>[1] 2
>>>>
>>>>[[3]]
>>>>[1] 3
>>>>
>>>>> str(l)
>>>>List of 3
>>>> $ : num 1
>>>> $ : num 2
>>>> $ : num 3
>>>>
>>>>If you try it with POSIXct, on the other hand…
>>>>
>>>>
>>>>> v = c(Sys.time(), Sys.time())
>>>>> v
>>>>[1] "2013-05-20 18:02:07 BRT" "2013-05-20 18:02:07 BRT"
>>>>> str(v)
>>>> POSIXct[1:2], format: "2013-05-20 18:02:07" "2013-05-20 18:02:07"
>>>>> l = as.vector(v, mode="list")
>>>>> l
>>>>[[1]]
>>>>[1] 1369083728
>>>>
>>>>[[2]]
>>>>[1] 1369083728
>>>>
>>>>> str(l)
>>>>List of 2
>>>> $ : num 1.37e+09
>>>> $ : num 1.37e+09
>>>>
>>>>The POSIXct values are coerced to numeric, which is unexpected.
>>>>
>>>>The documentation for as.vector says: "The default method handles 24
>>>>input types and 12 values of type: the details of most coercions are
>>>>undocumented and subject to change." It would appear that treatment for
>>>>POSIXct is either missing or needs adjustment.
>>>>
>>>>unlist (for the reverse) is documented as converting to base types, so
>>>>I can't complain. Just wanted to share that I ended up giving up on
>>>>vectorization and writing the two following functions:
>>>>
>>>>
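>>>>unlistPOSIXct <- function(x) {
>>>>  # Preallocate a POSIXct vector, then fill it element by element.
>>>>  retval = rep(Sys.time(), length(x))
>>>>  for (i in seq_along(x)) retval[i] = x[[i]]
>>>>  return(retval)
>>>>}
>>>>
>>>>listPOSIXct <- function(x) {
>>>>  # Preallocate the list; x[i] keeps the POSIXct class of each element.
>>>>  retval = vector("list", length(x))
>>>>  for (i in seq_along(x)) retval[[i]] = x[i]
>>>>  return(retval)
>>>>}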
>>>>
>>>>Is there a better way to do this (other than using *apply instead of
>>>>for above) that better leverages vectorization? Am I missing something
>>>>here?
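
For the vector-to-list direction, a short sketch that keeps the class is to subset one element at a time. It still loops internally through lapply(), so it is tidier rather than truly vectorized, and the sample timestamps are made up.

```r
v <- as.POSIXct(c("2013-05-20 18:02:07", "2013-05-20 18:02:08"), tz = "UTC")

# Subsetting with `[` keeps the POSIXct class, so each list element is a
# length-one POSIXct rather than a bare number.
l <- lapply(seq_along(v), function(i) v[i])

str(l)
```
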
>>>>
>>>>Thanks!
>>>>
>>>>
>>>>
>>>>
>>>>--
>>>>Alexandre Sieira
>>>>CISA, CISSP, ISO 27001 Lead Auditor
>>>>
>>>>"The truth is rarely pure and never simple."
>>>>Oscar Wilde, The Importance of Being Earnest, 1895, Act I
>>>>
>>>>------------------------------------------------------------------------
>>>>
>>>>______________________________________________
>>>>R-help at r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide
>>>>http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


