[R] problem for strsplit function
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Sat Jul 10 02:31:36 CEST 2021
My mental model for the `[` vs `[[` behavior is that `[` can select multiple elements while `[[` selects exactly one. If multiple elements are returned from a list, the result must itself be a list. For consistency, `[` always returns a list when applied to a list, even when only one element is selected. The double bracket drops the containing list and returns the element itself.
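A minimal sketch of that model, assuming a made-up list `x` (the object and element names are illustrative only):

```r
# Hypothetical example list; the names are for illustration only.
x <- list(a = 1:3, b = "text", c = TRUE)

# `[` keeps the container: the result is a list, even for one element.
one_by_single <- x["a"]
two_by_single <- x[c("a", "b")]
is.list(one_by_single)      # TRUE
length(two_by_single)       # 2

# `[[` drops the container and returns the element itself.
one_by_double <- x[["a"]]
is.list(one_by_double)      # FALSE
is.integer(one_by_double)   # TRUE

# An atomic vector has no separate scalar type, so `[[` on it still
# returns a length-one vector.
v <- 1:10
identical(v[[3]], v[3])     # TRUE
```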
The is.vector() behavior is not intuitive to me... I avoid that function, as I think it is more useful to think of lists as vectors than as something "other".
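For what it is worth, the behavior appears to follow from the attribute rule in ?is.vector that Bert quotes below. A small sketch with made-up objects:

```r
# Hypothetical objects for illustration only.
plain_list <- list(1, "a")
is.vector(plain_list)   # TRUE: a list is a vector of mode "list"

named_list <- list(x = 1, y = "a")
is.vector(named_list)   # TRUE: names are the one attribute allowed

d <- data.frame(col1 = 1:3)
is.list(d)              # TRUE: a data frame is a list
is.vector(d)            # FALSE: it carries class and row.names attributes

tagged <- structure(1:3, note = "any extra attribute")
is.vector(tagged)       # FALSE, because of the extra attribute
is.atomic(tagged)       # TRUE: it is still an atomic vector
```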
On July 9, 2021 3:44:29 PM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>OK, I stand somewhat chastised.
>
>But my point still is that what you get when you "extract" depends on
>how you define "extract." Do note that ?"[" yields a help file titled
>"Extract or Replace Parts of an object"; and afaics, the term "subset"
>is not explicitly used as Duncan prefers. The relevant part of the
>Help file for "[" says, for recursive objects: "Indexing by [ is
>similar to atomic vectors and selects a list of the specified
>element(s)." That a data.frame is a list is explicitly stated, as I
>noted; that lists are in fact vectors is also explicitly stated (?list
>says: "Almost all lists in R internally are Generic Vectors") but then
>one is stuck with: a data.frame is a list and therefore a vector, but
>is.vector(d3) is FALSE. The explanation is explicit again in
>?is.vector ("is.vector returns TRUE if x is a vector of the specified
>mode having no attributes other than names. It returns FALSE
>otherwise."). But I would say these issues are sufficiently murky that
>my warning to be precise is not entirely inappropriate; unfortunately,
>I may have made them more so. Sigh....
>
>Cheers,
>Bert
>
>
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>On Fri, Jul 9, 2021 at 3:05 PM Duncan Murdoch
><murdoch.duncan using gmail.com> wrote:
>>
>> On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
>> > "Strictly speaking", Greg is correct, Bert.
>> >
>> >
>> > https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
>> >
>> > Lists in R are vectors. What we colloquially refer to as "vectors"
>> > are more precisely referred to as "atomic vectors". And without a
>> > doubt, this "vector" nature of lists is a key underlying concept that
>> > explains why adding a dim attribute creates a matrix that can hold data
>> > frames. It is also a stumbling block for programmers from other
>> > languages that have things like linked lists.
>>
>> I would also object to v3 (below) as "extracting" a column from d.
>> "d[2]" doesn't extract anything, it "subsets" the data frame, so the
>> result is a data frame, not what you get when you extract something
>> from a data frame.
>>
>> People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal.
>> That extracts the 3rd element (the number 3). The problem is that R has
>> no way to represent a scalar number, only a vector of numbers, so x[[3]]
>> gets promoted to a vector containing that number when it is returned and
>> assigned to y.
>>
>> Lists are vectors of R objects, so if x is a list, x[[3]] is something
>> that can be returned, and it is different from x[3].
>>
>> Duncan Murdoch
>>
>> >
>> > On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>> >> "1. a column, when extracted from a data frame, *is* a vector."
>> >> Strictly speaking, this is false; it depends on exactly what is meant
>> >> by "extracted." e.g.:
>> >>
>> >> > d <- data.frame(col1 = 1:3, col2 = letters[1:3])
>> >> > v1 <- d[,2]  ## a vector
>> >> > v2 <- d[[2]] ## the same, i.e.
>> >> > identical(v1,v2)
>> >> [1] TRUE
>> >> > v3 <- d[2] ## a data.frame
>> >> > v1
>> >> [1] "a" "b" "c" ## a character vector
>> >> > v3
>> >>   col2
>> >> 1    a
>> >> 2    b
>> >> 3    c
>> >> > is.vector(v1)
>> >> [1] TRUE
>> >> > is.vector(v3)
>> >> [1] FALSE
>> >> > class(v3) ## data.frame
>> >> [1] "data.frame"
>> >> ## but
>> >> > is.list(v3)
>> >> [1] TRUE
>> >>
>> >> which is simply explained in ?data.frame (where else?!) by:
>> >> "A data frame is a **list** [emphasis added] of variables of the same
>> >> number of rows with unique row names, given class "data.frame". If no
>> >> variables are included, the row names determine the number of rows."
>> >>
>> >> "2. maybe your question is "is a given function for a vector, or for a
>> >> data frame/matrix/array?". if so, i think the only way is reading
>> >> the help information (?foo)."
>> >>
>> >> Indeed! Is this not what the Help system is for?! But note also that
>> >> the S3 class system may somewhat blur the issue: foo() may work
>> >> appropriately and differently for different (S3) classes of objects. A
>> >> detailed explanation of this behavior can be found in appropriate
>> >> resources or (more tersely) via ?UseMethod .
>> >>
>> >> "you might find reading ?"[" and ?"[.data.frame" useful"
>> >>
>> >> Not just "useful" -- **essential** if you want to work in R, unless
>> >> one gets this information via any of the numerous online tutorials,
>> >> courses, or books that are available. The Help system is accurate and
>> >> authoritative, but terse. I happen to like this mode of documentation,
>> >> but others may prefer more extended expositions. I stand by this claim
>> >> even if one chooses to use the "Tidyverse", data.table package, or
>> >> other alternative frameworks for handling data. Again, others may
>> >> disagree, but R is structured around these basics, and imo one remains
>> >> ignorant of them at their peril.
>> >>
>> >> Cheers,
>> >> Bert
>> >>
>> >>
>> >> Bert Gunter
>> >>
>> >> "The trouble with having an open mind is that people keep coming along
>> >> and sticking things into it."
>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>
>> >> On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall using umich.edu> wrote:
>> >>>
>> >>> Kai,
>> >>>
>> >>>> one more question, how can I know if the function is for column
>> >>>> manipulations or for vector?
>> >>>
>> >>> i still stumble around R code. but, i'd say the following (and look
>> >>> forward to being corrected! :):
>> >>>
>> >>> 1. a column, when extracted from a data frame, *is* a vector.
>> >>>
>> >>> 2. maybe your question is "is a given function for a vector, or for a
>> >>>    data frame/matrix/array?". if so, i think the only way is reading
>> >>>    the help information (?foo).
>> >>>
>> >>> 3. sometimes, extracting the column as a vector from a data frame-like
>> >>>    object might be non-intuitive. you might find reading ?"[" and
>> >>>    ?"[.data.frame" useful (as well as ?"[.data.table" if you use that
>> >>>    package). also, the str() command can be helpful in understanding
>> >>>    what is happening. (the lobstr:: package's sxp() function, as well
>> >>>    as more verbose .Internal(inspect()) can also give you insight.)
>> >>>
>> >>>    with the data.table:: package, for example, if "DT" is a data.table
>> >>>    object, with "x2" as a column, adding or leaving off quotation marks
>> >>>    for the column name can make all the difference between ending up
>> >>>    with a vector, or with a (much reduced) data table:
>> >>> ----
>> >>> > is.vector(DT[, x2])
>> >>> [1] TRUE
>> >>> > str(DT[, x2])
>> >>>  num [1:9] 32 32 32 32 32 32 32 32 32
>> >>> >
>> >>> > is.vector(DT[, "x2"])
>> >>> [1] FALSE
>> >>> > str(DT[, "x2"])
>> >>> Classes ‘data.table’ and 'data.frame': 9 obs. of 1 variable:
>> >>>  $ x2: num 32 32 32 32 32 32 32 32 32
>> >>>  - attr(*, ".internal.selfref")=<externalptr>
>> >>> ----
>> >>>
>> >>>    a second level of indexing may or may not help, mostly depending on
>> >>>    the use of '[' versus '[['. this can sometimes cause confusion
>> >>>    when you are learning the language.
>> >>> ----
>> >>> > str(DT[, "x2"][1])
>> >>> Classes ‘data.table’ and 'data.frame': 1 obs. of 1 variable:
>> >>>  $ x2: num 32
>> >>>  - attr(*, ".internal.selfref")=<externalptr>
>> >>> > str(DT[, "x2"][[1]])
>> >>>  num [1:9] 32 32 32 32 32 32 32 32 32
>> >>> ----
>> >>>
>> >>>    the tibble:: package (used in, e.g., the dplyr:: package) also
>> >>>    (always?) returns a single column as a non-vector. again, a
>> >>>    second indexing with double '[[]]' can produce a vector.
>> >>> ----
>> >>> > DP <- tibble(DT)
>> >>> > is.vector(DP[, "x2"])
>> >>> [1] FALSE
>> >>> > is.vector(DP[, "x2"][[1]])
>> >>> [1] TRUE
>> >>> ----
>> >>>
>> >>> but, note that a list of lists is also a vector:
>> >>> > is.vector(list(list(1), list(1,2,3)))
>> >>> [1] TRUE
>> >>> > str(list(list(1), list(1,2,3)))
>> >>> List of 2
>> >>>  $ :List of 1
>> >>>   ..$ : num 1
>> >>>  $ :List of 3
>> >>>   ..$ : num 1
>> >>>   ..$ : num 2
>> >>>   ..$ : num 3
>> >>>
>> >>> etc.
>> >>>
>> >>> hth. good luck learning!
>> >>>
>> >>> cheers, Greg
>> >>>
>> >>> ______________________________________________
>> >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>>
--
Sent from my phone. Please excuse my brevity.