[R] problem for strsplit function

Bert Gunter bgunter.4567 using gmail.com
Sat Jul 10 00:44:29 CEST 2021


OK, I stand somewhat chastised.

But my point still is that what you get when you "extract" depends on
how you define "extract." Do note that ?"[" yields a help file titled
"Extract or Replace Parts of an object"; and afaics, the term "subset"
is not explicitly used as Duncan prefers. The relevant part of the
Help file for "[", on recursive objects, says: "Indexing by [ is
similar to atomic vectors and selects a list of the specified
element(s)."  That a data.frame is a list is explicitly stated, as I
noted; that lists are in fact vectors is also explicitly stated (?list
says: "Almost all lists in R internally are Generic Vectors") but then
one is stuck with: a data.frame is a list and therefore a vector, but
is.vector(d3) is FALSE. The explanation is explicit again in
?is.vector ("is.vector returns TRUE if x is a vector of the specified
mode having no attributes other than names. It returns FALSE
otherwise."). But I would say these issues are sufficiently murky that
my warning to be precise is not entirely inappropriate; unfortunately,
I may have made them more so. Sigh....
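
A minimal sketch of the is.vector() point, for anyone who wants to try
it. d3 is not defined anywhere in this thread, so the code assumes it
is a small data frame like d in the quoted example further down; the
"units" attribute is likewise made up for illustration.

```r
# d3 is a hypothetical stand-in; the original d3 is not shown in the thread.
d3 <- data.frame(col1 = 1:3, col2 = letters[1:3], stringsAsFactors = FALSE)

is.list(d3)            # a data frame is a list ...
is.vector(d3)          # ... but is.vector() is FALSE because of its attributes
names(attributes(d3))  # includes "class" and "row.names" besides "names"

# The same effect on an atomic vector: any attribute other than names
# makes is.vector() return FALSE.
x <- c(a = 1, b = 2)
named_ok <- is.vector(x)     # names alone are allowed
attr(x, "units") <- "cm"     # made-up attribute
extra_attr <- is.vector(x)   # FALSE once the extra attribute is present

# "[" keeps the container (a one-column data frame); "[[" returns the column.
sub <- d3[2]
col <- d3[[2]]
```

So whether a data frame "is a vector" depends on whether you ask
is.list() or is.vector(), which is the murkiness I was complaining about.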

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Jul 9, 2021 at 3:05 PM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>
> On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
> > "Strictly speaking", Greg is correct, Bert.
> >
> > https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
> >
> > Lists in R are vectors. What we colloquially refer to as "vectors" are more precisely referred to as "atomic vectors". And without a doubt, this "vector" nature of lists is a key underlying concept that explains why adding a dim attribute creates a matrix that can hold data frames. It is also a stumbling block for programmers from other languages that have things like linked lists.
>
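
Jeff's remark that a list with a dim attribute becomes a matrix able to
hold data frames can be sketched roughly as follows. The cell contents
are invented for illustration only.

```r
# A list is a vector, so it can carry a dim attribute like any other vector.
cells <- list(
  data.frame(a = 1:2),  # a data frame in one cell
  "text",
  1:3,
  mean                  # even a function
)
dim(cells) <- c(2, 2)   # filled column by column, as for atomic matrices

is.matrix(cells)        # a matrix ...
is.list(cells)          # ... that is still a list
class(cells[[1, 1]])    # "[[" returns the data frame stored in the cell
cells[1, 1]             # "[" returns a list of length one wrapping it
```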
> I would also object to v3 (below) as "extracting" a column from d.
> "d[2]" doesn't extract anything, it "subsets" the data frame, so the
> result is a data frame, not what you get when you extract something from
> a data frame.
>
> People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal.
> That extracts the 3rd element (the number 3).  The problem is that R has
> no way to represent a scalar number, only a vector of numbers, so x[[3]]
> gets promoted to a vector containing that number when it is returned and
> assigned to y.
>
> Lists are vectors of R objects, so if x is a list, x[[3]] is something
> that can be returned, and it is different from x[3].
>
> Duncan Murdoch
>
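
Duncan's distinction between "[[" and "[" can be checked with a short
sketch. The objects x and y follow his example; lst is a made-up list
added for comparison.

```r
x <- 1:10
y <- x[[3]]          # legal on an atomic vector
identical(y, x[3])   # same result here, because x has no names
length(y)            # a vector of length one; R has no scalar type

lst <- list(1:3, "b", c(2.5, 3.5))  # hypothetical list
one <- lst[3]        # "[" subsets: a list of length one
elt <- lst[[3]]      # "[[" extracts: the numeric vector itself
```

For atomic vectors the two operators are hard to tell apart, while for
lists they return visibly different things.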
> >
> > On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >> "1.  a column, when extracted from a data frame, *is* a vector."
> >> Strictly speaking, this is false; it depends on exactly what is meant
> >> by "extracted." e.g.:
> >>
> >>> d <- data.frame(col1 = 1:3, col2 = letters[1:3])
> >>> v1 <- d[,2] ## a vector
> >>> v2 <- d[[2]] ## the same, i.e
> >>> identical(v1,v2)
> >> [1] TRUE
> >>> v3 <- d[2] ## a data.frame
> >>> v1
> >> [1] "a" "b" "c"  ## a character vector
> >>> v3
> >>   col2
> >> 1    a
> >> 2    b
> >> 3    c
> >>> is.vector(v1)
> >> [1] TRUE
> >>> is.vector(v3)
> >> [1] FALSE
> >>> class(v3)  ## data.frame
> >> [1] "data.frame"
> >> ## but
> >>> is.list(v3)
> >> [1] TRUE
> >>
> >> which is simply explained in ?data.frame (where else?!) by:
> >> "A data frame is a **list** [emphasis added] of variables of the same
> >> number of rows with unique row names, given class "data.frame". If no
> >> variables are included, the row names determine the number of rows."
> >>
> >> "2.  maybe your question is "is a given function for a vector, or for a
> >>     data frame/matrix/array?".  if so, i think the only way is reading
> >>     the help information (?foo)."
> >>
> >> Indeed! Is this not what the Help system is for?! But note also that
> >> the S3 class system may somewhat blur the issue: foo() may work
> >> appropriately and differently for different (S3) classes of objects. A
> >> detailed explanation of this behavior can be found in appropriate
> >> resources or (more tersely) via ?UseMethod .
> >>
> >> "you might find reading ?"[" and  ?"[.data.frame" useful"
> >>
> > >> Not just "useful" -- **essential** if you want to work in R, unless
> >> one gets this information via any of the numerous online tutorials,
> >> courses, or books that are available. The Help system is accurate and
> >> authoritative, but terse. I happen to like this mode of documentation,
> >> but others may prefer more extended expositions. I stand by this claim
> >> even if one chooses to use the "Tidyverse", data.table package, or
> >> other alternative frameworks for handling data. Again, others may
> >> disagree, but R is structured around these basics, and imo one remains
> >> ignorant of them at their peril.
> >>
> >> Cheers,
> >> Bert
> >>
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >> On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall using umich.edu>
> >> wrote:
> >>>
> >>> Kai,
> >>>
> >>>> one more question, how can I know if the function is for column
> >>>> manipulations or for vector?
> >>>
> >>> i still stumble around R code.  but, i'd say the following (and look
> >>> forward to being corrected! :):
> >>>
> >>> 1.  a column, when extracted from a data frame, *is* a vector.
> >>>
> > >>> 2.  maybe your question is "is a given function for a vector, or for a
> > >>>      data frame/matrix/array?".  if so, i think the only way is reading
> > >>>      the help information (?foo).
> >>>
> > >>> 3.  sometimes, extracting the column as a vector from a data frame-like
> > >>>      object might be non-intuitive.  you might find reading ?"[" and
> > >>>      ?"[.data.frame" useful (as well as ?"[.data.table" if you use that
> > >>>      package).  also, the str() command can be helpful in understanding
> > >>>      what is happening.  (the lobstr:: package's sxp() function, as well
> > >>>      as more verbose .Internal(inspect()) can also give you insight.)
> > >>>
> > >>>      with the data.table:: package, for example, if "DT" is a data.table
> > >>>      object, with "x2" as a column, adding or leaving off quotation marks
> > >>>      for the column name can make all the difference between ending up
> > >>>      with a vector, or with a (much reduced) data table:
> >>> ----
> >>>> is.vector(DT[, x2])
> >>> [1] TRUE
> >>>> str(DT[, x2])
> >>>   num [1:9] 32 32 32 32 32 32 32 32 32
> >>>>
> >>>> is.vector(DT[, "x2"])
> >>> [1] FALSE
> >>>> str(DT[, "x2"])
> >>> Classes ‘data.table’ and 'data.frame':  9 obs. of  1 variable:
> >>>   $ x2: num  32 32 32 32 32 32 32 32 32
> >>>   - attr(*, ".internal.selfref")=<externalptr>
> >>> ----
> >>>
> > >>>      a second level of indexing may or may not help, mostly depending on
> > >>>      the use of '[' versus '[['.  this can sometimes cause confusion
> > >>>      when you are learning the language.
> >>> ----
> >>>> str(DT[, "x2"][1])
> >>> Classes ‘data.table’ and 'data.frame':  1 obs. of  1 variable:
> >>>   $ x2: num 32
> >>>   - attr(*, ".internal.selfref")=<externalptr>
> >>>> str(DT[, "x2"][[1]])
> >>>   num [1:9] 32 32 32 32 32 32 32 32 32
> >>> ----
> >>>
> >>>      the tibble:: package (used in, e.g., the dplyr:: package) also
> >>>      (always?) returns a single column as a non-vector.  again, a
> >>>      second indexing with double '[[]]' can produce a vector.
> >>> ----
> >>>> DP <- tibble(DT)
> >>>> is.vector(DP[, "x2"])
> >>> [1] FALSE
> >>>> is.vector(DP[, "x2"][[1]])
> >>> [1] TRUE
> >>> ----
> >>>
> >>>      but, note that a list of lists is also a vector:
> >>>> is.vector(list(list(1), list(1,2,3)))
> >>> [1] TRUE
> >>>> str(list(list(1), list(1,2,3)))
> >>> List of 2
> >>>   $ :List of 1
> >>>    ..$ : num 1
> >>>   $ :List of 3
> >>>    ..$ : num 1
> >>>    ..$ : num 2
> >>>    ..$ : num 3
> >>>
> >>>      etc.
> >>>
> >>> hth.  good luck learning!
> >>>
> >>> cheers, Greg
> >>>
> >>> ______________________________________________
> >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>