[R] problem for strsplit function
Duncan Murdoch
murdoch.duncan at gmail.com
Sat Jul 10 02:16:13 CEST 2021
On 09/07/2021 6:44 p.m., Bert Gunter wrote:
> OK, I stand somewhat chastised.
>
> But my point still is that what you get when you "extract" depends on
> how you define "extract." Do note that ?"[" yields a help file titled
> "Extract or Replace Parts of an object"; and afaics, the term "subset"
> is not explicitly used as Duncan prefers.
?"[[" gives you the same page, but I agree: this part of the
documentation isn't written very clearly. The "Introduction to R" manual
uses the terms I used (see section 2.7, "Index vectors; selecting and
modifying subsets of a data set"), as does the source code (and the R
Language Definition manual, though it's not as clear as the Intro).
But the point isn't to chastise you, it's to educate you (and the OP).
Thinking of [] as subsetting is more helpful than thinking of it as
extraction. That way the result of x[c(1,2)] makes sense. It's a
little bit more of a stretch, but the result of x[[c(1,2)]] also makes
sense when you think of it as extraction.
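
A minimal sketch of that distinction, assuming a made-up nested list x (not from the thread) and the data frame d from Bert's example quoted further down. The names sub and ext are purely illustrative.

```r
# Hypothetical nested list, for illustration only.
x <- list(a = list(p = 10, q = 20), b = "text", c = 3)

# [ subsets: the result is a list again, here holding the first two elements.
sub <- x[c(1, 2)]
stopifnot(is.list(sub), length(sub) == 2)
stopifnot(identical(names(sub), c("a", "b")))

# [[ extracts: given a vector index on a list it drills down recursively,
# so x[[c(1, 2)]] is the second element of the first element.
ext <- x[[c(1, 2)]]
stopifnot(identical(ext, x[[1]][[2]]))
stopifnot(identical(ext, 20))

# The same distinction on a data frame (columns as in Bert's example).
d <- data.frame(col1 = 1:3, col2 = letters[1:3])
stopifnot(is.data.frame(d[2]))                          # subset: still a data frame
stopifnot(!is.data.frame(d[[2]]), is.atomic(d[[2]]))    # extraction: the column itself
```

On an atomic vector the recursive form has nothing to drill into, so [[ there should be given a single index.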
Duncan Murdoch
> The relevant part of the
> Help file for "[" says, for recursive objects: "Indexing by [ is
> similar to atomic vectors and selects a list of the specified
> element(s)." That a data.frame is a list is explicitly stated, as I
> noted; that lists are in fact vectors is also explicitly stated (?list
> says: "Almost all lists in R internally are Generic Vectors") but then
> one is stuck with: a data.frame is a list and therefore a vector, but
> is.vector(d3) is FALSE. The explanation is explicit again in
> ?is.vector ("is.vector returns TRUE if x is a vector of the specified
> mode having no attributes other than names. It returns FALSE
> otherwise."). But I would say these issues are sufficiently murky that
> my warning to be precise is not entirely inappropriate; unfortunately,
> I may have made them more so. Sigh....
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Fri, Jul 9, 2021 at 3:05 PM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>
>> On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
>>> "Strictly speaking", Greg is correct, Bert.
>>>
>>> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
>>>
>>> Lists in R are vectors. What we colloquially refer to as "vectors" are more precisely referred to as "atomic vectors". And without a doubt, this "vector" nature of lists is a key underlying concept that explains why adding a dim attribute creates a matrix that can hold data frames. It is also a stumbling block for programmers from other languages that have things like linked lists.
>>
>> I would also object to v3 (below) as "extracting" a column from d.
>> "d[2]" doesn't extract anything, it "subsets" the data frame, so the
>> result is a data frame, not what you get when you extract something from
>> a data frame.
>>
>> People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal.
>> That extracts the 3rd element (the number 3). The problem is that R has
>> no way to represent a scalar number, only a vector of numbers, so x[[3]]
>> gets promoted to a vector containing that number when it is returned and
>> assigned to y.
>>
>> Lists are vectors of R objects, so if x is a list, x[[3]] is something
>> that can be returned, and it is different from x[3].
>>
>> Duncan Murdoch
>>
>>>
>>> On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>>> "1. a column, when extracted from a data frame, *is* a vector."
>>>> Strictly speaking, this is false; it depends on exactly what is meant
>>>> by "extracted." e.g.:
>>>>
>>>>> d <- data.frame(col1 = 1:3, col2 = letters[1:3])
>>>>> v1 <- d[,2] ## a vector
>>>>> v2 <- d[[2]] ## the same, i.e
>>>>> identical(v1,v2)
>>>> [1] TRUE
>>>>> v3 <- d[2] ## a data.frame
>>>>> v1
>>>> [1] "a" "b" "c" ## a character vector
>>>>> v3
>>>>   col2
>>>> 1    a
>>>> 2    b
>>>> 3    c
>>>>> is.vector(v1)
>>>> [1] TRUE
>>>>> is.vector(v3)
>>>> [1] FALSE
>>>>> class(v3) ## data.frame
>>>> [1] "data.frame"
>>>> ## but
>>>>> is.list(v3)
>>>> [1] TRUE
>>>>
>>>> which is simply explained in ?data.frame (where else?!) by:
>>>> "A data frame is a **list** [emphasis added] of variables of the same
>>>> number of rows with unique row names, given class "data.frame". If no
>>>> variables are included, the row names determine the number of rows."
>>>>
>>>> "2. maybe your question is "is a given function for a vector, or for a
>>>> data frame/matrix/array?". if so, i think the only way is reading
>>>> the help information (?foo)."
>>>>
>>>> Indeed! Is this not what the Help system is for?! But note also that
>>>> the S3 class system may somewhat blur the issue: foo() may work
>>>> appropriately and differently for different (S3) classes of objects. A
>>>> detailed explanation of this behavior can be found in appropriate
>>>> resources or (more tersely) via ?UseMethod .
>>>>
>>>> "you might find reading ?"[" and ?"[.data.frame" useful"
>>>>
>>>> Not just "useful" -- **essential** if you want to work in R, unless
>>>> one gets this information via any of the numerous online tutorials,
>>>> courses, or books that are available. The Help system is accurate and
>>>> authoritative, but terse. I happen to like this mode of documentation,
>>>> but others may prefer more extended expositions. I stand by this claim
>>>> even if one chooses to use the "Tidyverse", data.table package, or
>>>> other alternative frameworks for handling data. Again, others may
>>>> disagree, but R is structured around these basics, and imo one remains
>>>> ignorant of them at their peril.
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep coming along
>>>> and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>
>>>> On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall using umich.edu>
>>>> wrote:
>>>>>
>>>>> Kai,
>>>>>
>>>>>> one more question, how can I know if the function is for column
>>>>>> manipulations or for vector?
>>>>>
>>>>> i still stumble around R code. but, i'd say the following (and look
>>>>> forward to being corrected! :):
>>>>>
>>>>> 1. a column, when extracted from a data frame, *is* a vector.
>>>>>
>>>>> 2. maybe your question is "is a given function for a vector, or for a
>>>>> data frame/matrix/array?". if so, i think the only way is reading
>>>>> the help information (?foo).
>>>>>
>>>>> 3. sometimes, extracting the column as a vector from a data frame-like
>>>>> object might be non-intuitive. you might find reading ?"[" and
>>>>> ?"[.data.frame" useful (as well as ?"[.data.table" if you use that
>>>>> package). also, the str() command can be helpful in understanding
>>>>> what is happening. (the lobstr:: package's sxp() function, as well
>>>>> as more verbose .Internal(inspect()) can also give you insight.)
>>>>>
>>>>> with the data.table:: package, for example, if "DT" is a data.table
>>>>> object, with "x2" as a column, adding or leaving off quotation marks
>>>>> for the column name can make all the difference between ending up
>>>>> with a vector, or with a (much reduced) data table:
>>>>> ----
>>>>>> is.vector(DT[, x2])
>>>>> [1] TRUE
>>>>>> str(DT[, x2])
>>>>> num [1:9] 32 32 32 32 32 32 32 32 32
>>>>>>
>>>>>> is.vector(DT[, "x2"])
>>>>> [1] FALSE
>>>>>> str(DT[, "x2"])
>>>>> Classes ‘data.table’ and 'data.frame': 9 obs. of 1 variable:
>>>>> $ x2: num 32 32 32 32 32 32 32 32 32
>>>>> - attr(*, ".internal.selfref")=<externalptr>
>>>>> ----
>>>>>
>>>>> a second level of indexing may or may not help, mostly depending on
>>>>> the use of '[' versus '[['. this can sometimes cause confusion
>>>>> when you are learning the language.
>>>>> ----
>>>>>> str(DT[, "x2"][1])
>>>>> Classes ‘data.table’ and 'data.frame': 1 obs. of 1 variable:
>>>>> $ x2: num 32
>>>>> - attr(*, ".internal.selfref")=<externalptr>
>>>>>> str(DT[, "x2"][[1]])
>>>>> num [1:9] 32 32 32 32 32 32 32 32 32
>>>>> ----
>>>>>
>>>>> the tibble:: package (used in, e.g., the dplyr:: package) also
>>>>> (always?) returns a single column as a non-vector. again, a
>>>>> second indexing with double '[[]]' can produce a vector.
>>>>> ----
>>>>>> DP <- tibble(DT)
>>>>>> is.vector(DP[, "x2"])
>>>>> [1] FALSE
>>>>>> is.vector(DP[, "x2"][[1]])
>>>>> [1] TRUE
>>>>> ----
>>>>>
>>>>> but, note that a list of lists is also a vector:
>>>>>> is.vector(list(list(1), list(1,2,3)))
>>>>> [1] TRUE
>>>>>> str(list(list(1), list(1,2,3)))
>>>>> List of 2
>>>>> $ :List of 1
>>>>> ..$ : num 1
>>>>> $ :List of 3
>>>>> ..$ : num 1
>>>>> ..$ : num 2
>>>>> ..$ : num 3
>>>>>
>>>>> etc.
>>>>>
>>>>> hth. good luck learning!
>>>>>
>>>>> cheers, Greg
>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>