[R] problem for strsplit function
Kai Yang
y@ngk@|9999 @end|ng |rom y@hoo@com
Sat Jul 10 00:45:35 CEST 2021
Thanks Bert,
I'm reading some books now. But it takes me a while to get familiar with R.
Best,
Kai

On Friday, July 9, 2021, 03:06:11 PM PDT, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
> "Strictly speaking", Greg is correct, Bert.
>
> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
>
> Lists in R are vectors. What we colloquially refer to as "vectors" are more precisely referred to as "atomic vectors". And without a doubt, this "vector" nature of lists is a key underlying concept that explains why adding a dim attribute creates a matrix that can hold data frames. It is also a stumbling block for programmers from other languages that have things like linked lists.
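
A minimal sketch of the point about dim attributes, in base R only. The objects df1, df2 and cells are made up for illustration and do not come from the thread:

```r
# Hypothetical example data; the object names are illustrative only.
df1 <- data.frame(id = 1:2)
df2 <- data.frame(id = 3:4)
cells <- list(df1, df2, "a label", 1:3)

# A list is a vector, so it accepts a dim attribute like any atomic vector.
# Elements are laid out column by column, as in an ordinary matrix.
dim(cells) <- c(2, 2)

is.list(cells)         # still a list
is.matrix(cells)       # and now also a matrix
class(cells[[1, 1]])   # the data frame stored in row 1, column 1
```

Each cell of such a matrix holds one arbitrary R object, which is why it can hold data frames alongside other things.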
I would also object to v3 (below) as "extracting" a column from d.
"d[2]" doesn't extract anything, it "subsets" the data frame, so the
result is a data frame, not what you get when you extract something from
a data frame.
People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal.
That extracts the 3rd element (the number 3). The problem is that R has
no way to represent a scalar number, only a vector of numbers, so x[[3]]
gets promoted to a vector containing that number when it is returned and
assigned to y.
Lists are vectors of R objects, so if x is a list, x[[3]] is something
that can be returned, and it is different from x[3].
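
A minimal sketch of that distinction. The list lst and its contents are assumptions chosen for illustration:

```r
# Atomic vector: [[ and [ both return a length-1 vector for a single index.
x <- 1:10
y <- x[[3]]
identical(y, x[3])   # TRUE: both are the integer vector 3L
length(y)            # 1: there is no separate scalar type

# List: [[ extracts the element itself, [ returns a shorter list.
lst <- list(a = 1:3, b = "text", c = mean)
elem <- lst[[3]]     # the function mean itself
sub  <- lst[3]       # a list of length 1 that contains mean
is.function(elem)    # TRUE
is.list(sub)         # TRUE
```

For the atomic vector the two forms are indistinguishable in the result; for the list they are not, because the extracted element need not be a list at all.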
Duncan Murdoch
>
> On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>> "1. a column, when extracted from a data frame, *is* a vector."
>> Strictly speaking, this is false; it depends on exactly what is meant
>> by "extracted." e.g.:
>>
>>> d <- data.frame(col1 = 1:3, col2 = letters[1:3])
>>> v1 <- d[,2] ## a vector
>>> v2 <- d[[2]] ## the same, i.e
>>> identical(v1,v2)
>> [1] TRUE
>>> v3 <- d[2] ## a data.frame
>>> v1
>> [1] "a" "b" "c" ## a character vector
>>> v3
>> col2
>> 1 a
>> 2 b
>> 3 c
>>> is.vector(v1)
>> [1] TRUE
>>> is.vector(v3)
>> [1] FALSE
>>> class(v3) ## data.frame
>> [1] "data.frame"
>> ## but
>>> is.list(v3)
>> [1] TRUE
>>
>> which is simply explained in ?data.frame (where else?!) by:
>> "A data frame is a **list** [emphasis added] of variables of the same
>> number of rows with unique row names, given class "data.frame". If no
>> variables are included, the row names determine the number of rows."
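
A short sketch of what "a data frame is a list" means in practice, reusing the data frame d from the example above. stringsAsFactors = FALSE is spelled out here only so the result does not depend on the R version; the drop = FALSE call is an addition not discussed in the thread:

```r
d <- data.frame(col1 = 1:3, col2 = letters[1:3], stringsAsFactors = FALSE)

is.list(d)             # TRUE
length(d)              # 2: length counts columns (list elements), not rows
cols <- lapply(d, class)   # list functions iterate over the columns

v  <- d[, 2]                 # matrix-style indexing drops to a vector
df <- d[, 2, drop = FALSE]   # drop = FALSE keeps a one-column data frame
is.character(v)        # TRUE
is.data.frame(df)      # TRUE
```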
>>
>> "2. maybe your question is "is a given function for a vector, or for a
>> data frame/matrix/array?". if so, i think the only way is reading
>> the help information (?foo)."
>>
>> Indeed! Is this not what the Help system is for?! But note also that
>> the S3 class system may somewhat blur the issue: foo() may work
>> appropriately and differently for different (S3) classes of objects. A
>> detailed explanation of this behavior can be found in appropriate
>> resources or (more tersely) via ?UseMethod .
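
A minimal sketch of S3 dispatch. The generic describe() and its methods are hypothetical, written only for this illustration; they are not part of base R or any package:

```r
# A hypothetical generic; describe() is not part of base R.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) {
  paste("vector of length", length(x))
}

# Method chosen when class(x) includes "data.frame".
describe.data.frame <- function(x, ...) {
  paste("data frame with", nrow(x), "rows and", ncol(x), "columns")
}

d <- data.frame(col1 = 1:3, col2 = letters[1:3])
describe(d)        # dispatches to describe.data.frame
describe(d[[2]])   # dispatches to describe.default
```

The same call, describe(), does different things depending on the class of its argument, which is why the help page for a generic may not tell the whole story and methods(describe) or the method-specific help page is worth checking.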
>>
>> "you might find reading ?"[" and ?"[.data.frame" useful"
>>
>> Not just "useful" -- **essential** if you want to work in R, unless
>> one gets this information via any of the numerous online tutorials,
>> courses, or books that are available. The Help system is accurate and
>> authoritative, but terse. I happen to like this mode of documentation,
>> but others may prefer more extended expositions. I stand by this claim
>> even if one chooses to use the "Tidyverse", data.table package, or
>> other alternative frameworks for handling data. Again, others may
>> disagree, but R is structured around these basics, and imo one remains
>> ignorant of them at their peril.
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall using umich.edu>
>> wrote:
>>>
>>> Kai,
>>>
>>>> one more question, how can I know if the function is for column
>>>> manipulations or for vector?
>>>
>>> i still stumble around R code. but, i'd say the following (and look
>>> forward to being corrected! :):
>>>
>>> 1. a column, when extracted from a data frame, *is* a vector.
>>>
>>> 2. maybe your question is "is a given function for a vector, or for a
>>> data frame/matrix/array?". if so, i think the only way is reading
>>> the help information (?foo).
>>>
>>> 3. sometimes, extracting the column as a vector from a data frame-like
>>> object might be non-intuitive. you might find reading ?"[" and
>>> ?"[.data.frame" useful (as well as ?"[.data.table" if you use that
>>> package). also, the str() command can be helpful in understanding
>>> what is happening. (the lobstr:: package's sxp() function, as well
>>> as more verbose .Internal(inspect()) can also give you insight.)
>>>
>>> with the data.table:: package, for example, if "DT" is a data.table
>>> object, with "x2" as a column, adding or leaving off quotation marks
>>> for the column name can make all the difference between ending up
>>> with a vector, or with a (much reduced) data table:
>>> ----
>>>> is.vector(DT[, x2])
>>> [1] TRUE
>>>> str(DT[, x2])
>>> num [1:9] 32 32 32 32 32 32 32 32 32
>>>>
>>>> is.vector(DT[, "x2"])
>>> [1] FALSE
>>>> str(DT[, "x2"])
>>> Classes ‘data.table’ and 'data.frame': 9 obs. of 1 variable:
>>> $ x2: num 32 32 32 32 32 32 32 32 32
>>> - attr(*, ".internal.selfref")=<externalptr>
>>> ----
>>>
>>> a second level of indexing may or may not help, mostly depending on
>>> the use of '[' versus '[['. this can sometimes cause confusion
>>> when you are learning the language.
>>> ----
>>>> str(DT[, "x2"][1])
>>> Classes ‘data.table’ and 'data.frame': 1 obs. of 1 variable:
>>> $ x2: num 32
>>> - attr(*, ".internal.selfref")=<externalptr>
>>>> str(DT[, "x2"][[1]])
>>> num [1:9] 32 32 32 32 32 32 32 32 32
>>> ----
>>>
>>> the tibble:: package (used in, e.g., the dplyr:: package) also
>>> (always?) returns a single column as a non-vector. again, a
>>> second indexing with double '[[]]' can produce a vector.
>>> ----
>>>> DP <- tibble(DT)
>>>> is.vector(DP[, "x2"])
>>> [1] FALSE
>>>> is.vector(DP[, "x2"][[1]])
>>> [1] TRUE
>>> ----
>>>
>>> but, note that a list of lists is also a vector:
>>>> is.vector(list(list(1), list(1,2,3)))
>>> [1] TRUE
>>>> str(list(list(1), list(1,2,3)))
>>> List of 2
>>> $ :List of 1
>>> ..$ : num 1
>>> $ :List of 3
>>> ..$ : num 1
>>> ..$ : num 2
>>> ..$ : num 3
>>>
>>> etc.
>>>
>>> hth. good luck learning!
>>>
>>> cheers, Greg
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.