[R] Why strsplit can be used with matrix but not data.frame?

David Winsemius dwinsemius at comcast.net
Thu Sep 17 03:56:47 CEST 2009


On Sep 16, 2009, at 9:41 PM, Peng Yu wrote:

> On Wed, Sep 16, 2009 at 8:30 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Sep 16, 2009, at 9:22 PM, Peng Yu wrote:
>>
>>> Hi,
>>>
>>> As shown in the code below, strsplit can be applied to a matrix but
>>> not a data.frame. I don't understand why R is designed this way. Can
>>> somebody help me understand it? How can I split all the strings in x$y?
>>>
>>> x=data.frame(x=1:10,y=rep("abc",10))
>>> strsplit(x$y,'b') #Error in strsplit(x$y, "b") : non-character argument
>>> y=cbind(1:10,rep("abc",10))
>>> strsplit(y[,2],'b')
>>
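>> You've been tripped up by the factor demon: data.frame() turned your
>> character column y into a factor, and strsplit() only accepts character
>> vectors.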
>>
>>  ?strsplit
>>  str(x)
>>
>> 'data.frame':   10 obs. of  2 variables:
>>  $ x: int  1 2 3 4 5 6 7 8 9 10
>>  $ y: Factor w/ 1 level "abc": 1 1 1 1 1 1 1 1 1 1
>>
>>
>> There is an option (see ?options):
>> stringsAsFactors:   The default setting for arguments of data.frame and
>> read.table.
>>
>> If you change it to FALSE, character columns stay character, which
>> would allow you to "design" as you see fit.
>
> I see that I can specify stringsAsFactors = FALSE when I create a
> data.frame. But if I already have a data.frame, how do I change its
> 'stringsAsFactors' option?
>
>    data.frame(..., row.names = NULL, check.rows = FALSE,
>               check.names = TRUE,
>               stringsAsFactors = default.stringsAsFactors())
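A minimal sketch of the diagnosis (not part of the original thread). It passes stringsAsFactors = TRUE explicitly because R 4.0.0 changed the default to FALSE, so on current R the original data.frame() call already gives a character column. The object names x2 and res are only illustrative.

```r
# Rebuild the example with the pre-R-4.0.0 default made explicit.
x <- data.frame(x = 1:10, y = rep("abc", 10), stringsAsFactors = TRUE)

class(x$y)         # "factor"
is.character(x$y)  # FALSE: strsplit() requires a character vector
levels(x$y)        # the string "abc" lives in the levels...
as.integer(x$y)    # ...while the column itself stores integer codes

# strsplit() on the factor fails; capture the error message instead of stopping.
res <- tryCatch(strsplit(x$y, "b"), error = function(e) conditionMessage(e))
res

# Asking data.frame() not to convert keeps y as a character vector.
x2 <- data.frame(x = 1:10, y = rep("abc", 10), stringsAsFactors = FALSE)
is.character(x2$y)
strsplit(x2$y, "b")[[1]]
```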

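stringsAsFactors is only consulted when the data frame is created; it is
not a setting stored in an existing data.frame. If you want to change a
factor column to a character column, convert it and assign it back:

x$y <- as.character(x$y)

or convert it on the fly: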

 > strsplit(as.character(x$y), "b")
[[1]]
[1] "a" "c"

[[2]]
[1] "a" "c"

[[3]]
[1] "a" "c"

[[4]]
[1] "a" "c"

[[5]]
[1] "a" "c"

[[6]]
[1] "a" "c"

[[7]]
[1] "a" "c"

[[8]]
[1] "a" "c"

[[9]]
[1] "a" "c"

[[10]]
[1] "a" "c"

>
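For a data frame that already exists and may have several factor columns, here is a minimal sketch that converts all of them to character at once. It assumes you want every factor column converted. The data frame df and its columns are made up for illustration.

```r
# Hypothetical data frame that already has factor columns.
df <- data.frame(id = 1:3,
                 a  = c("abc", "xbz", "cbd"),
                 b  = c("u", "v", "w"),
                 stringsAsFactors = TRUE)

# Find the factor columns and replace them with character versions;
# assigning into df[is_fac] keeps df a data.frame with the same names.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)
str(df)

# strsplit() now works on the converted column.
pieces <- strsplit(df$a, "b")

# When every string splits into the same number of pieces, the list can be
# stacked into a character matrix with one row per original string.
parts <- do.call(rbind, pieces)
parts
```

The do.call(rbind, ...) step only makes sense when all strings yield the same number of pieces. With ragged results, rbind() recycles the shorter vectors, which is rarely what you want.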
-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




More information about the R-help mailing list