[R] Why strsplit can be used with matrix but not data.frame?
David Winsemius
dwinsemius at comcast.net
Thu Sep 17 03:56:47 CEST 2009
On Sep 16, 2009, at 9:41 PM, Peng Yu wrote:
> On Wed, Sep 16, 2009 at 8:30 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Sep 16, 2009, at 9:22 PM, Peng Yu wrote:
>>
>>> Hi,
>>>
>>> As shown in the code below, strsplit can be applied to a matrix but not
>>> a data.frame. I don't understand why R is designed this way. Can
>>> somebody help me understand it? How do I split all the strings in x$y?
>>>
>>> x=data.frame(x=1:10,y=rep("abc",10))
>>> strsplit(x$y,'b') #Error in strsplit(x$y, "b") : non-character argument
>>> y=cbind(1:10,rep("abc",10))
>>> strsplit(y[,2],'b')
>>
>> You've been tripped up by the factor demon.
>>
>> ?strsplit
>> str(x)
>>
>> 'data.frame': 10 obs. of 2 variables:
>> $ x: int 1 2 3 4 5 6 7 8 9 10
>> $ y: Factor w/ 1 level "abc": 1 1 1 1 1 1 1 1 1 1
>>
>>
>> There is an option:
>> stringsAsFactors: The default setting for arguments of data.frame and
>> read.table.
>>
>> Which if changed to FALSE would allow you to "design" as you see fit.
>
> I see that I can specify 'F' for stringsAsFactors when I initialize a
> data.frame. But if I already have a data.frame, how to change the
> 'stringsAsFactors' option of it?
>
> data.frame(..., row.names = NULL, check.rows = FALSE,
> check.names = TRUE,
> stringsAsFactors = default.stringsAsFactors())
If you want to change a factor column to a character column:
as.character(x$y)
> strsplit( as.character(x$y) , "b")
[[1]]
[1] "a" "c"
[[2]]
[1] "a" "c"
[[3]]
[1] "a" "c"
[[4]]
[1] "a" "c"
[[5]]
[1] "a" "c"
[[6]]
[1] "a" "c"
[[7]]
[1] "a" "c"
[[8]]
[1] "a" "c"
[[9]]
[1] "a" "c"
[[10]]
[1] "a" "c"
>
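To make the conversion stick in an existing data frame, the result of as.character() has to be assigned back to the column. The sketch below is only illustrative: the names x2, is_fac, parts and m are made up for the example, and stringsAsFactors = TRUE is passed explicitly so the factor column appears regardless of the session default (which, as far as I know, has been FALSE since R 4.0.0). The last lines show why the matrix case worked: cbind() of an integer and a character vector coerces everything to character.

```r
# Build the data frame with an explicit factor column (assumption: mimics the 2009 default)
x <- data.frame(x = 1:10, y = rep("abc", 10), stringsAsFactors = TRUE)

# Overwrite the factor column with its character labels, then split
x$y <- as.character(x$y)
parts <- strsplit(x$y, "b", fixed = TRUE)

# Convert every factor column of an existing data frame in one step
x2 <- data.frame(x = 1:10, y = rep("abc", 10), stringsAsFactors = TRUE)
is_fac <- vapply(x2, is.factor, logical(1))      # which columns are factors
x2[is_fac] <- lapply(x2[is_fac], as.character)   # replace them with character vectors

# The matrix case: cbind() coerces to a character matrix, so no factor is involved
m <- cbind(1:10, rep("abc", 10))
is.character(m)
```

After this, str(x2) should report y as chr rather than Factor, and strsplit(x2$y, "b") should run without the non-character error.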
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT