[R] cbind, data.frame | numeric to string?
David Winsemius
dwinsemius at comcast.net
Tue Apr 10 18:19:25 CEST 2012
On Apr 10, 2012, at 11:58 AM, Rainer Schuermann wrote:
> cbind() works as well, but only if c is attached to the existing
> test variable:
>
>> tst <- cbind( test, c )
>> tst
> a b c
> 1 1 0.3 y1
> 2 2 0.4 y2
> 3 3 0.5 y3
> 4 4 0.6 y4
> 5 5 0.7 y5
>> str( tst )
> 'data.frame': 5 obs. of 3 variables:
> $ a: num 1 2 3 4 5
> $ b: num 0.3 0.4 0.5 0.6 0.7
> $ c: Factor w/ 5 levels "y1","y2","y3",..: 1 2 3 4 5
>
> Not saying it is a good idea, though...
To be somewhat more expansive ... 'cbind' is not just one function,
but rather a set of functions, since it is "generic". The one that is
chosen by the interpreter depends on the classes of the arguments. If
any of them is a data frame, as 'test' is in the example above, then
the cbind.data.frame function will be dispatched to process the list
of arguments. If none of the arguments has a class, as in the OP's
second example below, then the internal cbind function is used. It
returns a matrix, which strips off all but a few attributes and forces
a lowest common denominator mode: if even one of the arguments is
character, every element becomes character. Only if all of the
arguments were logical would cbind return a matrix of TRUEs and FALSEs.
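
A minimal sketch of that lowest-common-denominator coercion in the
internal cbind, using made-up vectors (x, y and z are illustrative
names, not taken from the thread):

```r
x <- c(1, 2, 3)
y <- c(0.3, 0.4, 0.5)
z <- c("y1", "y2", "y3")

# None of these arguments has a class, so the internal cbind is used
# and the result is a plain matrix with a single storage mode.
m_num <- cbind(x, y)      # numeric arguments only
m_chr <- cbind(x, y, z)   # one character argument among them

typeof(m_num)   # "double"
typeof(m_chr)   # "character": x and y were converted to strings
```

Wrapping m_chr in data.frame() cannot recover the numbers, because
they are already strings by the time data.frame() sees them.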
(This all assumes that the typos in the OP's original example that
created 'c' as an incomplete expression and a and b with unequal
lengths were fixed.)
> a <- c(1,2,3,4,5);
> b <- c(0.3,0.4,0.5,0,6,0.7);
> test <- data.frame(cbind(a,b))
Warning message:
In cbind(a, b) :
number of rows of result is not a multiple of vector length (arg 1)
> c <- c("y1","y2","y3","y4","y5")
> cbind(c, test)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 5, 6
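
With the typos fixed, the three approaches discussed in this thread
can be compared side by side. This is only a sketch: the corrected
values of b and c are assumed from the output the OP printed, and
'bad' is a name introduced here for illustration.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(0.3, 0.4, 0.5, 0.6, 0.7)        # assumed fix for "0,6"
c <- c("y1", "y2", "y3", "y4", "y5")   # assumed fix for the stray quote

# Build the data frame directly: each column keeps its own type.
test <- data.frame(a, b, c)
sapply(test, is.numeric)   # a and b TRUE, c FALSE

# Here the first argument is a data frame, so cbind.data.frame is
# dispatched and a and b stay numeric.
tst <- cbind(data.frame(a, b), c)
sapply(tst, is.numeric)    # a and b TRUE, c FALSE

# The idiom from the original post: the internal cbind builds a
# character matrix first, so no column reaches data.frame() as numeric.
bad <- data.frame(cbind(a, b, c))
sapply(bad, is.numeric)    # all FALSE
```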
--
David.
>
> Rainer
>
>
> On Tuesday 10 April 2012 11:38:51 R. Michael Weylandt wrote:
>> Don't use cbind() -- it forces everything into a single type, here
>> string, which in turn becomes factor.
>>
>> Simply,
>>
>> data.frame(a, b, c)
>>
>> Like David mentioned a few days ago, I have no idea who is promoting
>> this data.frame(cbind(...)) idiom, but it's a terrible idea (albeit
>> one that seems to be very frequent over the last few weeks)
>>
>> Michael
>>
>> On Tue, Apr 10, 2012 at 10:33 AM, Anser Chen <anser.chen at gmail.com>
>> wrote:
>>> Complete newbie to R -- struggling with something which should be
>>> pretty
>>> basic. Trying to create a simple data set (which I gather R refers
>>> to as a
>>> data.frame). So
>>>
>>>> a <- c(1,2,3,4,5);
>>>> b <- c(0.3,0.4,0.5,0,6,0.7);
>>>
>>> Stick the two together into a data frame (call test) using cbind
>>>
>>>> test <- data.frame(cbind(a,b))
>>>
>>> Seems to do the trick:
>>>
>>>> test
>>> a b
>>> 1 1 0.3
>>> 2 2 0.4
>>> 3 3 0.5
>>> 4 4 0.6
>>> 5 5 0.7
>>>>
>>>
>>> Confirm that each variable is numeric:
>>>
>>>> is.numeric(test$a)
>>> [1] TRUE
>>>> is.numeric(test$b)
>>> [1] TRUE
>>>
>>>
>>> OK, so far so good. But, now I want to merge in a vector of
>>> characters:
>>>
>>>> c <- c('y1","y2","y3","y4","y5")
>>>
>>> Confirm that this is string:
>>>
>>>> is.numeric(c);
>>> [1] FALSE
>>>
>>> cbind c into the data frame:
>>>
>>>> test <- data.frame(cbind(a,b,c))
>>>
>>> Looks like everything is in place:
>>>
>>>> test
>>> a b c
>>> 1 1 0.3 y1
>>> 2 2 0.4 y2
>>> 3 3 0.5 y3
>>> 4 4 0.6 y4
>>> 5 5 0.7 y5
>>>
>>> Except that it seems as if the moment I cbind in a character
>>> vector, it
>>> changes numeric data to string:
>>>
>>>> is.numeric(test$a)
>>> [1] FALSE
>>>> is.numeric(test$b)
>>> [1] FALSE
>>>
>>> which would explain why the operations I'm trying to perform on
>>> elements of
>>> a and b columns are failing. If I look at the structure of the
>>> data.frame,
>>> I see that in fact *all* the variables are being entered as
>>> "factors".
>>>
>>>> str(test)
>>> 'data.frame': 5 obs. of 3 variables:
>>> $ a: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
>>> $ b: Factor w/ 5 levels "0.3","0.4","0.5",..: 1 2 3 4 5
>>> $ c: Factor w/ 5 levels "y1","y2","y3",..: 1 2 3 4 5
>>>
>>> But, if I try
>>>
>>>> test <- data.frame(cbind(a,b))
>>>> str(test)
>>> 'data.frame': 5 obs. of 2 variables:
>>> $ a: num 1 2 3 4 5
>>> $ b: num 0.3 0.4 0.5 0.6 0.7
>>>
>>> a and b are coming back as numeric. So, why does cbind'ing a
>>> column of
>>> character variables change everything else? And, more to the
>>> point, what do
>>> I need to do to 'correct' the problem (i.e., stop this from
>>> happening).
>>>
David Winsemius, MD
West Hartford, CT