[R] question: data.frame data conversion

Rui Barradas ruipbarradas at sapo.pt
Sun Aug 4 21:47:43 CEST 2013


Hello,

You're insisting on as.data.frame(cbind(...)). Don't do that. Just see
the difference:

z = data.frame(x, y)
str(z)
'data.frame':   8 obs. of  2 variables:
  $ x: Factor w/ 3 levels "a","b","c": 1 1 1 2 2 2 3 3
  $ y: num  1 1.2 1.1 1.01 1.03 1 2 3

z2 = as.data.frame(cbind(x,y))
str(z2)
'data.frame':   8 obs. of  2 variables:
  $ x: Factor w/ 3 levels "a","b","c": 1 1 1 2 2 2 3 3
  $ y: Factor w/ 7 levels "1","1.01","1.03",..: 1 5 4 2 3 1 6 7


What happens is that cbind creates a matrix from x and y. A matrix can
hold only one type, so everything is converted to character and y is no
longer numeric. Then as.data.frame converts the strings to factors, the
default behavior.
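
If you want to check this yourself, here is a minimal sketch, assuming the
x and y from your example. Note that the conversion to factor is the default
in R versions before 4.0.0; from 4.0.0 on, stringsAsFactors defaults to FALSE
and y would come out as a character column instead. Either way it is no
longer numeric.

```r
x <- c("a", "a", "a", "b", "b", "b", "c", "c")
y <- c(1.0, 1.2, 1.1, 1.01, 1.03, 1.0, 2.0, 3.0)

# cbind() on two vectors builds a matrix, and a matrix has a single type,
# so the numbers are coerced to character
m <- cbind(x, y)
typeof(m)          # "character"

# data.frame() keeps each column's own type
z <- data.frame(x, y)
is.numeric(z$y)    # TRUE

# going through cbind() loses the numeric type
z2 <- as.data.frame(cbind(x, y))
is.numeric(z2$y)   # FALSE

# if you are stuck with z2, convert via character so that this works
# whether y ended up as a factor or as a character column
y_fixed <- as.numeric(as.character(z2$y))
all.equal(y_fixed, y)   # TRUE
```

The detour through as.character() matters for factors: as.numeric() applied
directly to a factor returns the integer level codes, not the original values.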

As for your question, change fun() to the following.

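fun <- function(z){
	# split on the data frame's own x column, not on a global x
	zs <- split(z, z$x)
	# number of rows in each group
	m <- sapply(zs, nrow)
	# within-group row index; lapply always returns a list, so unlist gives
	# a plain vector even when all groups have the same size
	# (this assumes the rows of z are already ordered by x)
	id <- unlist(lapply(m, seq_len), use.names = FALSE)
	zz <- cbind(id, z)
	# one column per value of x; shorter groups are padded with NA
	dcast(zz, id ~ x, value.var = "y")[-1]
}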
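
If you would rather not depend on reshape2, the same reshaping can be
sketched in base R. This is an alternative technique, not the dcast approach
above, and the names ys, k and wide are only for illustration. It splits y by
group and pads every group with NA up to the length of the longest one.

```r
x <- c("a", "a", "a", "b", "b", "b", "c", "c")
y <- c(1.0, 1.2, 1.1, 1.01, 1.03, 1.0, 2.0, 3.0)
z <- data.frame(x, y)

# list with one numeric vector per value of x
ys <- split(z$y, z$x)

# length of the longest group
k <- max(lengths(ys))

# indexing past the end of a vector yields NA, which pads the short groups
wide <- as.data.frame(lapply(ys, function(v) v[seq_len(k)]))
wide
```

With the data above, "c" has only two values, so the third row of column c
is NA. Unlike fun(), this does not need the rows of z to be ordered by x.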


Hope this helps,

Rui Barradas


On 04-08-2013 20:34, Brijesh Gulati wrote:
> Hello Rui: Thanks for the solution. It works to the specification. Just
> one follow-up: I get an error if the number of repeated values differs
> between groups. For instance, in the following example, "c" is repeated
> only 2 times, whereas "a" and "b" are repeated three times. I am fine with
> the output showing NA for the missing values. Any help would be greatly
> appreciated.
>
> x = c("a","a", "a", "b","b","b", "c", "c")
> y = c(1.0, 1.2, 1.1, 1.01, 1.03, 1.0, 2.0, 3.0)
> z = as.data.frame(cbind(x,y))
>
>   x    y
> 1 a    1
> 2 a  1.2
> 3 a  1.1
> 4 b 1.01
> 5 b 1.03
> 6 b    1
> 7 c    2
> 8 c    3
>
> -----Original Message-----
> From: Rui Barradas [mailto:ruipbarradas at sapo.pt]
> Sent: Sunday, August 04, 2013 1:57 PM
> To: Brijesh Gulati
> Cc: r-help at r-project.org
> Subject: Re: [R] question: data.frame data conversion
>
> Hello,
>
> First of all, do _not_ create a data frame with
>
> as.data.frame(cbind(...))
>
> Instead, use
>
> z = data.frame(x, y)
>
> As for your question, try the following.
>
>
> library(reshape2)
> fun <- function(z){
> 	zs <- split(z, z$x)
> 	n <- length(zs)
> 	m <- nrow(zs[[1]])
> 	zz <- cbind(id = rep(1:m, n), z)
> 	dcast(zz, id ~ x)[-1]
> }
>
> fun(z)
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 04-08-2013 13:49, Brijesh Gulati wrote:
>> Hello, I have a data.frame with repeating rows and corresponding
>> values. For instance, "z" is an example of that.
>>
>> x = c("a","a", "a", "b","b","b")
>> y = c(1.0, 1.2, 1.1, 1.01, 1.03, 1.0)
>> z = as.data.frame(cbind(x,y))
>>
>> > z
>>   x    y
>> 1 a    1
>> 2 a  1.2
>> 3 a  1.1
>> 4 b 1.01
>> 5 b 1.03
>> 6 b    1
>>
>> So, you see that "a" and "b" are each repeated 3 times and have three
>> different values. I would like to convert this data into something like
>> the following.
>>
>>      a    b
>>     1.0 1.01
>>     1.2 1.03
>>     1.1 1.00
>>
>> In the above, the repeating rows (a, b) become columns and their values
>> show up in their respective columns.
>>
>> Finally, to clarify a few things: the number of rows for each repeating
>> item (a or b) will be the same, and hence the number of rows expected
>> in the output will be the same.
>


