[R] How can I rearrange my dataframe
jim holtman
jholtman at gmail.com
Tue Feb 9 18:02:54 CET 2010
try this:
> x <- read.table(textConnection("name nicknames value
+ 1 A A1 4
+ 2 B B1 5
+ 3 C C1 9
+ 4 B B2 2
+ 5 C C2 7
+ 6 C C3 6
+ 7 C C4 3
+ 8 B B3 6
+ 9 C C5 7"), header=TRUE)
> closeAllConnections()
> result <- do.call(rbind, lapply(split(x, x$name), function(.name){
+ data.frame(name=.name$name[1],
+ nicknames=paste(.name$nicknames, collapse=','),
+ mean=mean(.name$value))
+ }))
>
> result
  name      nicknames     mean
A    A             A1 4.000000
B    B       B1,B2,B3 4.333333
C    C C1,C2,C3,C4,C5 6.400000
>
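A variation on the same idea, in case it is useful: since each group reduces to one string and one number, tapply() can compute the two columns separately and data.frame() can glue them together. This is only a sketch; the names x2, nick, avg and result2 are made up for illustration, and I have not timed it against the split/lapply version above.

```r
# Same example data as above, built directly instead of via textConnection().
x2 <- data.frame(
  name      = c("A", "B", "C", "B", "C", "C", "C", "B", "C"),
  nicknames = c("A1", "B1", "C1", "B2", "C2", "C3", "C4", "B3", "C5"),
  value     = c(4, 5, 9, 2, 7, 6, 3, 6, 7),
  stringsAsFactors = FALSE
)

# One pass per output column; both results are ordered by the sorted group names.
nick <- tapply(x2$nicknames, x2$name, paste, collapse = ",")
avg  <- tapply(x2$value, x2$name, mean)

result2 <- data.frame(
  name      = names(avg),
  nicknames = as.vector(nick),
  mean      = as.vector(avg),
  stringsAsFactors = FALSE
)
result2
```

Both tapply() calls group by the same vector, so the rows of nick and avg line up without a merge step.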
On Tue, Feb 9, 2010 at 11:24 AM, Alex Levitchi <alex.levitchi at cbm.fvg.it> wrote:
> Hello
> I have only recently begun working with R, so I am not very experienced.
> Still, I cannot find a clear way to process my data frame, which is a fairly large one.
> It looks similar to this:
>
>> name=c("A","B","C","B","C","C","C","B","C")
>> nicknames=c("A1","B1","C1","B2","C2","C3","C4","B3","C5")
>> value=c(4,5,9,2,7,6,3,6,7)
>> table=data.frame(cbind(name,nicknames,value))
>> table
>   name nicknames value
> 1    A        A1     4
> 2    B        B1     5
> 3    C        C1     9
> 4    B        B2     2
> 5    C        C2     7
> 6    C        C3     6
> 7    C        C4     3
> 8    B        B3     6
> 9    C        C5     7
>
> So I have to rearrange it in the following way:
> - the first column should contain only the unique names; I have done this, and it looks like
> 1 A
> 2 B
> 3 C
>
> - the second column should contain the different 'nicknames' that correspond to each of A, B or C
> name nickname value
> 1 A A1
> 2 B B1,B2,B3
> 3 C C1,C2,C3,C4,C5
>
> - the third column should contain the mean of the values that correspond to the same A, B or C
> 1 A A1 mean(4)
> 2 B B1,B2,B3 mean(5,2,6)
> 3 C C1,C2,C3,C4,C5 mean(9,7,6,3,7)
>
> I did this using a 'for' loop.
> To be clear, I created three data frames, one for each of the columns, and combine them at the end.
>
>> ulist=which(!duplicated(table$name)) # I extract the list of positions in which I don't have duplications
>> name1=data.frame(table$name[ulist]) # I extract the list of unique names
>> nicknames1=data.frame(row.names(1:length(ulist))) # I create a dataframe of dimension equal to unique list length
>> value1=data.frame(row.names(1:length(ulist))) # I create a dataframe of dimension equal to unique list length
>
>> for(i in 1:length(ulist)) {
> position=which(as.character(name1[i,1])==table$name)
> nicknames1[i,1]=toString(table$nicknames[position])
> value1[i,1]=mean(as.numeric(table$value[position]))
> }
>> fin=cbind(name1,nicknames1,value1)
>> colnames(fin)=c("NAME","NICKNAME","VALUE")
>> fin
>   NAME           NICKNAME    VALUE
> 1    A                 A1 3.000000
> 2    B         B1, B2, B3 3.333333
> 3    C C1, C2, C3, C4, C5 5.200000
>
> It works. But in general I work with much larger data frames (tens of thousands of rows or more),
> so my loop runs too slowly (e.g., a data frame of 20000 rows and 3 columns takes about 10 minutes to process).
> I intend to wrap it in a function, so the running time will obviously be even longer.
>
> If anyone can suggest how to improve what I have done, or a better way to do it, please let me know.
>
> Kind regards to all who develop and maintain the R sources for such dummies as me
> Alex Levitchi
>
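One more note on the quoted loop: its VALUE column (3.0, 3.33, 5.2) does not match the means of the raw numbers (4, 4.33, 6.4). The likely cause is that cbind() on mixed character and numeric vectors produces a character matrix, and data.frame() at the time (stringsAsFactors defaulted to TRUE before R 4.0.0) turned each column into a factor, so as.numeric() returned the internal level codes rather than the numbers. The sketch below reproduces both sets of means; the factor is built explicitly so that it behaves the same in any R version, and the names f, bad and good are only illustrative.

```r
name  <- c("A", "B", "C", "B", "C", "C", "C", "B", "C")
value <- c(4, 5, 9, 2, 7, 6, 3, 6, 7)

# Assumption: this mimics what the 'value' column looked like after
# data.frame(cbind(...)) with stringsAsFactors = TRUE.
f <- factor(as.character(value))

# as.numeric() on a factor gives the level codes (1, 2, 3, ...), not the labels.
bad  <- tapply(as.numeric(f), name, mean)

# Converting to character first recovers the original numbers.
good <- tapply(as.numeric(as.character(f)), name, mean)

bad   # A = 3, B = 3.333333, C = 5.2  (same as the loop's output)
good  # A = 4, B = 4.333333, C = 6.4  (means of the actual values)
```

Building the data frame as data.frame(name, nicknames, value), without cbind(), keeps value numeric and avoids the problem.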
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?