[R] splitting dataframe, assign to new dataframe, add new rows to new dataframe
cls59
chuck at sharpsteen.net
Tue Oct 13 03:41:20 CEST 2009
wk yeo wrote:
>
>
> Hi, all,
>
> My objective is to split a dataframe named "cmbine" according to the value
> of "classes". After the split, I will take the first instance from each
> class and bin them into a new dataframe, "df1". In the 2nd iteration, I
> will take the 2nd available instance and bin them into another new
> dataframe, "df2".
>
>
>>cmbine$names
> apple tiger pencil chicken banana pear
>
>>cmbine$mass
> 0.50 100.00 0.01 1.00 0.15 0.30
>
>>cmbine$classes
> 1 2 3 2 1 1
>
>
If possible, it would be helpful to provide sample data in a form that could
be copied and pasted directly into an R session, like so:
cmbine <- data.frame( names = c('apple', 'tiger', 'pencil', 'chicken',
                                'banana', 'pear' ) )
cmbine['mass'] <- c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30)
cmbine['classes'] <- factor(c(1, 2, 3, 2, 1, 1))
It saves people on the list a bunch of copying/pasting/quote adding. Another
quick way to do this is to use the dump() function, which spits out the
structure of your object in a way that can be copied and pasted:
dump( 'cmbine', file='' )
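dput() is a closely related option: it prints an expression that rebuilds the object, without the assignment that dump() adds. A small sketch, repeating the sample data so it runs on its own; the name rebuilt is only used for this illustration:

```r
cmbine <- data.frame(
  names   = c('apple', 'tiger', 'pencil', 'chicken', 'banana', 'pear'),
  mass    = c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30),
  classes = factor(c(1, 2, 3, 2, 1, 1))
)

# Print an R expression that recreates the data frame
dput(cmbine)

# Round trip: capture the printed text, parse it, and evaluate it again
rebuilt <- eval(parse(text = capture.output(dput(cmbine))))
```

Whatever dput() prints can be pasted straight into a message and assigned to a name on the other end.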
wk yeo wrote:
>
>
> These are the results which I want to obtain:
>
>>df1
> classes mass
> apple 0.50
> tiger 100.00
> pencil 0.01
>
>>df2
> classes mass
> banana 0.15
> chicken 1.00
>
>>df3
> classes mass
> pear 0.30
>
> Below shows what I have tried. The main problem I have = I don't know how
> to assign the selected instance into a new dataframe with a name which is
> generated 'on-the-fly' based on the value of j (the jth row).
>
>
> for (i in 1:3) {
> same_cell <- cmbine[cmbine$classes == i, ]
> if (nrow(same_cell)!=0){
> for (j in 1:nrow(same_cell)){
> picked <- same_cell[j, ]
> assign(paste("df", j, sep=""), picked)
> #assign(paste("df",j, sep=""), paste("df", j, sep=""))
> }
> }
>
>
I'm assuming you want the results grouped by class, i.e. all the 1s in one
data frame, all the 2s in another, and so on. This can be done with a slight
modification of your loop:
for (i in 1:3) {
  same_cell <- cmbine[cmbine$classes == i, ]
  if (nrow(same_cell) != 0) {
    assign(paste("df", i, sep=""), same_cell)
  }
}
However, the results I get aren't the same as the results you said you
wanted:
> df1
   names mass classes
1  apple 0.50       1
5 banana 0.15       1
6   pear 0.30       1
> df2
    names mass classes
2   tiger  100       2
4 chicken    1       2
> df3
   names mass classes
3 pencil 0.01       3
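If what you actually want is the first instance of every class in df1, the second in df2, and so on (as in your expected output), one possible approach is to number the rows within each class and split on that number. A minimal sketch, repeating the sample data so it runs on its own; rank_in_class and by_rank are names made up for this example:

```r
cmbine <- data.frame(
  names   = c('apple', 'tiger', 'pencil', 'chicken', 'banana', 'pear'),
  mass    = c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30),
  classes = factor(c(1, 2, 3, 2, 1, 1))
)

# Position of each row within its own class:
# 1 for the first row of a class, 2 for the second, and so on
rank_in_class <- ave(seq_len(nrow(cmbine)), cmbine$classes, FUN = seq_along)

# One data frame per position, collected in a named list
by_rank <- split(cmbine, rank_in_class)
```

Here by_rank[['1']] holds apple, tiger and pencil, by_rank[['2']] holds chicken and banana, and by_rank[['3']] holds pear. Keeping the pieces in a list is usually easier to work with than generating variable names with assign().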
The "R way" of grouping by class is to use the by() function, which breaks a
data frame into sub-data frames based on a column of factors, such as the
classes column. For your example, it would be used as:
by( cmbine, cmbine[['classes']], function( df ){
  # Lots of stuff can happen inside this function; in this case we are
  # really just returning the subset that got passed in.
  return( df )
})
cmbine[["classes"]]: 1
   names mass classes
1  apple 0.50       1
5 banana 0.15       1
6   pear 0.30       1
-----------------------------------------------------------------------
cmbine[["classes"]]: 2
    names mass classes
2   tiger  100       2
4 chicken    1       2
-----------------------------------------------------------------------
cmbine[["classes"]]: 3
   names mass classes
3 pencil 0.01       3
The by() function returns a fancy list. Each component is one of the sub-data
frames and can be pulled out with the [[ ]] operator, using the class label as
the name.
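As a rough illustration of pulling one piece back out, the sketch below stores the result of by() and extracts the class 2 rows; it also shows split(), which produces the same pieces as an ordinary named list. The names pieces, class2 and groups are only for this example:

```r
cmbine <- data.frame(
  names   = c('apple', 'tiger', 'pencil', 'chicken', 'banana', 'pear'),
  mass    = c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30),
  classes = factor(c(1, 2, 3, 2, 1, 1))
)

# Store the result of by() instead of just printing it
pieces <- by(cmbine, cmbine[['classes']], function(df) df)

# Extract the sub-data frame for class 2 by its label
class2 <- pieces[['2']]

# split() gives a plain named list with one data frame per class
groups <- split(cmbine, cmbine[['classes']])
```

by() is handy when the function does real work on each piece; if all you need is the subsets themselves, split() is the more direct route.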
Hope this helps!
-Charlie
-----
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University
--
View this message in context: http://www.nabble.com/splitting-dataframe%2C-assign-to-new-dataframe%2C-add-new-rows-to-new-dataframe-tp25865409p25865911.html
Sent from the R help mailing list archive at Nabble.com.