[R] Re : Adding column sum to new row in data frame
Joshua Wiley
jwiley.psych at gmail.com
Thu May 20 07:01:58 CEST 2010
Dear Mohan,
First, I would like to modify my code slightly to:
data <- rbind(data,data.frame(State="Total",t(apply(data[,-1], 2, sum,
na.rm=TRUE))))
This actually will add a 7th level to your factor automatically. The
reason I wanted to change from using c() to data.frame() is that if
one uses c(), all the columns are converted to character (this has to
do with different methods for rbind, see ?rbind particularly the
Details and Value section which describe the different methods for
rbind and what its behavior will be if it is using the default
method). This may not be an issue, but it would hamper any subsequent
calculations you may wish to perform on your data.
Aside from the way I just showed, you have many options.
(1) Read the data in and do not convert the state names to factor in
the first place (David's suggestion). This is most likely the easiest.
The only reason I can see not wanting to use this is if there were
several other columns you wanted treated as factors, in which case
look at option #3.
See ?read.csv for details. Note that: "'read.csv' and 'read.csv2' are
identical to 'read.table' except for the defaults." so it is perfectly
legal to use the stringsAsFactors argument shown for read.table.
data <- read.csv(file='ipsample.csv', header=TRUE,
stringsAsFactors=FALSE) #what your read in code might look like
Even if you do this, I would still recommend against my original code
because it converts everything else to character too.
(2) Add a level to the factor for total (as you showed). I do not see
any real problem with this, but it is a bit of a complicated solution.
My updated code actually leads to similar results.
(3) Supposing the data is already read in as a factor and you would
like to change it, you can convert just that column to character.
Once the column is character, you can name that cell Total, or
whatever else you like.
data$State <- as.character(data$State)
(4) Wu Gong suggested this option. The difficulty with this option is
that because cbind is like rbind, the default without any dataframe
arguments will wind up with everything being converted to character,
just like my original code.
data <- rbind(data,cbind(State="Total", t(apply(data[,-1],2,sum))))
Depending exactly what you are working with, different solutions may
be better or wrose and these are certainly not the only ways you could
handle it.
Best regards,
Josh
On Wed, May 19, 2010 at 8:19 PM, Mohan L <l.mohanphy at gmail.com> wrote:
>
>>
>>
>> How to safely avoid this warning massage?
>> Now I have <NA> instead of "Total" in last row State column. How to I
>> replace it as "Total"?
>>
> Dear All,
>
> The below link provides a very good explanation of "Creating factor
> variables" and way to avoid the warning message
> http://www.ats.ucla.edu/stat/R/modules/factor_variables.htm
>
> Now it works for me without warning message ( invalid factor level, NAs
> generated ) according to the document above , I did below to avoid the
> warning:
>
>> data <- read.csv(file='ipsample.csv',sep=',' , header=TRUE)
>> data
> State Jan Feb Mar Apr May Jun
> 1 AAA 1 1 0 2 2 0
> 2 BBB 1298 1195 1212 1244 1158 845
> 3 CCC 0 0 0 1 2 1
> 4 DDD 5 11 17 15 10 9
> 5 EEE 18 28 27 23 23 16
> 6 FFF 68 152 184 135 111 86
>
>
>> a <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
> Warning message:
> In `[<-.factor`(`*tmp*`, ri, value = "Total") :
> invalid factor level, NAs generated
>
>> a
> State Jan Feb Mar Apr May Jun
> 1 AAA 1 1 0 2 2 0
> 2 BBB 1298 1195 1212 1244 1158 845
> 3 CCC 0 0 0 1 2 1
> 4 DDD 5 11 17 15 10 9
> 5 EEE 18 28 27 23 23 16
> 6 FFF 68 152 184 135 111 86
> 7 <NA> 1390 1387 1440 1420 1306 957
>
> We can see that instead of "Total", the label was <NA>. To do this
> correctly, I have added the new level, "Total", to the factor column
> data$State using the factor function with the levels argument. Then I can
> finally add an element to the factor variable from the new level. here is
> the steps
>
>> levels(data$State)
> [1] "AAA" "BBB" "CCC " "DDD" "EEE" "FFF"
>
>> data$State <- factor(data$State,levels=c(levels(data$State),"Total"))
>
>> data$State
> [1] AAA BBB CCC DDD EEE FFF
> Levels: AAA BBB CCC DDD EEE FFF Total
>
>> levels(data$State)
> [1] "AAA" "BBB" "CCC " "DDD" "EEE" "FFF" "Total"
>
>> x <- rbind(data, c("Total",apply(data[,-1], 2, sum, na.rm=TRUE)))
>
> Now the above works without warning.
>
>> x
> State Jan Feb Mar Apr May Jun
> 1 AAA 1 1 0 2 2 0
> 2 BBB 1298 1195 1212 1244 1158 845
> 3 CCC 0 0 0 1 2 1
> 4 DDD 5 11 17 15 10 9
> 5 EEE 18 28 27 23 23 16
> 6 FFF 68 152 184 135 111 86
> 7 Total 1390 1387 1440 1420 1306 957
>
> I think I am doing right. If I miss understood anything. Please guide me I
> am beginer to R.
>
> Thanks & Rg
> Mohan L
>
>
>
>
--
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/
More information about the R-help
mailing list